I added Gemini 2.5, and it does perform better. However, be cautious when interpreting the current results, as I have only evaluated the models on 23 rather basic tasks so far. I will be adding more in the coming days.