Which Foundation Model is best for Agent…

Nov 19, 2024

Gemini vs GPT 4o vs Claude Sonnet vs o1 vs 4o mini

7 Comments

Nov 20, 2024

This is a great article, thanks for the honest eval. I've noticed a significant difference in the orchestration layer at you.com, who seem to have built their business model on trying to nail the orchestration. It's not perfect, but it will help the masses use the agentic approach in chat-based interactions (solving most of what o1 claimed to do).

Would love to hear your eval of the agents/orchestration at you.com if possible.


Devansh

Nov 21, 2024

I'll have to take a look there. What I saw was a bit more rudimentary, but they should have evolved by now.


Sirsh

Nov 20, 2024 (Edited)

I agree. I recently came at this from a slightly different direction here - https://medium.com/@mrsirsh/7-days-of-agent-framework-anatomy-from-first-principles-day-1-d54d5fb6d0a3. I was building a simple agent framework from scratch to test a few ideas, and as part of this I added some basic wrappers and compared those models via the APIs, though on a more specific task. My assessment was the same in terms of ranking. 4o is fairly solid on planning and tool use, and Claude performs well. Gemini is awful, confabulating among other let-downs. Mini is reliable for well-structured cases.
