Discussion about this post

User's avatar
Geert's avatar

This is a great article, thanks for the honest eval. I've noticed a significant difference in the orchestration layer at you.com who seem to have build their business model on trying to nail the orchestration. It's not perfect, but will help the masses to use the agentic approach to work in chat-based interactions (solving most of what o1 claimed to do).

Would love to hear your eval of the agents/orchestration at you.com if possible.

Expand full comment
Sirsh's avatar

I agree. I came at this from a slightly different direction recently here - https://medium.com/@mrsirsh/7-days-of-agent-framework-anatomy-from-first-principles-day-1-d54d5fb6d0a3. I was playing with building a simple agent framework from scratch to test a few ideas and as part of this i added some basic wrappers and compared those models but on a more specific task via the apis. My assessment was the same in terms of ranking. 4o is fairly solid on things relating to planning and tool use and Claude performs well. Gemini is awful, confabulating among other let-downs. Mini is reliable for well structured cases.

Expand full comment
5 more comments...

No posts