6 Comments

This is a great article, thanks for the honest eval. I've noticed a significant difference in the orchestration layer at you.com, which seems to have built its business model on trying to nail the orchestration. It's not perfect, but it will help the masses use the agentic approach in chat-based interactions (solving most of what o1 claimed to do).

Would love to hear your eval of the agents/orchestration at you.com if possible.

I'll have to take a look there. From what I saw, it was a bit more rudimentary, but they should have evolved by now.

I agree. I came at this from a slightly different direction recently here: https://medium.com/@mrsirsh/7-days-of-agent-framework-anatomy-from-first-principles-day-1-d54d5fb6d0a3. I was building a simple agent framework from scratch to test a few ideas, and as part of this I added some basic wrappers and compared those models, but on a more specific task via the APIs. My assessment was the same in terms of ranking. 4o is fairly solid on things relating to planning and tool use, and Claude performs well. Gemini is awful, confabulating among other let-downs. Mini is reliable for well-structured cases.

That's an excellent article (shared it in my recs). If you're interested, you should aggregate your learnings and do a guest post.

Thank you, Devansh, for taking the time to check it out! And yes, I would be interested in doing a guest post on this sometime. That would be awesome.

I'm excited to see it.
