7 Comments
Joel Salinas's avatar

Excellent breakdown!

Devansh's avatar

Excited for yours next week as well

Joel Salinas's avatar

I appreciate it, I am too!

Louis-François Bouchard's avatar

I'm really happy we made this post happen! Love the final results and hope everyone will enjoy the read!

Devansh's avatar

Thank you for putting up with my process haha

Mia Kiraki's avatar

This is the most important conversation in AI right now!! The gap between a Jupyter Notebook and a production endpoint is where 99% of AI value dies.

John Michael Thomas's avatar

Thanks for the breakdown!

One thing I've been highlighting in my own work recently is the difference between the model and the chatbot that runs it. A chatbot is an app, and the way each chatbot is written dramatically impacts how users experience the model.

For example, no matter how amazing the model is, and how bulletproof the infra is, when the context window is full, ChatGPT will still forget things and Claude will still stop talking to you.
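
To make those two failure modes concrete, here is a minimal Python sketch of a chat app that handles a full context window in one of two ways: silently dropping the oldest messages ("forgetting") or refusing to continue ("stop talking"). Everything in it is hypothetical: `ChatSession`, `ContextFullError`, the `policy` names, and the word-count tokenizer are illustrative assumptions, not how ChatGPT or Claude are actually implemented.

```python
from dataclasses import dataclass, field


class ContextFullError(Exception):
    """Raised by the hypothetical "stop" policy when the chat no longer fits."""


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (assumption): one token per word.
    return len(text.split())


@dataclass
class ChatSession:
    max_tokens: int                 # size of the model's context window
    policy: str = "truncate"        # "truncate" (forget) or "stop" (refuse)
    messages: list = field(default_factory=list)

    def add(self, message: str) -> None:
        candidate = self.messages + [message]
        if sum(count_tokens(m) for m in candidate) <= self.max_tokens:
            self.messages = candidate
            return
        if self.policy == "stop":
            # The app ends the conversation; earlier messages stay intact.
            raise ContextFullError("conversation exceeds the context window")
        # The app keeps going by dropping the oldest messages until the
        # remainder fits (the newest message is always kept).
        while len(candidate) > 1 and (
            sum(count_tokens(m) for m in candidate) > self.max_tokens
        ):
            candidate.pop(0)
        self.messages = candidate
```

In this sketch the model and the window size are identical in both cases; only the app's overflow policy differs, and that alone decides whether the user sees a forgotten detail or an ended conversation.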

The vast majority of users will probably never interact with a model via API, and most users reach every model they use through the same one or two chatbots. So the chatbot's features and failures can sometimes have a greater impact on user experience than the models themselves.

I'm curious what thoughts you have about how the chatbots interact with the models. How does that chat app between the user and the models impact how well each model's training, testing, and benchmarking transfer from the lab to the real world?
