Why your AI fails in Production [Guest]

Devansh

Jul 23

Beyond Benchmarks: Why your perfect lab model fails when real users, messy data, and infrastructure constraints show up

7 Comments

Joel Salinas

Jul 23

Excellent breakdown!

Devansh

Jul 23

Excited for yours next week as well

Joel Salinas

Jul 23

I appreciate it, I am too!

Louis-François Bouchard

Jul 23

I'm really happy we made this post happen! Love the final results and hope everyone will enjoy the read!

Devansh

Jul 23

Thank you for putting up with my process haha

Mia Kiraki

Jul 24

This is the most important conversation in AI right now!! The gap between a Jupyter Notebook and a production endpoint is where 99% of AI value dies.

John Michael Thomas

Jul 23

Thanks for the breakdown!

One thing I've been highlighting in my own work recently is the difference between the model and the chatbot that runs it. A chatbot is an app, and the way each chatbot is written dramatically impacts how users experience the model.

For example, no matter how amazing the model is, and how bulletproof the infra is, when the context window is full, ChatGPT will still forget things and Claude will still stop talking to you.

The vast majority of users will probably never interact with the model via API, and most users interact with all different models through the same 1 or 2 chatbots. So, the chatbot's features and failures can sometimes have a greater impact on user experience than the models themselves.

I'm curious what thoughts you have about how the chatbots interact with the models. How does that chat app between the user and the models impact how well each model's training, testing, and benchmarking transfer from the lab to the real world?

Artificial Intelligence Made Simple