One thing I've been highlighting in my own work recently is the difference between the model and the chatbot that runs it. A chatbot is an app, and the way each chatbot is written dramatically impacts how users experience the model.
For example, no matter how amazing the model is, and how bulletproof the infra is, when the context window is full, ChatGPT will still forget things and Claude will still stop talking to you.
The vast majority of users will probably never interact with a model via the API, and most users reach many different models through the same one or two chatbots. So the chatbot's features and failures can sometimes have a greater impact on user experience than the models themselves.
I'm curious what thoughts you have about how the chatbots interact with the models. How does that chat app between the user and the models impact how well each model's training, testing, and benchmarking transfer from the lab to the real world?
Excellent breakdown!
Excited for yours next week as well
I appreciate it, I am too!
I'm really happy we made this post happen! Love the final results and hope everyone will enjoy the read!
Thank you for putting up with my process haha
This is the most important conversation in AI right now!! The gap between a Jupyter Notebook and a production endpoint is where 99% of AI value dies.
Thanks for the breakdown!