Regrettably, this post is not just a brilliant and empirically justified analysis of the technical issues associated with (perceived) diminishing returns on 'scaling', narrowly defined, but also a contextually insightful/inciteful argument reminding people how much, and why, economics, culture, and institutional inertia matter for understanding and explaining 'innovation trajectories'. I.e., Intel had plenty of opportunities to out-Nvidia Nvidia. The "the scaling's falling!!! the scaling's falling!!!" Chicken Little LLM flock would do well to read and re-read Devansh's posts before broiling themselves.
Thanks for such a nice comment. Thank you.
Nice post, Devansh. I'm currently writing about this exact topic. Do you think there is any merit to the claim that AI scaling is moving from training to inference? Reference: https://www.exponentialview.co/p/ev-500
What about inference specifically?
I was thinking in the context that employees at major AI labs told The Information and Bloomberg that the performance of upcoming AI models such as Orion is not living up to expectations. Many people claim this is because scaling has reached its limits. But maybe this is not a failure, just a sign that scaling is moving from "training" to "inference," as Azeem Azhar writes:
"AI’s most powerful players are shifting away from the brute-force scaling that defined the last decade. Instead, they’re pursuing a different approach that could continue to deliver (..) Instead of funnelling all computational resources into training larger models, companies are now dedicating more compute power to the inference phase."
Such a good article
I'm framing this on my wall
So many excellent points in here that I wish were shared with (and understood by) top executives buying the sales pitches of some of these companies. The same companies and executives who have under-invested in data infrastructure and governance, who are now just trying to slap AI on top of everything.
Sharing these articles is a great way to ensure people find these articles!
It would be odd if there were two methods A and B for improving some process, and its owner invested in both A and B to a great degree, and then someone wrote a piece about how the owner had a bias towards A since they did not invest in B alone. That would be an odd piece, yet that is my take on this article.
Sure, we know scaling improves performance and is expensive, so they did it. But we also know research on new algorithms can deliver great improvements too, though its outcome is more random. Now, all of these places still have thousands of researchers doing all kinds of research (this is B in my example above). So I think they are doing both kinds of advancement. Why do you think otherwise?
Perhaps they have not yet had luck with alternatives. Though Altman says he thinks ASI is quite plausible by the end of next year. My guess is OpenAI has made advancements on type-II reasoning, but does not want to let any cat out of the bag, in order to gain precious months on their competitors.
It's just that these guys have LOTS and LOTS of money, so why not spend it on doing better on A as well?
I'm not entirely sure what this comment is trying to say. Can you explain?
I'm guessing you're saying that I'm arguing that companies do only A (scaling) and not B. Given how many articles I have written about approaches other than scaling, that would just be untrue. The point of this article is to talk about why scaling has been such a dominant force (which it objectively has been, if you look at the LLM discussions). That doesn't mean I'm saying there's no other work being done (again, I have covered lots of research to that effect).
I think my comment was snarkier than I meant it to be. But your byline:
" look into why Scaling became the default method for performance improvement "
I feel the default method is to do research to investigate other paths.
Maybe you mean to say: dollars spent on improvement through scaling now outweigh dollars spent on research.
It does seem that could be true (though they are spending a lot on researcher salaries too).
My sense is that these guys just have too much money right now and are striving in a winner-take-much game, but I see little evidence that they are slowing their roll on research.
Anyway, their default just does not seem to be scaling to me. But dollar for dollar, it probably is their largest expense right now...
"I feel the default method is to do research to investigate other paths."- maybe for building solutions, but that doesn't seem to be the case for LLM Development. The Llama paper said their main focus was adding more high-quality data.
o1 also seemed to be focused on scaling. They did nothing to address many of the core issues with LLMs or make changes to the system. This can be seen in its medical diagnosis capabilities, which, despite the hype, lack some very basic components required to do diagnosis (attribute ranking, performance, stability, and the ability to consider Bayesian priors). If they had approached development with a focus on addressing these core issues, you would expect those components to be there (a toy illustration of the Bayesian-prior point follows below the link).
https://artificialintelligencemadesimple.substack.com/p/a-follow-up-on-o-1s-medical-capabilities
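To make the Bayesian-prior point concrete: in diagnosis, the prior is just the base rate of a condition, and Bayes' rule tells you how much a positive finding should move you off that base rate. A toy sketch with made-up numbers (not taken from the linked post):

```python
def posterior(prior: float, sensitivity: float, specificity: float) -> float:
    """P(disease | positive test) via Bayes' rule."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1.0 - specificity
    p_pos = prior * p_pos_given_disease + (1.0 - prior) * p_pos_given_healthy
    return prior * p_pos_given_disease / p_pos

# Made-up numbers: a 1% base-rate condition, a test with 95% sensitivity / 90% specificity.
print(posterior(prior=0.01, sensitivity=0.95, specificity=0.90))  # ~0.088
```

Even with a fairly good test, a 1% prior keeps the posterior under 10%, which is exactly the kind of adjustment the comment argues these models don't reliably make.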
Got it.
Well, I am going off a single comment from Sam Altman in his interview with Y Combinator, where he says ASI is plausible in 2025. He is careful to say that scaling helps performance and so he does it, but he has never said (that I have heard) that ASI would come purely from scaling (and I don't think it will).
Thus I am betting that OpenAI has a type-II reasoning system that is built upon LLMs but has
(1) an explicit ability to learn and incorporate new info while working on an extended problem, and
(2) an evolving representation and goal set that it maintains for extended problems (rough sketch of what I mean below).
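Rough sketch of what I mean by (1) and (2), purely hypothetical and not anything OpenAI has described: a toy loop that keeps a working set of facts it updates mid-problem and a goal list that evolves instead of staying fixed. `propose_step` is an assumed placeholder for whatever actually proposes the next move (e.g., an LLM call).

```python
from dataclasses import dataclass, field

@dataclass
class WorkingState:
    facts: list = field(default_factory=list)   # (1) new info learned while working
    goals: list = field(default_factory=list)   # (2) evolving goal set for the extended problem

def solve(problem: str, propose_step, max_steps: int = 20) -> WorkingState:
    """Toy extended-problem loop. propose_step is an assumed callable that, given the
    current state, returns (new_facts, revised_goals) -- e.g. an LLM call in practice."""
    state = WorkingState(goals=[problem])
    for _ in range(max_steps):
        if not state.goals:                  # stop once every goal has been resolved
            break
        new_facts, revised_goals = propose_step(state)
        state.facts.extend(new_facts)        # learn and incorporate new info mid-problem
        state.goals = revised_goals          # revise goals instead of replaying a fixed chain
    return state
```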
I don't think you get a credible ASI without these things, and it **SEEMS** Sam knows this (even as he extols the benefits of scaling)... but maybe I am giving him too much credit?
In any case, this is why I believe they are feverishly working on next-gen algorithms even as they show us incremental progress on existing algorithms for now. (I don't think they will let the cat out of the bag on new algorithms until they are a fait accompli for a new high-revenue product. Why would they? It just needlessly gives the other guys a sneak peek.)
But my thinking really does rest on the assumption that Sam is not an idiot and that he has actual evidence from his team behind his claim of ASI in 2025.
P.S. I do expect they are NOT doing the "obvious" stats modeling that would improve the medical results. It seems they are not focused on domain-specific advancements, but rather on the holy-grail advancement of type-II reasoning.
Type-2 reasoning would involve the ability to factor in various steps and "specialized thought processes". From what I've seen, what they've done so far is closer to "think step by step", which is not the same as principled Sys-2 thinking (which, done right, is more disciplined and thus more stable). The lack of stability is a big killer (a crude way to measure it is sketched below the link).
For a good example of Sys-2 thinking (or at least something closer), I think AlphaGeometry from DeepMind did it well: https://artificialintelligencemadesimple.substack.com/p/deepminds-alphageometry-is-the-mona
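As for measuring stability, one crude but honest check is to run the same prompt several times and see how often the runs agree. A purely illustrative sketch; the `runs` list is made up, not real model output:

```python
from collections import Counter

def stability(answers: list[str]) -> float:
    """Fraction of runs agreeing with the most common answer; 1.0 = fully stable."""
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][1] / len(answers)

# Made-up outputs standing in for repeated runs of the same diagnostic prompt.
runs = ["aortic stenosis", "aortic stenosis", "mitral regurgitation", "aortic stenosis"]
print(stability(runs))  # 0.75
```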
Maybe you can use your OpenAI contacts to verify or refute this idea that they have made significant progress on new type-II reasoning capacity that is not part of o1?
Feel free to correct me if I'm wrong.