Logan Zollener made a claim, referencing a Twitter post, that LLMs are mostly simple statistics/prediction machines:
And there are other claims that AI labs’ progress on AI is slowing down because they can’t just scale up data anymore.
I’m not sure whether this is true, but I’ll give a lot of Bayes points to @Alexander Gietelink Oldenziel for predicting a slowdown in purely scaling LLMs on data alone, and I now agree slightly more with the view that scaling alone is probably not enough.
https://www.lesswrong.com/posts/9ffiQHYgm7SKpSeuq/could-we-use-current-ai-methods-to-understand-dolphins#BxNbyvsLpLrAvXwjT
https://x.com/andrewb10687674/status/1853716840880509402
https://x.com/bingyikang/status/1853635009611219019
https://x.com/Yampeleg/status/1855371824550285331
https://x.com/amir/status/1855367075491107039
As a small piece of feedback, I found it a bit frustrating that so much of your comment was links posted without clarification of what each one was, some of which were just quote-tweets of the others.

Logan Zollener made a claim, referencing a Twitter post, that LLMs are mostly simple statistics/prediction machines:
Or rather, that a particular experiment training simple video models on synthetic data showed that they generalized in ways different from what the researchers viewed as correct (eg, given data that left the question underdetermined, they generalized to treating an object as more likely to change shape than to change velocity).
I certainly agree that there are significant limitations to models’ ability to generalize out of distribution. But I think we need to be cautious about what we take that to mean about the practical limitations of frontier LLMs.
For simple models trained on simple, regular data, it’s easy to specify what is and isn’t in-distribution. For very complex models trained on a substantial fraction of human knowledge, it seems much less clear to me (I have yet to see a good approach to this problem; if anyone’s aware of good research in this area I would love to know about it).
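(As a toy illustration of the “simple models, simple data” case, and purely hypothetical rather than anything from the linked work: for something like one-dimensional Gaussian training data, “in-distribution” can be written down as an explicit, checkable test. I don’t know of anything comparably crisp for a frontier LLM’s training distribution.)

```python
import numpy as np

# Toy sketch (hypothetical example): for simple, regular training data,
# "in-distribution" can be an explicit test -- here, a distance-from-the-mean
# threshold under a fitted 1-D Gaussian. No comparably crisp definition is
# available for a corpus spanning a large fraction of human knowledge.
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=1.0, size=10_000)
mu, sigma = train.mean(), train.std()

def in_distribution(x, k=3.0):
    """Treat x as in-distribution if it lies within k standard deviations
    of the training mean."""
    return abs(x - mu) <= k * sigma

print(in_distribution(5.4))   # True: typical of the training data
print(in_distribution(42.0))  # False: clearly out-of-distribution
```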
There are many cases where humans are similarly bad at OOD generalization. The example above seems roughly isomorphic to the following problem: I give you half of the rules of an arbitrary game, and then ask you about the rest of the rules (eg ‘What happens if piece x and piece y come into contact?’). You can make some guesses based on the half of the rules you’ve seen, but there are likely to be plenty of cases where the correct generalization isn’t clear. The case in the experiment seems less arbitrary to us, because we have extensive intuition built up about how the physical world operates (eg that objects fairly often change velocity but don’t often change shape), but the model hasn’t been shown info that would provide that intuition[1]; why, then, should we expect it to generalize in a way that matches the physical world?
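(A concrete, made-up version of “the data doesn’t pin down the right generalization”: two models can fit the same training points about equally well and still disagree sharply once you ask about inputs far outside the training range.)

```python
import numpy as np

# Toy sketch (hypothetical example): two fits that agree on the training
# range but diverge out of distribution, because the training data alone
# doesn't determine which generalization is "correct".
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20)
y_train = 2.0 * x_train + rng.normal(scale=0.01, size=x_train.size)

linear = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)
wiggly = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)

print(linear(0.5), wiggly(0.5))  # in-range: nearly identical predictions
print(linear(5.0), wiggly(5.0))  # far out of range: they can differ wildly
```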
We have a history of LLMs generalizing correctly in surprising ways that weren’t predicted in advance (looking back at the GPT-3 paper is a useful reminder of how unexpected some of the emerging capabilities were at the time).
[1] Note that I haven’t read the paper; I’m inferring this from the video summary they posted.
I admit that I was a bit of a link poster here, and those are fair points on the generalization ability of LLMs.
I am flattered to receive these Bayes points =) ; I would be crying tears of joy if there were a genuine slowdown, but I generally think there are still huge gains to be made with scaling. Sometimes when people hear my criticism of scaling maximalism, they pattern-match it to me saying scaling won’t be as big as they think it is. On the contrary, I am saying that scaling further will be as big as you think it will be, and that additionally there is an enormous advance yet to come.
How much evidence do we have of a genuine slowdown? Strawberry was about as big an advance as GPT-3 to GPT-4 in my book. How credible are these Twitter rumors?
Yeah, I don’t trust the Twitter rumors very much, and at any rate, we shall see soon, in 2025-2026, what exactly is going on with AI progress if and when they release GPT-5/Orion.