“Scaling breaks down”, they say. By which they mean one of the following wildly different claims with wildly different implications:
1. When you train on a normal dataset with more compute/data/parameters, subtract the irreducible entropy from the loss, and then plot it on a log-log plot: you no longer see a straight line (a minimal sketch of this check follows the list).
2. Same setting as before, and you do see a straight line; it’s just that downstream performance doesn’t improve.
3. Same setting as before, and downstream performance does improve, but so slowly that the economics no longer favor scaling this type of setup further instead of doing something else.
4. A combination of one of the last three items, plus “by the way, we used synthetic data and/or other higher-quality data, and it still didn’t help”.
5. Nothing in the realm of “pretrained models”, “reasoning models like o1”, or “agentic models like Claude with computer use” profits from a scale-up in any reasonable sense.
6. Nothing which can be scaled up in the next 2–3 years, when training clusters are mostly locked in, will demonstrate a big enough success to motivate the next generation of clusters costing around $100 billion.
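For claim 1, here is a minimal sketch of the check being described, assuming a Chinchilla-style loss form L(C) = E + A·C^(−α). The compute values, losses, and the irreducible-entropy estimate E below are made-up illustrative numbers, not data from any lab.

```python
import numpy as np
import matplotlib.pyplot as plt

compute = np.array([1e19, 1e20, 1e21, 1e22, 1e23])   # training FLOPs for each run (made up)
loss = np.array([2.99, 2.79, 2.62, 2.48, 2.36])      # final eval loss in nats/token (made up)
E = 1.7                                               # assumed irreducible entropy (made up)

# If L(C) = E + A * C**(-alpha), then log(L - E) = log(A) - alpha * log(C),
# i.e. a straight line in log-log coordinates with slope -alpha.
x, y = np.log(compute), np.log(loss - E)
slope, intercept = np.polyfit(x, y, 1)
print(f"fitted alpha ≈ {-slope:.3f}")

residuals = y - (slope * x + intercept)
print("max deviation from the fitted line:", np.abs(residuals).max())
# "Scaling breaks down" in sense 1 is the case where these residuals stop being
# small, i.e. the log-log plot visibly bends instead of staying straight.

plt.loglog(compute, loss - E, "o-")
plt.xlabel("training compute C (FLOPs)")
plt.ylabel("loss minus irreducible entropy, L - E")
plt.show()
```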
Be precise. See also.
This is a just ask.
Also, even though it’s not locally rhetorically convenient [what is locally rhetorically convenient for us is making an isolated demand for rigor only of people whose claims, like “scaling has hit a wall (therefore AI risk is far)”, are inconvenient for AInotkilleveryoneism], we should demand the same specificity of people who claim that “scaling works”, so that we end up with a correct world-model and so that people who just want to build AGI see that we are fair.
On the question of how much evidence these scenarios are against the AI scaling thesis (which I roughly take to mean that more compute (FLOPs) and data reliably make AI better at economically relevant jobs), I’d say that scenarios 4–6 would falsify the hypothesis, while among 1–3, 3 is the strongest evidence against it, followed by 2 and then 1.
Scenario 4 would make me more willing to buy algorithmic progress as important, 5 would make me more bearish on algorithmic progress, and 6 would give me way longer timelines than I have now, unless governments fund a massive AI effort.
It’s not that “they” should be more precise, but that “we” would like to have more precise information.
We now know pretty conclusively from The Information and Bloomberg that for OpenAI, Google, and Anthropic, new frontier base LLMs have yielded disappointing performance gains. The question is which of your possibilities caused this.
They do mention that the availability of high-quality training data (text) is an issue, which suggests it’s probably not your first point.
I think the evidence mostly points towards 3 and 4.
But if 3 is due to 1, it would have bigger implications for 6, and probably also for 5.
And there must be a whole bunch of people out there who know whether the curves bend.