That’s not exactly my claim. If he said more to the reporters than his words quoted in the article[1], then it might’ve been justified to interpret him as saying that pretraining is plateauing. The article isn’t clear on whether he said more. If he said nothing more, then the interpretation about plateauing doesn’t follow, but could in principle still be correct.
Another point is that Sutskever left OpenAI before they trained the first 100K H100s model, and in any case one datapoint of a single training run isn’t much evidence. The experiment that could convincingly demonstrate plateauing hasn’t been performed yet. Give it at least a few months, for multiple labs to try and fail.
“The 2010s were the age of scaling, now we’re back in the age of wonder and discovery once again. Everyone is looking for the next thing,” Sutskever said. “Scaling the right thing matters more now than ever.”
I think this is a misunderstanding of the piece and how journalists typically paraphrase things. The reporters wrote that Ilya told them that results from scaling up pre-training have plateaued. So he probably said something to that effect, but for readability and word-count reasons, they paraphrased it.
If a reported story from a credible outlet says something like X told us that Y, then the reporters are sourcing claim Y to X, whether or not they include a direct quote.
The plateau claim also jibes with The Information story about OpenAI, as well as a few other similar claims made by people in industry.
Ilya probably spoke to the reporter(s) for at least a few minutes, so the quotes you see are a tiny fraction of everything he said.
I just want to provide one important piece of information:
It turns out that Ilya Sutskever was misinterpreted as claiming that models are plateauing, when he was instead saying that other directions work out better:
https://www.lesswrong.com/posts/wr2SxQuRvcXeDBbNZ/?commentId=JFNZ5MGZnzKRtFFMu
I definitely agree that people are overupdating on this training run, and we will need to wait.
(I also made the mistake of overupdating.)
Fair enough, I’ll retract my comment.