There wasn’t much content to this article, except for the below quotes.
Sam Altman says
further progress will not come from making models bigger. “I think we’re at the end of the era where it’s going to be these, like, giant, giant models,” he told an audience at an event held at MIT late last week. “We’ll make them better in other ways.” [...] Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them. [...] At MIT last week, Altman confirmed that his company is not currently developing GPT-5. “An earlier version of the letter claimed OpenAI is training GPT-5 right now,” he said. “We are not, and won’t for some time.”
Nick Frosst, a cofounder at Cohere, says
“There are lots of ways of making transformers way, way better and more useful, and lots of them don’t involve adding parameters to the model,” he says. Frosst says that new AI model designs, or architectures, and further tuning based on human feedback are promising directions that many researchers are already exploring.
In the Lex Fridman interview with Sam Altman, Sam said that they had to do “hundreds of complicated things”. Does this, together with the above quote, suggest Sam thinks transformers are running out of oomph? Is he, perhaps, pausing progress whilst we await the next breakthrough in deep learning?
Edit: Added in a relevant question. Give your probability for the first option.
Or vote at this tweet, if you like:
https://twitter.com/Algon_33/status/1648047440065449993
As far as I can tell, Sam is saying no to size. That does not mean saying no to compute, data, or scaling.
The “hundreds of complicated things” comment definitely can’t be interpreted as being against transformers, since “simply” scaling transformers fits that description perfectly. “Simply” scaling transformers involves things like writing a new compiler. It is simple in strategy, not in execution.
This seems to insinuate a cooldown in scaling compute, and Sam previously acknowledged that the data bottleneck was a real roadblock.
The poll appears to be asking two opposite questions. I’m not clear on whether 99% means it will be a transformer or that something else is needed to get there.
Yeah, I messed up the question. But that’s why I said “Give your probability for the first option” in the post.
Ah, I didn’t understand what “first option” meant either.
Maybe Sam knows a lot I don’t know, but here are some reasons why I’m skeptical about the end of scaling large language models:
From scaling laws, we know that more compute and data reliably lead to better performance, so scaling seems like a low-risk investment.
I’m not sure how much GPT-4 cost, but GPT-3 only cost $5–10 million, which isn’t much for a large tech company (e.g. Meta spends billions on the metaverse every year).
There are limits to how big and expensive supercomputers can be, but I doubt we’re near them. I’ve heard that GPT-4 was trained on ~10,000 GPUs, which is a lot but not an insane amount (~$300M worth of GPUs; see the back-of-the-envelope sketch after this list). At 100 GPUs/m², all 10,000 GPUs could fit in a 10 m × 10 m room. A model trained with millions of GPUs is not inconceivable and is probably technically and economically feasible today.
Because scaling laws are power laws (the x-axis, compute, is logarithmic while the y-axis, performance, is linear), there are diminishing returns to resources like more compute, but I doubt we’ve reached the point where the marginal cost of training larger models exceeds the marginal benefit (see the toy power-law sketch below). Think of a company like Google: building the biggest and best model is immensely valuable in a global, winner-takes-all market like search.
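A quick back-of-the-envelope check of the GPU figures above; the per-GPU price and the packing density are assumptions I picked to reproduce the quoted numbers, not confirmed specs:

```python
# Back-of-the-envelope check of the GPU figures in the list above.
# The GPU count is the rumoured ~10,000 for GPT-4; the per-GPU price
# and packing density are illustrative assumptions, not confirmed specs.
gpu_count = 10_000          # rumoured GPT-4 training cluster size
cost_per_gpu_usd = 30_000   # assumed price of one data-centre GPU
gpus_per_m2 = 100           # assumed packing density

total_cost_usd = gpu_count * cost_per_gpu_usd
floor_area_m2 = gpu_count / gpus_per_m2
side_m = floor_area_m2 ** 0.5

print(f"Cluster cost: ~${total_cost_usd / 1e6:.0f}M")        # ~$300M
print(f"Floor area:   {floor_area_m2:.0f} m^2 "
      f"(a {side_m:.0f} m x {side_m:.0f} m room)")            # 10 m x 10 m
```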
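And a minimal sketch of what the power-law point means in practice: a toy loss curve of the form L(C) = E + a·C^(−b), with made-up constants (not fitted to any real model), shows how each 10× of compute buys a smaller and smaller absolute improvement, yet the curve keeps improving.

```python
# Toy illustration of power-law diminishing returns.
# Hypothetical loss curve: L(C) = E + a * C**(-b); constants are made up.
E, a, b = 1.7, 10.0, 0.05

def loss(compute_flops: float) -> float:
    return E + a * compute_flops ** (-b)

# Each 10x of compute shrinks the reducible loss by a constant factor
# (10**-b ~ 0.89 here), so absolute gains keep getting smaller.
for exp in range(20, 27):                  # 1e20 ... 1e26 FLOPs
    c = 10.0 ** exp
    print(f"{c:.0e} FLOPs -> loss {loss(c):.3f}")
```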
I’m reaching the same conclusions.
And this is in a world where Google has already announced that they’re going to build an even bigger model of their own.
We have to upgrade our cluster with a fresh batch of Nvidia gadgets.
I was chatting with a friend of mine who works in the AI space. He said that the big thing that got them to GPT-4 was the data set, which was basically the entire internet. But now that they’ve given it the entire internet, there’s no easy way for them to go further along that axis, and the next big increase in capabilities would require a significantly different direction from “more text / more parameters / more compute”.
I’d have to disagree with this assessment. Ilya Sutskever recently said that they’ve not run out of data yet. They might someday, but not yet. And Epoch projects that high-quality text data will run out in 2024, with all text data running out in 2040.
Maybe efficiency improvements will rule temporarily, but surely once the low- and medium-hanging fruit is exhausted, parameter count will once again be ramped up. I would bet just about anything on that.
And of course, if we believe efficiency is the way to go for the next few years, that should scare the shit out of us: it means that even putting all GPU manufacturers out of commission might not be enough to save us, should it become obvious that a slowdown is needed.
Shutting down GPU production was never in the Overton window anyway, so this makes little difference. Even if further scaling isn’t needed, most people can’t afford the ~$100M spent on GPT-4.