There wasn’t much content to this article, except for the below quotes.
Sam Altman says
further progress will not come from making models bigger. “I think we’re at the end of the era where it’s going to be these, like, giant, giant models,” he told an audience at an event held at MIT late last week. “We’ll make them better in other ways.” [...] Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them. [...] At MIT last week, Altman confirmed that his company is not currently developing GPT-5. “An earlier version of the letter claimed OpenAI is training GPT-5 right now,” he said. “We are not, and won’t for some time.”
Nick Frosst, a cofounder at Cohere, says
“There are lots of ways of making transformers way, way better and more useful, and lots of them don’t involve adding parameters to the model,” he says. Frosst says that new AI model designs, or architectures, and further tuning based on human feedback are promising directions that many researchers are already exploring.
In the Lex Fridman interview with Sam Altman, Sam said that they had to do “hundreds of complicated things”. Does this, together with the above quote, suggest Sam thinks transformers are running out of oomph? Is he, perhaps, pausing progress whilst we await the next breakthrough in deep learning?
Edit: Added in a relevant question. Give your probability for the first option.
Or vote at this tweet, if you like:
https://twitter.com/Algon_33/status/1648047440065449993
As far as I can tell, Sam is saying no to size. That does not mean saying no to compute, data, or scaling.
The “hundreds of complicated things” comment definitely can’t be interpreted as being against transformers, since “simply” scaling transformers fits that description perfectly. “Simply” scaling transformers involves things like writing a new compiler. It is simple in strategy, not in execution.
This seems to insinuate a cooldown in scaling compute, and Sam previously acknowledged that the data bottleneck was a real roadblock.
The poll appears to be asking two opposite questions. I’m not clear on whether 99% means it will be a transformer or that something else is needed to get there.
Yeah, I messed up the question. But that’s why I said “Give your probability for the first option” in the post.
Ah, I didn’t understand what “first option” meant either.
Maybe Sam knows a lot I don’t know, but here are some reasons why I’m skeptical about the end of scaling large language models:
From scaling laws, we know that more compute and data reliably lead to better performance, so scaling seems like a low-risk investment.
I’m not sure how much GPT-4 cost, but GPT-3 only cost $5–10 million, which isn’t much for a large tech company (e.g. Meta spends billions on the metaverse every year).
There are limits to how big and expensive supercomputers can be, but I doubt we’re near them. I’ve heard that GPT-4 was trained on ~10,000 GPUs, which is a lot but not an insane amount (~$300M worth of GPUs; see the back-of-the-envelope sketch after this list). At 100 GPUs/m², all 10,000 GPUs could fit in a 10 m × 10 m room. A model trained with millions of GPUs is not inconceivable and is probably technically and economically feasible today.
Because scaling laws are power laws (the x-axis, compute, is logarithmic while the y-axis, performance, is linear), there are diminishing returns to resources like more compute, but I doubt we’ve reached the point where the marginal cost of training larger models exceeds the marginal benefit (see the toy power-law sketch below). Think of a company like Google: building the biggest and best model is immensely valuable in a global, winner-takes-all market like search.
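A quick back-of-the-envelope check of the GPU figures above; the per-GPU price and the packing density are assumptions I picked to reproduce the quoted numbers, not confirmed specs:

```python
# Back-of-the-envelope check of the GPU figures in the list above.
# The GPU count is the rumoured ~10,000 for GPT-4; the per-GPU price
# and packing density are illustrative assumptions, not confirmed specs.
gpu_count = 10_000          # rumoured GPT-4 training cluster size
cost_per_gpu_usd = 30_000   # assumed price of one data-centre GPU
gpus_per_m2 = 100           # assumed packing density

total_cost_usd = gpu_count * cost_per_gpu_usd
floor_area_m2 = gpu_count / gpus_per_m2
side_m = floor_area_m2 ** 0.5

print(f"Cluster cost: ~${total_cost_usd / 1e6:.0f}M")        # ~$300M
print(f"Floor area:   {floor_area_m2:.0f} m^2 "
      f"(a {side_m:.0f} m x {side_m:.0f} m room)")            # 10 m x 10 m
```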
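And a minimal sketch of what the power-law point means in practice: a toy loss curve of the form L(C) = E + a·C^(−b), with made-up constants (not fitted to any real model), shows how each 10× of compute buys a smaller and smaller absolute improvement, yet the curve keeps improving.

```python
# Toy illustration of power-law diminishing returns.
# Hypothetical loss curve: L(C) = E + a * C**(-b); constants are made up.
E, a, b = 1.7, 10.0, 0.05

def loss(compute_flops: float) -> float:
    return E + a * compute_flops ** (-b)

# Each 10x of compute shrinks the reducible loss by a constant factor
# (10**-b ~ 0.89 here), so absolute gains keep getting smaller.
for exp in range(20, 27):                  # 1e20 ... 1e26 FLOPs
    c = 10.0 ** exp
    print(f"{c:.0e} FLOPs -> loss {loss(c):.3f}")
```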
I’m reaching the same conclusions.
And this is in a world where Google has already announced that they’re going to build an even bigger model of their own.
We have to upgrade our cluster with a fresh batch of Nvidia gadgets.
I was chatting with a friend of mine who works in the AI space. He said that the big thing that got them to GPT-4 was the data set, which was basically the entire internet. But now that they’ve given it the entire internet, there’s no easy way for them to go further along that axis, and the next big increase in capabilities would require a significantly different direction from “more text / more parameters / more compute”.
I’d have to disagree with this assessment. Ilya Sutskever recently said that they’ve not run out of data yet. They might someday, but not yet. And Epoch projects that high-quality text data will run out in 2024, with all text data running out in 2040.
Maybe efficiency improvements will rule temporarily, but surely once the low- and medium-hanging fruit is exhausted, parameter count will once again be ramped up. I would bet just about anything on that.
And of course, if we believe efficiency is the way to go for the next few years, that should scare the shit out of us: it means that even putting all GPU manufacturers out of commission might not be enough to save us, should it become obvious that a slowdown is needed.
Shutting down GPU production was never in the Overton window anyway, so this makes little difference. Even if further scaling isn’t needed, most people can’t afford the ~$100M spent on GPT-4.