I’m also quite sympathetic to the idea that another AI winter is plausible, mostly based on compute and data limits. One trivial but frequently overlooked data point is that GPT-4 was released nearly three years after GPT-3. In contrast, GPT-3 was released around a year after GPT-2, which in turn was released less than a year after GPT-1. Despite hype around AI being larger than ever, there has already been a slowdown in progress relative to 2017-2020.
That said, a big unknown is to what extent specialized hardware dedicated to AI can outperform Moore’s Law. Jensen Huang sure thinks it can:
So obviously, computing has advanced tremendously and the way that’s happened, of course, is a complete reinvention of how computers write software, the computer architecture of it, and the computer runs software. Every single layer from the chip to the system to the interconnect to the algorithms, all completely redesigned and so this way of doing full-stack computing as you projected out ten years, there’s no question in my mind, large language models and these very large language models will have an opportunity to improve by another factor of a million. It just it has to be full stack.
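For a rough sense of scale, and reading Moore’s Law as a doubling roughly every two years, here is a quick back-of-the-envelope (my numbers and assumptions, not Huang’s) of what a million-fold improvement over ten years would imply:

```python
import math

# Moore's Law, read as a doubling roughly every two years, over a decade:
moores_law_factor = 2 ** (10 / 2)              # ~32x

# A "factor of a million" over the same ten years implies a doubling time
# of about 10 / log2(1e6) years:
implied_doubling_years = 10 / math.log2(1e6)   # ~0.5 years

print(f"Moore's Law over 10 years: ~{moores_law_factor:.0f}x")
print(f"Doubling time implied by 1,000,000x in 10 years: ~{implied_doubling_years:.2f} years")
```

Taken at face value, that amounts to doubling effective performance roughly every six months, sustained for a decade, coming from the whole stack rather than from the silicon alone.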
That said, the economy is absorbing AI much more slowly than it is progressing, and even if frontier progress halts tomorrow, investment may still be buoyed by the diffusion of current models. It’s hard to argue that current models aren’t powerful enough to have economic value, or that they won’t get less expensive over time, regardless of how the frontier moves.
I think the long gap between GPT-3 and GPT-4 can be explained by Chinchilla. That was the point where OpenAI realized their models were undertrained for their size, and switched focus from scaling to fine-tuning for a couple of years. InstructGPT, Codex, text-davinci-003, and GPT-3.5 were all released in this period.
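For a rough sense of what “undertrained” means here, a back-of-the-envelope using the commonly cited ~20-tokens-per-parameter rule of thumb from the Chinchilla paper and the figures published in the GPT-3 paper (the true optimum depends on the compute budget, so treat this purely as an illustration):

```python
# GPT-3 figures from its paper: ~175B parameters trained on ~300B tokens.
gpt3_params = 175e9
gpt3_tokens = 300e9

# Chinchilla's rule of thumb: roughly 20 training tokens per parameter.
chinchilla_tokens = 20 * gpt3_params

print(f"GPT-3 tokens per parameter: ~{gpt3_tokens / gpt3_params:.1f}")
print(f"Chinchilla-style token budget for 175B params: ~{chinchilla_tokens / 1e12:.1f} trillion")
```

By that yardstick, GPT-3 saw well under a tenth of the tokens a compute-optimal model of its size would call for.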
You’re likely correct, but I’m not sure that’s relevant. For one, Chinchilla wasn’t announced until 2022, nearly two years after the release of GPT-3, so the earlier yearly cadence had already been broken before any Chinchilla-motivated pause. The slowdown is therefore still apparent even if we assume OpenAI was nearly done training an undertrained GPT-4 at that point (for which I have seen no evidence).
Moreover, the focus on efficiency is itself evidence of an approaching wall. To take an example from the 20th century, machines got much more energy efficient after the 1970s, which is also when energy stopped getting cheaper. Why didn’t OpenAI pivot to fine-tuning and efficiency after the release of GPT-2? Because GPT-2 was cheap to train and relied on only a tiny fraction of the available data, so neither concern was pressing. Efficiency is typically a reaction to scarcity.