An important update: “Stargate” (blog) is now officially public, confirming earlier $100b numbers and some loose talk about ‘up to $500b’ being spent. Noam Brown commentary:

@OpenAI excels at placing big bets on ambitious research directions driven by strong conviction. This is on the scale of the Apollo Program and Manhattan Project when measured as a fraction of GDP. This kind of investment only happens when the science is carefully vetted and people believe it will succeed and be completely transformative. I agree it’s the right time.

Miles Brundage:

...I don’t think that’s the correct interpretation. DeepSeek shows you can get very powerful AI models with relatively little compute. But I have no doubt that with even more compute it would be an even more powerful model.

If r1 being comparable to o1 surprised you, your mistake was forgetting the 1 part. This is the early stage of a new paradigm, and SOTA is the cheapest it will ever be. That does NOT mean compute doesn’t matter. (I’ve said roughly this before, but it bears repeating)

...Don’t get me wrong, DeepSeek is nothing to sneeze at. They will almost certainly get much more compute than they have now. But so will OpenAI... And if DeepSeek keeps up via compute, that does not invalidate the original point re: compute being key.

(This is an example of why I don’t expect DeepSeek to leapfrog OA/A/G/FB/xAI/SSI/et al: DS does great work, but $500b is a lot of money, and their capital disadvantage may be, if anything, bigger when you move from a raw parameter/data-scaling regime to an inference/search scaling regime. $6-million training budgets aren’t cool. You know what’s cool? 6-million-GPU training budgets...)

EDIT: the lead author of the r1 paper, Daya Guo, reportedly tweeted (before deleting):