New Bloomberg article on data center buildouts pitched to the US government by OpenAI. Quotes:
- “the startup shared a document with government officials outlining the economic and national security benefits of building 5-gigawatt data centers in various US states, based on an analysis the company engaged with outside experts on. To put that in context, 5 gigawatts is roughly the equivalent of five nuclear reactors, or enough to power almost 3 million homes.”
- “Joe Dominguez, CEO of Constellation Energy Corp., said he has heard that Altman is talking about building 5 to 7 data centers that are each 5 gigawatts. “
- “John Ketchum, CEO of NextEra Energy Inc., said the clean-energy giant had received requests from some tech companies to find sites that can support 5 GW of demand, without naming any specific firms.”
Compare with the prediction by Leopold Aschenbrenner in Situational Awareness:
- “The trillion-dollar cluster—+4 OOMs from the GPT-4 cluster, the ~2030 training cluster on the current trend—will be a truly extraordinary effort. The 100GW of power it’ll require is equivalent to >20% of US electricity production”
From $4 billion for a 150 megawatt cluster, I get 37 gigawatts for a $1 trillion cluster, or seven 5-gigawatt datacenters (if they solve geographically distributed training). Future GPUs will consume more power per GPU (though a transition to liquid cooling seems likely), but the corresponding fraction of datacenter cost might also increase, so the cost-per-watt extrapolation should still roughly hold. This is only a training system (other datacenters will be built for inference), and there is more than one player in this game, so the 100 gigawatt figure seems reasonable for this scenario.
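A minimal sketch of that arithmetic, assuming (as the extrapolation above does) that cluster cost scales roughly linearly with power:

```python
# Scale the ~$4B / 150 MW cluster up to a $1T budget, holding cost per watt constant.
cost_per_watt = 4e9 / 150e6            # ~$27 per watt of training cluster
total_power_w = 1e12 / cost_per_watt   # power affordable with a $1T budget
print(f"{total_power_w / 1e9:.1f} GW")            # ~37 GW
print(f"{total_power_w / 5e9:.1f} x 5 GW sites")  # ~7.5 five-gigawatt datacenters
```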
Current best deployed models are about 5e25 FLOPs (possibly up to 1e26 FLOPs); very recent 100K-H100 scale systems can train models for about 5e26 FLOPs in a few months. Building datacenters at the 1 gigawatt scale already seems to be in progress, and plausibly the models trained on them will start arriving in 2026. If we assume B200s, that's enough to 15x the FLOP/s compared to 100K H100s, giving 7e27 FLOPs in a few months, which corresponds to 5 trillion active parameter models (at 50 tokens/parameter).
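A sketch of this step, using the 5e26 FLOPs baseline and the 15x and 50 tokens/parameter figures above; the dense-transformer cost approximation C ≈ 6·N·D is an assumption I am adding, not something stated in the text:

```python
# Compute for a ~1 GW B200 training system, scaled from the 100K-H100 baseline.
h100_run_flops = 5e26   # ~100K H100s over a few months (figure from above)
b200_scaleup   = 15     # assumed gain: more GPUs at 1 GW times faster chips
run_flops      = h100_run_flops * b200_scaleup   # ~7e27 FLOPs

# Model size from C ~= 6 * N * D with D = 50 * N (50 tokens/parameter):
# C = 300 * N^2  =>  N = sqrt(C / 300)
tokens_per_param = 50
n_params = (run_flops / (6 * tokens_per_param)) ** 0.5
print(f"{run_flops:.1e} FLOPs, ~{n_params / 1e12:.0f}T active params")  # ~5T
```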
The 5 gigawatt clusters seem more speculative for now, though o1-like post-training promises sufficient investment once it's demonstrated on top of 5e26+ FLOPs base models next year. That gets us to 5e28 FLOPs (assuming a 30% FLOP/joule improvement over B200s). And then 35 gigawatts gives 3e29 FLOPs, which might be 30 trillion active parameter models (at 60 tokens/parameter). This is 4 OOMs above the rumored 2e25 FLOPs of the original GPT-4.
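Continuing the same sketch for the 5 gigawatt and 35 gigawatt steps, with the 30% FLOP/joule improvement and 60 tokens/parameter figures above (again assuming C ≈ 6·N·D):

```python
import math

flops_1gw  = 7e27                  # from the 1 GW B200 step above
flops_5gw  = flops_1gw * 5 * 1.3   # 5x the power, +30% FLOP/joule -> ~5e28
flops_35gw = flops_5gw * (35 / 5)  # seven 5 GW sites -> ~3e29

tokens_per_param = 60
n_params = math.sqrt(flops_35gw / (6 * tokens_per_param))  # C = 6*N*D, D = 60*N
ooms_over_gpt4 = math.log10(flops_35gw / 2e25)             # vs rumored GPT-4 compute
print(f"{flops_5gw:.0e} FLOPs at 5 GW, {flops_35gw:.0e} FLOPs at 35 GW, "
      f"~{n_params / 1e12:.0f}T active params, {ooms_over_gpt4:.1f} OOMs over GPT-4")
```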
If each step of this process takes 18-24 months, and we have currently just cleared 150 megawatts, there are 3 more steps to get $1 trillion training systems built, which lands in 2029-2030. If o1-like post-training works very well on top of larger-scale base models and starts really automating jobs, the impossible challenge of building these giant training systems this fast will be confronted by the impossible pressure of that success.
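And the timeline arithmetic, assuming the ~150 megawatt point is late 2024:

```python
# Three remaining scale-ups (150 MW -> 1 GW -> 5 GW -> 35 GW) at 18-24 months each,
# counted from an assumed late-2024 starting point for ~150 MW training clusters.
start_year = 2024.75
steps = 3
earliest = start_year + steps * 1.5   # 18 months per step
latest   = start_year + steps * 2.0   # 24 months per step
print(f"~{int(earliest)}-{int(latest)}")  # ~2029-2030
```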