Thanks Vladimir, this is really interesting!
Re: OpenAI’s compute, I inferred from this NYT article that their $8.7B costs this year were likely to include about $6B in compute costs, which implies an average use of ~274k H100s throughout the year[1] (assuming $2.50/hr average H100 rental price). Assuming this was their annual average, I would’ve guessed they’d be on track to be using around 400k H100s by now.
So the 150k-H100 campus in Phoenix might be only a fraction of the total compute they have access to? Does this sound plausible?
The co-location of the Trainium2 cluster might give Anthropic a short-term advantage, though I think it's actually quite unclear whether their networking and topology will fully enable this advantage. Perhaps the OpenAI Phoenix campus is well-connected enough to another OpenAI campus to be doing a two-campus asynchronous training run effectively.
[1] $6e9 / (365.25 d × 24 h/d) / ($2.50 per H100-hour) ≈ 274k H100s
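For anyone who wants to check the footnote arithmetic or try a different rental price, here is a minimal Python sketch. The ~$6B compute figure and the $2.50/hr rate are the same guesses as above, not confirmed numbers, and the function name `implied_avg_gpus` is just something I made up for illustration.

```python
# Back-of-the-envelope: average number of H100s implied by an annual compute bill.
# Both inputs are assumptions from the comment above, not reported figures.

HOURS_PER_YEAR = 365.25 * 24  # 8766 hours


def implied_avg_gpus(annual_compute_usd: float, usd_per_gpu_hour: float) -> float:
    """Average GPUs in continuous use that the annual spend would pay for."""
    return annual_compute_usd / HOURS_PER_YEAR / usd_per_gpu_hour


if __name__ == "__main__":
    annual_compute_usd = 6e9  # assumed compute share of the $8.7B total costs
    # $2.50/hr is the assumed rate; the others show sensitivity to that guess.
    for price in (2.00, 2.50, 3.00):
        gpus = implied_avg_gpus(annual_compute_usd, price)
        print(f"${price:.2f}/hr -> ~{gpus / 1000:.0f}k H100s on average")
```

The estimate scales inversely with the assumed hourly price, so a lower effective rate would push the implied H100 count up and a higher one would pull it down.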