Control over many datacenters is useful for coordinating a large training run, but otherwise it doesn’t mean you have to find a use for all of that compute all the time: you could lease/sublease some of it to others (which at the level of whole datacenter buildings is probably not overly difficult technically; you don’t need to suddenly become a cloud provider yourself).
So the question is more about the global AI compute buildout not finding enough demand to pay for itself, rather than about what happens to the companies that build the datacenters or create the models, and whether these are the same companies. It’s not useful to let datacenters sit idle, even if idleness perfectly extends the hardware’s lifespan (which seems to be several years), since progress in hardware means the time of current GPUs will be worth much less in several years, plausibly 5x-10x less. And TCO over a datacenter’s lifetime is only 10-20% higher than the initial capex. So in a slowdown timeline, prices of GPU-time can drop all the way to maybe 20-30% of what they would need to be to pay back the initial capex before the datacenters start going idle. This proportionally reduces the cost of inference (and also of training).
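A quick way to see where that floor comes from (a hedged sketch: the 10-20% opex figure is the rough one from this comment, and the break-even framing is my own simplification, not something claimed above):

```python
# Rough sketch of the GPU-time pricing floor, using the rough figures from this
# comment (lifetime opex at 10-20% of capex); purely illustrative, not authoritative.

def floor_vs_breakeven(opex_share_of_capex: float) -> float:
    """How far GPU-time prices can fall, as a fraction of the break-even price,
    before idling the datacenter beats running it.

    Break-even price: revenue over the hardware's life covers capex + opex.
    Floor: capex is sunk, so running beats idling at any price above opex.
    """
    capex = 1.0
    opex = opex_share_of_capex * capex
    breakeven = capex + opex
    return opex / breakeven

for share in (0.10, 0.20):
    print(f"opex = {share:.0%} of capex -> prices can fall to "
          f"~{floor_vs_breakeven(share):.0%} of break-even before idling wins")
# Gives roughly 9% and 17%; the 20-30% figure above sits somewhat above this
# bare marginal-cost floor.
```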
Project Stargate is planning on spending $100bn at first, $50bn of which would be debt.
The Abilene site in 2026 will only cost $22-35bn, and they’ve recently raised a similar amount for it, so the $100bn figure remains about as nebulous as the $500bn figure. For inference (where exclusive use of a giant training system in a single location is not necessary) they might keep using Azure, so there is probably no pressing need to build even more for now.
Though I think an AI slowdown is unlikely until at least late 2026, and they’ll need to plan to build more in 2027-2028, raising money for it in 2026, so they’ll likely get to try to secure that $100bn even in the timeline where an AI slowdown arrives soon after.
This is evidence that fixing such issues even to a first approximation takes at least many months and can’t be done faster: o3 was already trained in some form by December[1], it’s been 4 months since, and who knows how long it’ll take to actually fix. Since o3 is not larger than o1, releasing it doesn’t depend on securing additional hardware, so plausibly the time to release was primarily determined by the difficulty of getting post-training into shape and fixing the lying (which goes systematically beyond “hallucinations” on some types of queries).
If o3 is based on GPT-4.1’s base model, and the latter used pretraining knowledge distillation from GPT-4.5-base, it’s not obviously possible to do all of that by the time of the Dec 2024 announcement. Assuming GPT-4.5 was pretrained for 3-4 months starting in May 2024, the base model was done in Aug-Sep 2024, logits for GPT-4.1’s pretraining dataset were collected by Sep-Oct 2024, and GPT-4.1 itself got pretrained by Nov-Dec 2024, leaving almost no margin for post-training.
The reasoning training would need to either be very fast or mostly SFT from traces of a reasoning variant of GPT-4.5 (which could start training in Sep 2024 and be done to some extent by Nov 2024). Both might be possible to do quickly, R1-Zero style, so maybe this is not impossible given that o3-preview only needed to pass benchmarks and not be shown directly to anyone yet.
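A minimal sketch of the timeline arithmetic above; every duration is the guess from this comment (taking the upper end of each range), not a reported fact:

```python
# Rough timeline arithmetic for the hypothesized distillation chain; all
# durations are the guesses from this comment, not reported facts.

stages = [
    ("GPT-4.5 pretraining (starting May 2024)", 4),           # 3-4 months, upper end
    ("collect GPT-4.5-base logits for the GPT-4.1 dataset", 1),
    ("GPT-4.1 pretraining", 2),
]

MONTHS = ["May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
month = 0  # 0 == May 2024
for name, duration in stages:
    month += duration
    print(f"{name}: done around {MONTHS[min(month, len(MONTHS) - 1)]} 2024")

print(f"months left before the Dec 2024 o3 announcement: {7 - month}")
# With the upper-end guesses this leaves ~0 months for post-training and
# reasoning training, hence the need for something fast (SFT on reasoning
# traces or quick R1-Zero-style RL).
```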