Vladimir_Nesov comments on mesaoptimizer’s Shortform

Vladimir_Nesov 30 Jun 2024 18:26 UTC
2 points
0
Collections of datacenter campuses sufficiently connected by appropriate fiber optic probably should count as one entity for purposes of estimating training potential, even in the current synchronous training paradigm. My impression is that laying such fiber optic is both significantly easier and significantly cheaper than building power plants or setting up power transmission over long distances in the multi-GW range.

Thus for training 3M GPUs/6GW scale models ($100 billion in infrastructure, $10 billion in cost of training time), hyperscalers “only” need to upgrade the equipment and arrange for “merely” on the order of 1GW in power consumption at multiple individual datacenter campuses connected to each other, while everyone else is completely out of luck. This hypothetical advantage makes collections of datacenter campuses an important unit of measurement, and also it would be nice to have a more informed refutation or confirmation that this is a real thing.