They’ve probably scaled up 2x-4x compared to the previous scale of about 8e25 FLOPs, which is not that far (from 30K H100 to 100K H100). One point, as I mentioned in the post, is the inability to reduce minibatch size, which might make this scaling step even less impactful than compute alone would suggest, though that doesn’t apply to Google.
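To make that concrete, here's a minimal back-of-envelope sketch in Python; the 8e25 FLOPs baseline and the 2x-4x multiplier are the rough estimates from this comment, not measured values.

```python
# Rough implied compute for the current generation of frontier runs,
# using the estimates from the comment above.

prev_compute = 8e25   # FLOPs, previous scale of frontier training runs
low, high = 2, 4      # estimated scale-up factor for the current step

print(f"chip count ratio: {100_000 / 30_000:.1f}x")  # 30K H100 -> 100K H100, ~3.3x
print(f"implied current runs: {prev_compute * low:.1e} to {prev_compute * high:.1e} FLOPs")
# -> 1.6e+26 to 3.2e+26 FLOPs
```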
In any case this doesn’t matter yet: the 1 GW training systems are already being built (in the case of Nvidia GPUs, with the larger scale-up worlds of GB200 NVL72), so the decision to proceed to the yet-unobserved next level of scaling doesn’t depend on what’s observed right now. The 1 GW training systems allow training runs of up to about 5e27 FLOPs, about 60x[1] the currently deployed models, a more significant change. We’ll see its impact in late 2026.
The number of chips increases 5x from 100K H100 to 500K B200, and the new chips are about 2.5x faster. If 1 GW systems are not yet expected to be quickly followed by even larger ones, individual frontier model training runs will be given more time, let’s say 1.5x more. And there was that 3x factor from 30K H100 to 100K H100.
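A quick sketch of how those factors combine into the ~60x figure; the 2.5x speedup and the 1.5x extra time are the rough guesses stated above, not measured values.

```python
# Back-of-envelope check of the ~60x scaling factor from the factors above.

chips_ratio = 500_000 / 100_000   # 100K H100 -> 500K B200, 5x
chip_speedup = 2.5                # B200 vs H100, approximate
longer_runs = 1.5                 # assumed extra training time per run
prev_step = 3.0                   # 30K H100 -> 100K H100

total = chips_ratio * chip_speedup * longer_runs * prev_step
print(f"total scaling factor: {total:.0f}x")  # ~56x, i.e. "about 60x"

current_frontier = 8e25  # FLOPs of currently deployed models
print(f"implied 1 GW run: {current_frontier * total:.1e} FLOPs")  # ~4.5e+27, i.e. "about 5e27"
```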