Thanks Jacob. I’ve been reading the back-and-forth between you and other commenters (not just habryka) in both this post and your brain efficiency writeup, and it’s confusing to me why some folks so confidently dismiss energy efficiency considerations with handwavy arguments not backed by BOTECs.
While I have your attention – do you have a view on how far we are from ops/J physical limits? Your analysis suggests we’re only 1-2 OOMs away from the ~10^-15 J/op limit, and if I’m not misapplying Koomey’s law (2x every 2.5y back in 2015, I’ll assume slowdown to 3y doubling by now) this suggests we’re only 10-20 years away, which sounds awfully near, albeit incidentally in the ballpark of most AGI timelines (yours, Metaculus etc).
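For anyone who wants to check that BOTEC, here is a minimal sketch of the arithmetic. It assumes a constant 3-year doubling time in ops/J, which is my guess rather than a measured figure, and the function name `years_to_limit` is just illustrative.

```python
import math

def years_to_limit(ooms_remaining: float, doubling_time_years: float) -> float:
    """Years needed to gain `ooms_remaining` orders of magnitude in ops/J,
    assuming efficiency doubles every `doubling_time_years` (an assumption)."""
    doublings = ooms_remaining * math.log2(10)  # one OOM is ~3.32 doublings
    return doublings * doubling_time_years

for ooms in (1, 2):
    print(ooms, "OOM:", round(years_to_limit(ooms, 3.0), 1), "years")
# 1 OOM: 10.0 years
# 2 OOM: 19.9 years
```

With the original 2.5-year doubling the same gap would close in roughly 8 to 17 years, so the conclusion is not very sensitive to the assumed slowdown.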
TSMC 4N is a little over 1e10 transistors/cm^2 for GPUs and roughly 5e-18 J switch energy assuming dense activity (little dark silicon). The practical transistor density limit with minimal few-electron transistors is somewhere around ~5e11 trans/cm^2, but the minimal viable high-speed switching energy is around ~2e-18 J. So there is another 1 to 2 OOM of further density scaling, but less room for further switching energy reduction. Thus scaling past this point increasingly involves dark silicon or complex, expensive cooling, and thus diminishing returns either way.
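To make the remaining headroom concrete, here is a rough sketch using only the figures quoted above. The variable names are illustrative, and the inputs are the approximate values from this comment, not precise process data.

```python
import math

# Approximate figures quoted in the comment above
density_now = 1e10      # transistors/cm^2, TSMC 4N GPUs
density_limit = 5e11    # transistors/cm^2, practical few-electron limit
switch_now = 5e-18      # J per switch at 4N, dense activity
switch_limit = 2e-18    # J per switch, minimal viable at high speed

density_headroom_ooms = math.log10(density_limit / density_now)
energy_headroom_ooms = math.log10(switch_now / switch_limit)

print(f"density headroom: {density_headroom_ooms:.1f} OOM")        # ~1.7 OOM
print(f"switch energy headroom: {energy_headroom_ooms:.1f} OOM")   # ~0.4 OOM
```

Density can still grow about 50x while energy per switch can only fall about 2.5x, so power per unit area at full activity would rise roughly 20x. That gap is what forces the choice between dark silicon and more aggressive cooling.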
Achieving 1e-15 J/flop seems doable now for low-precision flops (fp4, perhaps fp8 with some tricks/tradeoffs); most of the cost is data movement, as pulling even a single bit from RAM just 1 cm away costs around 1e-12 J.
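A quick sketch of why data movement dominates, again using only the two rough figures above (names are illustrative):

```python
energy_per_flop = 1e-15      # J, low-precision flop (figure from the comment)
energy_per_ram_bit = 1e-12   # J, one bit pulled from RAM ~1 cm away

flops_per_ram_bit = energy_per_ram_bit / energy_per_flop
print(f"one RAM bit costs about {flops_per_ram_bit:.0f} flops of energy")
```

So at these numbers, each bit fetched from off-chip memory needs to be reused across on the order of a thousand low-precision operations before compute, rather than movement, sets the energy budget.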