Jensen Huang/Nvidia is almost inarguably one of TSMC’s most important clients and probably has some insight into (or access to) their roadmaps, and I don’t particularly suspect he is lying when he claims Moore’s Law is dead. It matches my own analysis of TSMC’s public roadmap, as well as my analysis of the industry research/chatter/gossip/analysis. Moore’s Law was a long recursive miniaturization optimization process that was always naturally destined to bottom out somewhat before new leading-edge foundries cost sizable fractions of world GDP and features approach minimal sizes (well predicted in advance).
This obviously isn’t the end of technological progress in computing! It’s just the end of the easy era. Neuromorphic computing is much harder for comparatively small gains. Reversible computing seems almost impossibly difficult, such that many envision jumping straight to quantum computing, which itself is no panacea and is still very far off.
And this 2022 analysis suggests things were also going quite strong very recently.
As were chip clock frequencies under Dennard scaling, until that suddenly ended. I have some uncertainty over how far we are from minimal viable switch energies, but it is not multiple OOM. There are more architectural tricks in the pipeline, along the lines of lower-precision tensor cores, but not many of those left either.
Also, for forecasting AI dynamics, flops/$ seems like it matters a lot more, since in the near future AI seems unlikely to have to care much about transistor density, given that there are easily 10-20 OOMs of energy and materials on Earth’s surface that could be used for some kind of semiconductor or neuromorphic compute production.
Sure, but so far the many-OOM improvement in flops/$ has been driven by shrinkage, not by using ever more of Earth’s resources to produce fabs and chips. Growing that way is very slow and absolutely in line with the smooth, slow, non-foomy takeoff scenarios.
Want to take a bet? $1000, even odds.
I predict $/flop to continue going down at between a factor of 2x every 2 years and 2x every 3 years. Happy to have someone else be a referee on whether it holds up.
[Edit: Actually, to avoid having to condition on a fast takeoff itself, let’s say “going down faster than a factor of 2x every 3 years for the next 6 years”]
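As a rough illustration of what that threshold implies (a sketch added for concreteness, assuming a clean exponential trend), a halving of $/flop every 3 years compounds to about a 4x reduction over the 6-year window, and every 2 years to about 8x:

```python
# Sketch: what the proposed bet threshold implies for $/flop over 6 years.
# Assumes a clean exponential trend; the doubling times are the ones quoted above.

def price_reduction_factor(years: float, halving_time_years: float) -> float:
    """Factor by which $/flop falls after `years`, given a fixed halving time."""
    return 2 ** (years / halving_time_years)

print(price_reduction_factor(6, 2))  # ~8x cheaper over 6 years (fast end of the range)
print(price_reduction_factor(6, 3))  # ~4x cheaper over 6 years (the bet threshold)
```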
I may be up for that, but we first need to define ‘flops’, acceptable GPUs/products, how to calculate prices (preferably some standard rental price including power cost), and finally the bet implementation.
Part of the issue is that my post/comment was about Moore’s law (transistor density for mass-produced nodes), which is a major input to, but distinct from, flops/$. As I mentioned somewhere, there is still some free optimization energy in extracting more flops/$ at the circuit level even if Moore’s law ends. Moore’s law is very specifically about fab efficiency as measured in transistors/cm^2 for large chip runs, not the flops/$ habryka wanted to bet on. Even when Moore’s law is over, I expect some continued progress in flops/$.
All that being said, Nvidia’s new flagship GPU everyone is using, the H100 (which is replacing the A100 and launched just a bit after habryka proposed the bet), actually offers near zero improvement in flops/$: the price increased in direct proportion to the flops increase. So I probably should have taken the bet if it had been narrowly defined as flops/$ for the flagship GPUs most teams are currently using to train foundation models.
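A back-of-envelope version of that comparison might look like the sketch below. The peak-throughput figures are Nvidia’s published dense FP16 tensor numbers, but the prices are illustrative assumptions (street and rental prices vary widely), so treat the ratio as a sketch rather than a measurement:

```python
# Back-of-envelope flops/$ comparison; prices are assumed placeholders, not quotes.

def flops_per_dollar(peak_tflops: float, price_usd: float) -> float:
    return peak_tflops * 1e12 / price_usd

a100 = flops_per_dollar(peak_tflops=312, price_usd=12_000)   # A100, dense FP16 tensor peak; assumed price
h100 = flops_per_dollar(peak_tflops=989, price_usd=35_000)   # H100 SXM, dense FP16 tensor peak; assumed price

print(f"A100: {a100:.2e} flops/$")
print(f"H100: {h100:.2e} flops/$")
print(f"ratio: {h100 / a100:.2f}x")  # close to 1x if price scaled roughly with throughput
```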
Thanks Jacob. I’ve been reading the back-and-forth between you and other commenters (not just habryka) in both this post and your brain efficiency writeup, and it’s confusing to me why some folks so confidently dismiss energy efficiency considerations with handwavy arguments not backed by BOTECs.
While I have your attention: do you have a view on how far we are from ops/J physical limits? Your analysis suggests we’re only 1-2 OOMs away from the ~10^-15 J/op limit, and if I’m not misapplying Koomey’s law (2x every 2.5 years back in 2015; I’ll assume a slowdown to a 3-year doubling by now), this suggests we’re only 10-20 years away, which sounds awfully near, albeit incidentally in the ballpark of most AGI timelines (yours, Metaculus, etc.).
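The arithmetic behind that 10-20 year figure, as a sketch assuming a clean Koomey-style trend with a 3-year doubling time:

```python
import math

# Years to close a gap of `ooms` orders of magnitude in ops/J,
# assuming efficiency doubles every `doubling_time_years` (Koomey-style trend).
def years_to_limit(ooms: float, doubling_time_years: float = 3.0) -> float:
    doublings_needed = ooms * math.log2(10)  # 1 OOM is ~3.32 doublings
    return doublings_needed * doubling_time_years

print(years_to_limit(1))  # ~10 years if we are 1 OOM from the limit
print(years_to_limit(2))  # ~20 years if we are 2 OOMs from the limit
```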
TSMC 4N is a little over 1e10 transistors/cm^2 for GPUs, with roughly 5e-18 J switch energy assuming dense activity (little dark silicon). The practical transistor density limit with minimal few-electron transistors is somewhere around ~5e11 transistors/cm^2, but the minimal viable high-speed switching energy is around ~2e-18 J. So there is another 1 to 2 OOM of further density scaling, but less room for further switching-energy reduction. Thus scaling past this point increasingly involves dark silicon or complex, expensive cooling, and thus diminishing returns either way.
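Plugging in those figures (a sketch using only the numbers above), the density headroom is roughly 1.7 OOM while the switch-energy headroom is only ~0.4 OOM:

```python
import math

# Headroom implied by the figures above (TSMC 4N vs. rough practical limits).
density_now, density_limit = 1e10, 5e11      # transistors/cm^2
energy_now, energy_limit = 5e-18, 2e-18      # J per switch event

print(math.log10(density_limit / density_now))  # ~1.7 OOM of density scaling left
print(math.log10(energy_now / energy_limit))    # ~0.4 OOM of switch-energy reduction left
```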
Achieving 1e-15 J/flop seems doable now for low-precision flops (fp4, perhaps fp8 with some tricks/tradeoffs); most of the cost is data movement, as pulling even a single bit from RAM just 1 cm away costs around 1e-12 J.
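To see why data movement dominates (a sketch using the figures above), fetching a single 8-bit operand across 1 cm costs several thousand low-precision flops’ worth of energy, so hitting ~1e-15 J/flop in practice requires heavy on-chip data reuse:

```python
# Rough comparison of compute energy vs. off-chip data movement, using the figures above.
flop_energy_j = 1e-15     # target J per low-precision flop
bit_move_energy_j = 1e-12 # J to move one bit ~1 cm from RAM

bits_per_operand = 8      # e.g. an fp8 operand
operand_move_j = bits_per_operand * bit_move_energy_j

print(operand_move_j / flop_energy_j)  # ~8000 flops' worth of energy per fetched fp8 operand
```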
Curious, did this bet happen? Since Jacob said he may be up for it depending on various specifics.
It did not.