One other issue with extrapolating exponential compute trends (Moore’s Law, FLOP/$, etc): there’s a second law that many people don’t really know about. (Sometimes called Moore’s Second Law, sometimes called Rock’s Law.)
For some of these it’s entirely plausible that in fact much of our current trend is an artifact of us throwing a higher and higher percentage of our total economy at the problem, and sooner or later we’ll hit a rock wall.
The cost of a fab doubles every 4 years or so[1]. The world economy, on the other hand, takes more than 4 years to double in size. Call it every ~25[2] years.
TSMC’s Fab 18 cost ~$17 billion and started operation in 2020. 2020 world GDP was ~$85 trillion.
Crossover point is at ~2080, where we’d be spending the total world GDP on fabs[3]. Beyond that the extrapolation would indicate that we’d be spending more than our total world GDP on fabs, which is nonsense.
Anything that uses an exponential compute trend for long-term extrapolations needs to justify why they think that e.g. Moore’s law holds but, under the same set of assumptions, Rock’s law doesn’t.
In order to retain a stable % of the overall economy dedicated to building semiconductor fabs, we’d need to slow down by a factor of 6 or so. If it turns out that the trend of a 3.5-year doubling time for FLOPs/$ was fab-driven, for instance, that doubling time suddenly becomes more like twenty years[4].
[1] You’ll see arguments for and against this number. E.g. https://www.mckinsey.com/~/media/McKinsey/Industries/Semiconductors/Our%20Insights/McKinsey%20on%20Semiconductors%20Issue%205%20-%20Winter%202015/McKinsey%20on%20Semiconductors%20Winter%202015.ashx showed 13% per year (~5.5y doubling time) from 2001-2014 (page 33).
[2] https://www.oecd.org/economy/lookingto2060long-termglobalgrowthprospects.htm says ~3%/year out to 2060, and the rule of 72 gives a ~24-year doubling time.
[3] The crossover point is likely well before then—we can’t put anywhere near 100% of world GDP into making fabs—but this is a decent upper bound.
[4] And even then, this assumes that our economy can also continue rising exponentially indefinitely, which is substantially less plausible for the next 240 years than it is for the next 40 years.
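For concreteness, here is the back-of-the-envelope arithmetic as a quick Python sketch. It takes the figures quoted in this comment and its footnotes as given (4-year fab-cost doubling, ~25-year GDP doubling, ~$17B fab vs ~$85T GDP in 2020); shift any of them and the crossover year moves accordingly.

```python
import math

# Rough figures quoted above (all of them are approximations).
fab_cost_2020 = 17e9          # TSMC Fab 18, USD
world_gdp_2020 = 85e12        # world GDP, USD
fab_doubling_years = 4        # Rock's law
gdp_doubling_years = 25       # world-GDP doubling time

# Fab cost as a fraction of GDP grows as 2^(t/4 - t/25); crossover is when it hits 1.
ratio_2020 = fab_cost_2020 / world_gdp_2020
net_doublings_per_year = 1 / fab_doubling_years - 1 / gdp_doubling_years
years_to_crossover = math.log2(1 / ratio_2020) / net_doublings_per_year
print(f"crossover ~{2020 + years_to_crossover:.0f}")          # ~2079, matching the ~2080 above

# To hold fab spending at a fixed share of GDP, fab-cost doubling must slow
# from 4 years to ~25 years: a factor of ~6.
slowdown = gdp_doubling_years / fab_doubling_years
print(f"slowdown factor ~{slowdown:.1f}")                     # ~6
print(f"3.5-year FLOP/$ doubling becomes ~{3.5 * slowdown:.0f} years")  # roughly twenty years
```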
One response to this is that although the cost of state-of-the-art fabs has gone up exponentially, the productivity of those fabs has gone up exponentially alongside them, meaning that the cost per chip has stayed quite consistent.
One could imagine that this is the ideal tradeoff to make when the cost of capital is feasible, but that there may be some tradeoff towards lower-cost, lower-productivity fabs once the capital expenditure needed to continue Rock’s law becomes unsustainable.
Now, for companies investing in new fabs, it would surely be preferable to have multiple smaller and cheaper fabs if those were equally cost-effective, so we can conclude that this tradeoff would not be competitive with the theoretical $T fabs. On this model we would see continued robust growth in transistor densities, but not a continuation of Moore’s law at its current pace.
There’s some interesting detail on the subject in these lectures (in fact I recommend all of them—the details of the manufacturing process are incredibly interesting):
Economies of scale are a massive effect for a fab. Fabs already target the max productivity/cost point[1]. Trading off lower productivity for lower cost is often doable—but the productivity/cost gets substantially worse.
If, assuming the entire world economy went solely into fab production, last decade’s fab construction was 2 fabs producing X/year each, and this decade’s would be 0.02 fabs producing 100X/year each, then total output still went down, because you can’t make 0.02 fabs.
You might be able to make 1 fab 50x cheaper than originally planned, but it’ll have substantially worse than 1/50th the throughput[2][3].
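A toy numerical sketch of that integer constraint; the budget and cost figures here are made up purely to mirror the 2-fabs-vs-0.02-fabs example above, not real fab economics.

```python
# Hypothetical numbers mirroring the example: a fixed budget, with fab cost
# rising 100x per "decade" while per-fab output also rises 100x.
budget = 100.0                               # arbitrary units
fab_cost_then, output_then = 50.0, 1.0       # 2 fabs affordable, X/year each
fab_cost_now, output_now = 5000.0, 100.0     # "0.02 fabs" affordable, 100X/year each

def total_output(budget, fab_cost, output_per_fab):
    # You can only build a whole number of fabs.
    return (budget // fab_cost) * output_per_fab

print(total_output(budget, fab_cost_then, output_then))   # 2.0  -> 2 fabs * X/year
print(total_output(budget, fab_cost_now, output_now))     # 0.0  -> can't build 0.02 fabs
```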
As for seeing continued robust growth in transistor densities on that model: not if the increase in transistor densities is enabled by increased amounts of process knowledge gained from pouring more and more money into fabs, for instance.
(Process knowledge feeding back into development and yields is one of the major drivers of large fabs, remember...)
[1] Or, in some cases, are already targeting suboptimal productivity/cost tradeoffs due to lack of funding or market.
[2] [Citation Needed], strictly speaking. That being said, this is in my domain.
[3] Note that, for instance, a modern cutting-edge fab might have 10 EUV machines.
Yeah, I don’t think we disagree; all I meant to say is that there’s a key question of how much productivity/cost is lost when companies are forced to take ‘suboptimal’ (in a world of unlimited capital) tradeoffs. I agree it’s probably substantial; I’m interested in any more detailed understanding/intuition you have on the tradeoff.
Good point about the creation of process knowledge slowing as money growth becomes sub-exponential.
Fair.
I wish I could provide more references; it’s mostly proprietary unfortunately. This is in my industry. It’s not my area of expertise, but it’s close enough that I’ve got a fair amount of knowledge[1].
That being said, a fab isn’t just a thousand parts each of which you can either scale down by a factor of 2 or put in half as many of.
It is instructive to look at ASML’s annual report (https://www.asml.com/en/investors/annual-report/2021):
€6,284 million[2] in EUV system sales in 2021, for 42 machines.
That’s ~€150 million[2] per machine.
€4,959.6 million[2] in DUV system sales in 2021, for 81 machines.
That’s ~€61 million[2] per machine.
EUV versus DUV (NXE 3400C versus NXT 2050i):
~10x the power consumption per unit
~2.5x the cost per unit
Some other interesting stats that aren’t relevant to my point:
~20x the energy use per wafer pass.
~0.4x the wafer passes / year / unit
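A quick sanity check of the per-machine figures above, computed directly from the quoted 2021 sales numbers (all amounts in €, as reported by ASML).

```python
# ASML 2021 system sales as quoted above (EUR millions, machine counts).
euv_sales_m, euv_units = 6_284, 42
duv_sales_m, duv_units = 4_959.6, 81

euv_per_machine = euv_sales_m / euv_units   # ~150 (EUR millions)
duv_per_machine = duv_sales_m / duv_units   # ~61  (EUR millions)

print(f"EUV: ~€{euv_per_machine:.0f}M per machine")
print(f"DUV: ~€{duv_per_machine:.0f}M per machine")
print(f"cost ratio EUV/DUV: ~{euv_per_machine / duv_per_machine:.1f}x")  # ~2.4x (quoted above as ~2.5x)
```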
The cost of a single unit is increasing rapidly (2.5x from DUV to EUV, for instance), and we’re only 2.5 orders of magnitude from O(1) total machines sold per year. Once you hit that… we don’t know how to scale down an EUV machine, for many reasons[3].
There are alternatives[4], but they are many orders of magnitude slower[5]. We will have to move to alternatives sooner or later… but they all require a lot of R&D to catch up to photolithography.
[1] And it’s annoying because I’m the sort of person who will stop and check every step of the way that yes, X is public information.
[2] For some reason a lot of places seem to ignore that these reports are in €, not USD?
[3] For instance: EUV starts with hitting a tin droplet with a sufficiently powerful laser pulse to turn it into plasma, which emits the EUV[6]. You swap out for a weaker laser, you don’t get less throughput. You get no throughput.
[4] Electron-beam lithography, for instance, although it is also starting to have problems[7].
[5] And in many cases they are inherently so. E-beam is limited by the combination of a) the very high exposure required to avoid shot-noise effects, and b) beam interaction (it’s a charged-particle beam; run multiple beams too close together and they deflect each other).
[6] Way oversimplified. “Molten tin droplets of around 25 microns in diameter are ejected from a generator. As they move, the droplets are hit first by a lower-intensity laser pulse. Then a more powerful laser pulse vaporizes and ionizes the flattened droplet to create a plasma that emits EUV light.” is a somewhat better explanation, also from said report.
[7] With classic photolithography you’re largely diffraction-limited. There are stopgap approaches to squeeze a little more blood out of the stone (multiple patterning, OPC, etc.), but they very much have diminishing returns. Ultimately you have to reduce the wavelength. By the time you hit EUV you start running into problems: stochastic shot noise for one, and you’re generating secondary electrons that scatter surprisingly far for two. Unfortunately, two of the major factors limiting e-beam resolution are… stochastic shot noise and the fact that it generates secondary electrons that scatter surprisingly far.
Cheers for all the additional detail and care to explain the situation!
My only other question would be—it seems the natural way to continue the scaling trend would be to learn to stack more and more transistors on top of each other, until a chip becomes a fully 3D object. I understand that the main obstacle at the moment is preventing them from overheating but I’ve no idea how soluble it is.
Do you have a good understanding or reference for the feasibility of this direction?
Overheating is a serious theoretical concern way down the line; it’s nowhere near the most pressing concern.
The enemy is # of wafer passes necessary, and knock-on effects thereof. If your machine does a million wafer passes a year, it can produce 100k chips a year requiring 10 wafer passes each… or 1 chip requiring 1M wafer passes. (And meanwhile, if each pass has a 1% chance of causing a critical failure, your yield rate is ~90% in the former case, and ~10^-4363% in the latter case.)
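The yield arithmetic as a quick sketch; the 1% per-pass critical-failure chance is just the illustrative number from above, not a real defect rate.

```python
import math

# Yield if every wafer pass independently has a 1% chance of a critical failure.
p_fail = 0.01

def yield_fraction(passes):
    return (1 - p_fail) ** passes

print(yield_fraction(10))                        # ~0.904 -> ~90%
# 0.99**1_000_000 underflows a float to 0.0, so compute it in log space instead:
log10_yield = 1_000_000 * math.log10(1 - p_fail)
print(f"~10^{log10_yield:.0f}")                  # ~10^-4365 (as a percentage, ~10^-4363)
```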
No one has figured out how to do n vertical layers of transistors in sub-O(n) passes. The closest thing to that is 3D NAND, but even there it’s still O(n) passes. (There are some theoretical approaches to getting sublinear mask steps with 3D NAND, but it still generally requires O(n) other steps.)
(Relevant: https://thememoryguy.com/making-3d-nand-flash-animated-video/ , and his series on 3d nand in general ( https://thememoryguy.com/what-is-3d-nand-why-do-we-need-it-how-do-they-make-it/ ). )
(And 3D NAND is very much a best case in a bunch of ways. It’s extremely regular compared to the transistors in the middle of e.g. an ALU, for instance. And it doesn’t mind the penalty of running in a 40nm process. Etc.)
(And even 3D NAND is hitting scaling limitations. “String stacking” is essentially a tacit acknowledgement that you can’t move beyond ~128 layers or so, so ‘just’ put multiple stacks on top of each other on the chip… but this is again O(n) passes in the total number of layers, just with a lower constant.)
As long as making n transistor layers takes O(n) passes and the time per pass has roughly plateaued, moving to multiple layers doesn’t actually help scaling, and meanwhile it hurts yield.
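To make that concrete, here is a small sketch under those assumptions (O(n) passes for n layers, roughly constant time per pass, a fixed per-pass failure chance); all the specific parameter values are illustrative. Good transistors shipped per year don’t improve as layers are added, and yield collapses.

```python
# Sketch: if n transistor layers cost ~n times the wafer passes, stacking
# doesn't increase good transistors produced per year -- it only hurts yield.
PASSES_PER_YEAR = 1_000_000    # machine throughput (illustrative)
PASSES_PER_LAYER = 10          # passes per layer (illustrative)
P_FAIL_PER_PASS = 0.01         # chance a pass ruins the chip (illustrative)

def good_transistors_per_year(layers, transistors_per_layer=1e9):
    passes_per_chip = PASSES_PER_LAYER * layers
    chips_per_year = PASSES_PER_YEAR / passes_per_chip
    yield_rate = (1 - P_FAIL_PER_PASS) ** passes_per_chip
    return chips_per_year * yield_rate * transistors_per_layer * layers

for layers in (1, 2, 8, 64):
    print(layers, f"{good_transistors_per_year(layers):.3g}")
# Output falls as layers grow: the layer count cancels against chips/year,
# and the extra passes only reduce yield.
```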