Yeah I don’t think we disagree, all I meant to say is that there’s a key question of how much productivity/cost is lost when companies are forced to take ‘suboptimal’ (in a world of unlimited capital) tradeoffs.
Fair.
I agree it’s probably substantial, interested in any more detailed understanding/intuition you have on the tradeoff.
I wish I could provide more references; it’s mostly proprietary unfortunately. This is in my industry. It’s not my area of expertise, but it’s close enough that I’ve got a fair amount of knowledge[1].
That being said, a fab isn't just a thousand parts, each of which you can either scale down by a factor of 2 or install half as many of.
It is instructive to look at ASML's annual report ( https://www.asml.com/en/investors/annual-report/2021 ):
€6,284 million[2] in EUV system sales in 2021, for 42 machines.
That's €150 million per machine.
€4,959.6 million[2] in DUV system sales in 2021, for 81 machines.
That's €61 million per machine. (The per-machine arithmetic is sketched in code after these stats.)
EUV versus DUV (NXE:3400C versus NXT:2050i):
~10x the power consumption per unit
~2.5x the cost per unit
Some other interesting stats that aren’t relevant to my point:
~20x the energy use per wafer pass.
~40% the wafer passes / year / unit.
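For concreteness, here's the per-machine arithmetic as a quick sketch; the only inputs are the 2021 sales and unit counts quoted above, nothing else assumed:

```python
# Per-machine cost from ASML's 2021 annual-report figures quoted above.
# Sales are in millions of EUR; unit counts are from the same report.
euv_sales_m_eur, euv_units = 6_284.0, 42
duv_sales_m_eur, duv_units = 4_959.6, 81

euv_per_unit = euv_sales_m_eur / euv_units  # ~150M EUR per machine
duv_per_unit = duv_sales_m_eur / duv_units  # ~61M EUR per machine

print(f"EUV: ~EUR {euv_per_unit:.0f}M per machine")
print(f"DUV: ~EUR {duv_per_unit:.0f}M per machine")
print(f"EUV/DUV cost ratio: ~{euv_per_unit / duv_per_unit:.1f}x")  # roughly the ~2.5x below
```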
The cost of a single unit is increasing rapidly (2.5x from DUV to EUV, for instance), and we're only 2.5 orders of magnitude from O(1) total machines sold per year. Once you hit that… we don't know how to scale down an EUV machine, for many reasons[3].
There are alternatives[4], but they are many orders of magnitude slower[5]. We will have to move to alternatives sooner or later… but they all require a lot of R&D to catch up to photolithography.
[1] And it's annoying because I'm the sort of person who will stop and check every step of the way that yes, X is public information.
[2] For some reason a lot of places seem to ignore that these reports are in €, not USD?
[3] For instance: EUV starts with hitting a tin droplet with a sufficiently powerful laser pulse to turn it into plasma, which emits the EUV[6]. If you swap in a weaker laser, you don't get less throughput. You get no throughput.
[4] Electron-beam lithography, for instance, although it is also starting to have problems[7].
[5] And in many cases they are inherently so. E-beam is limited by the combination of a) the very high exposure required to avoid shot-noise effects, and b) beam interaction (it's a charged-particle beam; run multiple beams too close together and they deflect each other).
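For intuition on (a), a minimal sketch of the underlying Poisson statistics: if a pixel receives N electrons on average, its relative dose fluctuation is ~1/√N, so tight dose control forces a large (slow) exposure. The noise tolerances below are made-up illustrative values, not from any process spec:

```python
import math

# Poisson shot noise: relative dose fluctuation ~ 1/sqrt(N) for N electrons.
for sigma in (0.10, 0.05, 0.01):       # tolerated relative dose noise (illustrative)
    n_min = math.ceil(1 / sigma ** 2)  # smallest N with 1/sqrt(N) <= sigma
    print(f"{sigma:.0%} dose noise -> need >= {n_min:,} electrons per pixel")
```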
[6] Way oversimplified. "Molten tin droplets of around 25 microns in diameter are ejected from a generator. As they move, the droplets are hit first by a lower-intensity laser pulse. Then a more powerful laser pulse vaporizes and ionizes the flattened droplet to create a plasma that emits EUV light." is a somewhat better explanation, also from said report.
[7] With classic photolithography you're largely diffraction-limited. There are stopgap approaches to squeeze a little more blood out of the stone (multiple patterning, OPC, etc), but they very much have diminishing returns. Ultimately you have to reduce the wavelength. By the time you hit EUV you start running into problems: stochastic shot noise for one, and you're generating secondary electrons that scatter surprisingly far for two. Unfortunately: two of the major factors limiting e-beam resolution are… stochastic shot noise and that it generates secondary electrons that scatter surprisingly far.
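To put rough numbers on "diffraction-limited": resolution scales as the Rayleigh criterion, CD ≈ k1·λ/NA. A sketch using the standard published wavelengths and numerical apertures for ArF-immersion DUV and current EUV tools; the k1 value is an illustrative process factor, not a quoted spec:

```python
# Rayleigh criterion: minimum printable feature ~ k1 * wavelength / NA.
def min_feature_nm(wavelength_nm: float, na: float, k1: float = 0.35) -> float:
    return k1 * wavelength_nm / na

# Standard published values for the two tool classes; k1 = 0.35 is assumed.
print(f"DUV (ArF immersion, 193 nm, NA 1.35): ~{min_feature_nm(193.0, 1.35):.0f} nm")
print(f"EUV (13.5 nm, NA 0.33):               ~{min_feature_nm(13.5, 0.33):.0f} nm")
```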
Cheers for all the additional detail and care to explain the situation!
My only other question would be—it seems the natural way to continue the scaling trend would be to learn to stack more and more transistors on top of each other, until a chip becomes a fully 3D object. I understand that the main obstacle at the moment is preventing them from overheating but I’ve no idea how soluble it is.
Do you have a good understanding or reference for the feasibility of this direction?
Overheating is a serious theoretical concern way down the line; it's nowhere near the most pressing concern.
The enemy is # of wafer passes necessary, and knock-on effects thereof. If your machine does a million wafer passes a year, it can produce 100k chips a year requiring 10 wafer passes each… or 1 chip requiring 1m wafer passes. (And meanwhile, if each pass has a 1% chance of causing a critical failure, your yield rate is ~90% in the former case, and 10^−4363% or so in the latter case.)
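A quick check of those yield numbers, done in log space so the million-pass case doesn't underflow (the 1%-per-pass critical-failure rate is the same illustrative assumption as above):

```python
import math

P_FAIL = 0.01  # assumed chance a single pass causes a critical failure

def log10_yield(passes: int) -> float:
    """log10 of the fraction of chips surviving all passes."""
    return passes * math.log10(1.0 - P_FAIL)

for passes in (10, 1_000_000):
    lg = log10_yield(passes)
    if lg > -6:
        print(f"{passes:>9,} passes -> yield ~{100 * 10 ** lg:.0f}%")
    else:
        # +2 converts the fraction's exponent to a percentage's exponent
        print(f"{passes:>9,} passes -> yield ~10^{lg + 2:.0f}%")
```

This prints ~90% for 10 passes and ~10^−4363% for a million, matching the figures above.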
No-one has figured out how to do n vertical layers of transistors in sub-O(n) passes. The closest thing to that is 3D NAND, but even there it's still O(n) passes. (There are some theoretical approaches to getting sublinear mask steps with 3D NAND, but it still generally requires O(n) other steps.)
(Relevant: https://thememoryguy.com/making-3d-nand-flash-animated-video/ , and his series on 3D NAND in general: https://thememoryguy.com/what-is-3d-nand-why-do-we-need-it-how-do-they-make-it/ .)
(And 3D NAND is very much a best case in a bunch of ways. It's extremely regular compared to the transistors in the middle of, say, an ALU. And it doesn't mind the penalty for running in a 40nm process. Etc.)
(And even 3D NAND is hitting scaling limitations. "String stacking" is essentially a tacit acknowledgement that you can't move beyond ~128 layers or so, so you 'just' put multiple stacks on top of each other on the chip… but this is again O(n) passes for n layers on average, just with a lower constant.)
As long as making n transistor layers takes O(n) passes and time per pass has roughly plateaued, moving to multiple levels doesn't actually help scaling, and meanwhile it hurts yield.
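A toy model of that argument; every number here is an illustrative assumption. With an O(n)-pass process, n layers means a fixed annual pass budget produces 1/n as many chips, each with n times the transistors, so raw transistor throughput is a wash while yield decays with pass count:

```python
PASSES_PER_YEAR = 1_000_000  # assumed annual pass budget for one machine
PASSES_PER_LAYER = 10        # assumed passes per transistor layer (O(n) total)
P_FAIL = 0.01                # assumed per-pass critical-failure probability

for layers in (1, 2, 4, 8, 16):
    passes_per_chip = PASSES_PER_LAYER * layers
    chips_per_year = PASSES_PER_YEAR / passes_per_chip
    yield_rate = (1.0 - P_FAIL) ** passes_per_chip
    # Transistors scale with layer count; only yielded chips count.
    good_output = chips_per_year * layers * yield_rate
    print(f"{layers:>2} layers: {chips_per_year:>8,.0f} chips/yr, "
          f"yield {yield_rate:6.1%}, good transistor-units ~{good_output:,.0f}")
```

Under these assumptions, yielded transistor output strictly falls as layers increase: from ~90k units at 1 layer to ~20k at 16.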