Michaël Trazzi comments on Forecasting Thread: AI Timelines

Michaël Trazzi 12 Jan 2022 14:57 UTC
LW: 3 AF: 3
AF
You have to do lots of software engineering and for 4+ OOMs you literally need to build more chip fabs to produce more chips.
I have probably missed many considerations you have mentioned elsewhere, but in terms of software engineering, how do you think the “software production rate” for scaling up large models evolved from 2020 to late 2021? I don’t see why we couldn’t get 4 OOMs between 2020 and 2025.
If we just take the example of large LMs, we went from essentially 1-10 publicly known models in 2020 to 10-100 in 2021 (cf. China, Korea, Microsoft, DM, etc.), and I expect the number of private models to be even higher, so it makes sense to me that we could have 4 OOMs more SWE in that area by 2025.
Now, for the chip fabs, I feel like one update from 2020 to 2022 has been NVIDIA & Apple making unexpected hardware advances (A100, M1) and NVIDIA stock growing massively, so I would be more optimistic about “build more fabs” than in 2020. Though I’m not an expert in hardware at all, and those two advances I mentioned were maybe not that useful for scaling.
- Daniel Kokotajlo 12 Jan 2022 17:08 UTC
  LW: 3 AF: 3
  AF Parent
  If I understand you correctly, you are asking something like: How many programmer-hours of effort and/or how much money was being spent specifically on scaling up large models in 2020? What about in 2025? Is the latter plausibly 4 OOMs more than the former? (You need some sort of arbitrary cutoff for what counts as large. Let’s say GPT-3 sized or bigger.)
  Yeah maybe, I don’t know! I wish I did. It’s totally plausible to me that it could be +4 OOMs in this metric by 2025. It’s certainly been growing fast, and prior to GPT-3 there may not have been much of it at all.
  - Michaël Trazzi 12 Jan 2022 18:19 UTC
    LW: 1 AF: 1
    AF Parent
    Yes, something like: given the change in programmer-hours (programmer-hours-into-scaling(Jan 2022) - programmer-hours-into-scaling(July 2020)), and how much progress there has been on hardware for such training (I don’t know the right metric for this, but probably something to do with FLOP and parallelization), would the extrapolation to 2025 (either linear or exponential) give the 4 OOMs you mentioned?
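
To make the linear-vs-exponential distinction concrete, here is a minimal sketch of the arithmetic. Nobody in the thread supplies actual programmer-hour figures, so the inputs below (1 unit of effort in July 2020, 10 units in Jan 2022) are purely hypothetical placeholders, and the function names are made up for illustration.

```python
import math

def ooms_gained_exponential(t0, v0, t1, v1, t2):
    """OOMs gained over v0 by time t2 if growth stays exponential."""
    ooms_per_year = math.log10(v1 / v0) / (t1 - t0)
    return ooms_per_year * (t2 - t0)

def ooms_gained_linear(t0, v0, t1, v1, t2):
    """OOMs gained over v0 by time t2 if growth stays linear."""
    slope = (v1 - v0) / (t1 - t0)
    v2 = v0 + slope * (t2 - t0)
    return math.log10(v2 / v0)

def required_factor(target_ooms, t0, t1, t2):
    """Growth factor needed between t0 and t1 so that an exponential
    trend reaches target_ooms by t2."""
    return 10 ** (target_ooms * (t1 - t0) / (t2 - t0))

# Hypothetical inputs, NOT measured data: effort in arbitrary units.
T0, V0 = 2020.5, 1.0    # July 2020
T1, V1 = 2022.0, 10.0   # Jan 2022
T2 = 2025.0

exp_ooms = ooms_gained_exponential(T0, V0, T1, V1, T2)
lin_ooms = ooms_gained_linear(T0, V0, T1, V1, T2)
needed = required_factor(4, T0, T1, T2)

print(f"exponential: {exp_ooms:.2f} OOMs, linear: {lin_ooms:.2f} OOMs")
print(f"factor needed over July 2020 - Jan 2022 for 4 OOMs: {needed:.1f}x")
```

Under these placeholder numbers, a 10x increase over the 18 months would extrapolate exponentially to about 3 OOMs by 2025 and linearly to well under 2, so reaching 4 OOMs would need the observed growth to have been roughly 20x or more. The real answer depends entirely on effort figures that neither commenter claims to know.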