These are mostly combinations of a bunch of lower-confidence arguments, which makes them difficult to expand a little. Nevertheless, I shall try.
1. I remain unconvinced of prompt exponential takeoff of an AI.
...assuming we aren’t in Algorithmica[1][2]. This is a load-bearing assumption, and most of my downstream probabilities are heavily governed by P(Algorithmica) as a result.
...because compilers have gotten slower over time at compiling themselves.
...because the optimum point for the fastest ‘compiler compiling itself’ is not to turn on all optimizations.
...because compiler output-program performance has somewhere between a 20[3]-50[4] year doubling time.
...because [growth rate of compiler output-program performance] / [growth rate of human time poured into compilers] is << 1[5].
...because I think many of the advances in computational substrates[6] have been driven by exponentially rising investment[7], which in turn stretches other estimates by a factor of [investment growth rate] / [GDP growth rate].
...because the cost of some atomic[8] components of fabs has been rising exponentially[9].
...because the amount of labour put into CPUs has also risen significantly[10].
...because R&D costs keep rising exponentially[11].
...because cost-per-transistor is asymptotic towards a non-zero value[12].
...because flash memory wafer-passes-per-layer is asymptotic towards a non-zero value[13].
...because DRAM cost-per-bit has largely plateaued[14].
...because hard drive areal density has largely plateaued[15].
...because cost-per-wafer-pass is increasing.
...because of my industry knowledge.
...because single-threaded CPU performance has done this, and Amdahl’s Law is a thing[16]:
[chart: single-threaded CPU performance over time; see the microprocessor trend data in [17]]
To drastically oversimplify: many of the exponential trends that people point to and go ‘this continues towards infinity/zero; there’s a knee here but that just changes the timeframe not the conclusion’ I look at and go ‘this appears to be consistent with a sub-exponential trend, such as an asymptote towards a finite value[18]’. Ditto, many of the exponential trends that people point to and go ‘this shows that we are scaling’ rely on associated exponential trends that cannot continue (e.g. fab costs ‘should’ hit GDP by ~2080), with (seemingly) no good argument why the exponential trend will continue but the associated trend, under the same assumptions, won’t.
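To make the fab-cost aside concrete, here is a back-of-the-envelope sketch with my own illustrative numbers (not from the argument above): assuming a leading-edge fab costs roughly $20B today, fab cost keeps doubling every ~4 years (Rock’s Law), and world GDP sits around $100T, the crossover lands in roughly the 2070s.

```python
import math

# Back-of-the-envelope only. Illustrative assumptions (not from the post):
#   - leading-edge fab cost today: ~$20B
#   - fab cost doubles every ~4 years (Rock's Law)
#   - world GDP: ~$100T, held fixed for simplicity
fab_cost_now = 20e9           # dollars
doubling_time_years = 4.0
world_gdp = 100e12            # dollars

years_to_parity = doubling_time_years * math.log2(world_gdp / fab_cost_now)
print(f"Fab cost reaches world GDP in ~{years_to_parity:.0f} years "
      f"(around {2024 + years_to_parity:.0f})")
# ~49 years, i.e. the early 2070s: the same order of magnitude as the ~2080
# figure quoted above; different starting assumptions shift it a decade or so.
```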
2. Most of the current discussion and arguments focus around agents that can do unbounded computation with zero cost.
...for instance, most formulations of Newcomb-like problems require either that the agent is bounded, or that the Omega does not exist, or that Omega violates the Church-Turing thesis[19].
...for instance, most formulations of e.g. FairBot[20] will hang upon encountering an agent that will cooperate with an agent if and only if FairBot will not cooperate with that agent, or sometimes variations thereof. (The details depend on the exact formulation of FairBot; a sketch of the resulting non-termination follows this list.)
...and the unbounded case has interesting questions like ‘is while (true) {} a valid Bot’? (Or for (bigint i = 0; true; i++){}; return (defect, cooperate)[i % 2];. Etc.)
...and in the ‘real world’ computation is never free.
...and heuristics are far more important in the bounded case.
...and many standard game-theory axioms do not hold in the bounded case.
...such as ‘you can never lower expected value by adding another option’.
...or ‘you can never lower expected value by raising the value of an option’.
...or ‘A ≻ B if and only if ApC ≻ BpC’[21].
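Here is the FairBot sketch promised above. It is a deliberately naive ‘simulate your opponent’ formulation (my own construction, not the proof-search/modal version from the literature), but it shows the basic problem: FairBot cooperates iff its opponent cooperates with it, the adversarial bot cooperates iff FairBot does not cooperate with it, and neither question ever bottoms out.

```python
# Toy formulation, for illustration only: a "bot" is a function that takes
# its opponent (another bot) and returns "C" (cooperate) or "D" (defect).

def fair_bot(opponent):
    # Cooperate iff the opponent cooperates with me.
    return "C" if opponent(fair_bot) == "C" else "D"

def anti_fair_bot(opponent):
    # Cooperate iff FairBot does NOT cooperate with me.
    return "C" if fair_bot(anti_fair_bot) == "D" else "D"

if __name__ == "__main__":
    try:
        print(fair_bot(anti_fair_bot))
    except RecursionError:
        # FairBot simulates AntiFairBot, which simulates FairBot, which...
        print("The two bots simulate each other forever; neither ever answers")
```

Bounding the agents (a step limit, a proof-length limit, an explicit budget) is exactly what breaks this symmetry, which is the point of the bounded-versus-unbounded distinction above.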
http://blog.computationalcomplexity.org/2004/06/impagliazzos-five-worlds.html - although note that there are also unstated assumptions that it’s a constructive proof and the result isn’t a galactic algorithm.
Heuristica or Pessiland may or may not also violate this assumption. “Problems we care about the answers of” are not random, and the question here is if they in practice end up hard or easy.
Proebsting’s Law is an observation that compilers roughly double the performance of the output program, all else being equal, with an 18-year doubling time. The 2001 reproduction suggested more like 20 years under optimistic assumptions.
A 2022 informal test showed a 10-15% improvement on average in the last 10 years, which is closer to a 50-year doubling time.
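For reference, converting ‘10-15% over 10 years’ into a doubling time is straightforward arithmetic (a quick sketch using only the figures quoted above):

```python
import math

# Doubling time implied by an improvement of `improvement_fraction` over
# `window_years` years, assuming a steady exponential rate over that window.
def doubling_time(improvement_fraction, window_years):
    annual_rate = (1 + improvement_fraction) ** (1 / window_years) - 1
    return math.log(2) / math.log(1 + annual_rate)

for pct in (0.10, 0.15):
    print(f"{pct:.0%} over 10 years -> doubling time ~{doubling_time(pct, 10):.0f} years")
# 10% over 10 years -> ~73 years; 15% over 10 years -> ~50 years.
```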
Admittedly, I don’t have a good source for this. I debated about doing comparisons of GCC output-program performance and GCC commit history over time, but bootstrapping old versions of GCC is, uh, interesting.
I don’t actually know of a good term for ‘the portions of computers that are involved with actual computation’ (so CPU, DRAM, associated interconnects, compiler tech, etc, etc, but not e.g. human interface devices).
Fab costs are rising exponentially, for one. See also https://www.lesswrong.com/posts/qnjDGitKxYaesbsem/a-comment-on-ajeya-cotra-s-draft-report-on-ai-timelines?commentId=omsXCgbxPkNtRtKiC
In the sense of indivisible.
Lithography machines in particular.
Intel had 21.9k employees in 1990, and as of 2020 had 121k. Rise of ~5.5x.
See e.g. https://cdn.wccftech.com/wp-content/uploads/2019/04/Screen-Shot-2019-04-19-at-7.41.50-PM-1480x781.png
A quick fit of cost_per_transistor = a*e^(-b*billions_of_transistors_per_wafer) + c on https://wccftech.com/apple-5nm-3nm-cost-transistors/ (billion transistors versus cost-per-transistor) gives cost_per_transistor = $15.16*e^(-0.5127*billions_of_transistors_per_wafer) + $2.1729. This essentially indicates that cost per transistor has plateaued.
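As a sanity check of what that fit implies (reusing only the fitted constants above; units are whatever the source’s cost axis uses), the exponential term dies off quickly and the curve sits essentially at the c ≈ $2.17 asymptote:

```python
import math

# Fitted constants from the footnote above: cost = a*e^(-b*x) + c,
# where x is billions of transistors.
a, b, c = 15.16, 0.5127, 2.1729

def cost_per_transistor(billions_of_transistors):
    return a * math.exp(-b * billions_of_transistors) + c

for x in (1, 5, 10, 20, 50):
    print(f"{x:>3}B transistors -> {cost_per_transistor(x):.4f}")
# 1B -> ~11.25, 5B -> ~3.34, 10B -> ~2.26, 20B and beyond -> ~2.17: a plateau at c.
```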
String stacking is a tacit admission that you can’t really stack 3D NAND indefinitely. So just stack multiple sets of layers… only that means you’re still doing O(layers) wafer passes. Admittedly, once I dug into it I found that Samsung says they can go up to ~1k layers, which is far more than I expected.
https://aiimpacts.org/trends-in-dram-price-per-gigabyte/ → “The price of a gigabyte of DRAM has fallen by about a factor of ten every 5 years from 1957 to 2020. Since 2010, the price has fallen much more slowly, at a rate that would yield an order of magnitude over roughly 14 years.” Of course, another way to phrase ‘exponential with longer timescale as time goes on’ is ‘not exponential’.
Note the distinction here between lab-attained and productized. Productized areal density hasn’t really budged in the past 5 years ( https://www.storagenewsletter.com/2022/04/19/has-hdd-areal-density-stalled/ ) - with the potential exception of SMR, which is a significant regression in other areas.
Admittedly, machine learning tends to be about the best-scaling workloads we have, as it tends to be mainly matrix operations and elementwise function application, but even so.
https://github.com/karlrupp/microprocessor-trend-data
If you dig into the data behind [17] for instance, single-threaded performance is pretty much linear with time after ~2005: SpecInt(t) = 5546*(t − 2003.5), with an R² > 0.9. Linear increase in performance with exponential investment does not a FOOM make.
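To spell out that last sentence (a small sketch using only the linear fit above): under a linear trend the time to double from the current level keeps growing, whereas a genuine exponential has a fixed doubling time.

```python
# Using the linear fit quoted above: SpecInt(t) = 5546 * (t - 2003.5).
def specint(t):
    return 5546 * (t - 2003.5)

# Time needed to double from the level at year t under the linear fit:
# solve 5546*(t + dt - 2003.5) = 2 * 5546*(t - 2003.5)  =>  dt = t - 2003.5.
for year in (2010, 2015, 2020, 2025):
    dt = year - 2003.5
    print(f"Doubling from the {year} level takes ~{dt:.1f} more years")
# The "doubling time" grows without bound -- the signature of a linear,
# not exponential, trend.
```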
Roughly speaking: if the agent is Turing-complete, then given any Omega I can submit an agent that runs said Omega on itself, and does whatever Omega says it won’t do. The options are: a) no such Omega can exist, or b) the agent is weaker than the Omega, because the agent is not Turing-complete, or c) the agent is weaker than the Omega, because the agent is Turing-complete but the Omega is super-Turing.
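A minimal sketch of that diagonalization, assuming Omega is an ordinary computable function from an agent to a prediction (all names and the toy ‘run the agent’ Omega are mine, purely for illustration):

```python
# Toy illustration of the diagonalization. Names are mine; a real Omega would
# predict by some other means, but the same argument applies to any computable
# total predictor over Turing-complete agents.

def make_contrarian(omega):
    def contrarian():
        prediction = omega(contrarian)   # ask Omega about *this very agent*
        return "two-box" if prediction == "one-box" else "one-box"
    return contrarian

def naive_omega(agent):
    # A toy Omega that "predicts" by just running the agent.
    return agent()

if __name__ == "__main__":
    agent = make_contrarian(naive_omega)
    try:
        print(agent())
    except RecursionError:
        # Omega never settles on a prediction for the contrarian agent, which
        # is options a/b/c above: no such Omega, a weaker-than-Turing agent,
        # or a super-Turing Omega.
        print("Omega cannot consistently predict this agent")
```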
FairBot has other issues too, e.g. https://www.lesswrong.com/posts/A5SgRACFyzjybwJYB/tlw-s-shortform?commentId=GrSr8aDtkLx8JHmnP
https://www.lesswrong.com/posts/AYSmTsRBchTdXFacS/on-expected-utility-part-3-vnm-separability-and-more?commentId=5DgQhNfzivzSdMf9o—when computation has a cost it may be better to flip a coin between two choices than to figure out which is better. This can then violate independence if the original choice is important enough to be worth calculating, but the probability is low enough that the probabilistic choice is not.
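A toy worked version of that coin-flip point, with numbers invented purely for illustration: deliberating between A and B is worth its cost when the choice happens for sure, but not when it is buried inside a low-probability branch of a compound lottery, so the preferred policy flips and independence fails once computation has a price.

```python
# Numbers are invented for illustration only (not from the linked comment).
value_A, value_B = 110.0, 100.0   # A is in fact better by 10
compute_cost = 2.0                # cost of working out which option is better

def deliberate_ev(p):
    # Pay the compute cost up front; with probability p the A-vs-B branch
    # actually happens and we then pick the better option.
    return p * max(value_A, value_B) - compute_cost

def coin_flip_ev(p):
    # Skip the computation and pick A or B at random if the branch happens.
    return p * (value_A + value_B) / 2

for p in (1.0, 0.05):
    better = "deliberate" if deliberate_ev(p) > coin_flip_ev(p) else "flip a coin"
    print(f"p={p}: deliberate EV={deliberate_ev(p):.2f}, "
          f"coin-flip EV={coin_flip_ev(p):.2f} -> {better}")
# p=1.00: deliberating wins (expected gain 5 > cost 2).
# p=0.05: deliberating loses (expected gain 0.25 < cost 2), so mixing the
# choice with some third option C reverses the preference ordering.
```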
Wow, thank you so much! (And I apologize for the late reaction.) This is great, really.
“Linear increase in performance with exponential investment does not a FOOM make.”

Indeed. It seems FOOM might require a world in which processors can be made arbitrarily tiny.
I have been going over all your points and find them all very interesting. My current intuition on “recursive self-improvement” is that deep learning may be about the closest thing we can get, and that performance of those will asymptote relatively quickly when talking about general intelligence. As in, I don’t expect it’s impossible to have super-human deep learning systems, but I don’t expect with high probability that there will be an exponential trend smashing through the human level.