An interesting section in the appendices, a criticism of Ajeya Cotra’s “Forecasting Transformative AI with Biological Anchors”:
If you do a sensitivity analysis on the most important variable (how much Moore’s law will improve FLOPS/$), the output behavior doesn’t make any sense, e.g., Moore’s law running out of steam after “conventional” improvements give us a 144x improvement would give us a 34% chance of transformative AI (TAI) by 2100, a 144*6x increase gives a 52% chance, and a 144*600x increase gives a 66% chance (and with the predicted 60000x improvement, there’s a 78% chance), so the model is, at best, highly flawed unless you believe that going from a 144x improvement to a 144*6x improvement in computer cost gives almost as much increase in the probability of TAI as a 144*6x to 144*60000x improvement in computer cost.
The part about all of this that makes this fundamentally the same thing that the futurists here did is that the estimate of the FLOPS/$ which is instrumental for this prediction is pulled from thin air by someone who is not a deep expert in semiconductors, computer architecture, or a related field that might inform this estimate.
[...]
If you say that, based on your intuition, you think there’s some significant probability of TAI by 2100; 10% or 50% or 80% or whatever number you want, I’d say that sounds plausible but wouldn’t place any particular faith in the estimate. But if you take a model that produces nonsense results and then pick an arbitrary input to the model that you have no good intuition about to arrive at an 80% chance, you’ve basically picked a random number that happens to be 80%.
The claim that the probability goes from 34% → 52% from a 6x increase in compute does sound pretty weird! But I think it’s just based on a game of telephone and a complete misunderstanding.
I was initially confused about where the number came from, then I saw the reference to Nostalgebraist’s post. They say: “Assume a 6x extra speedup, and you get a 52% chance. (Which is still pretty high, to be fair.) Assume no extra speedup, and also no speedup at all, just the same computers we have now, and you get a 34% chance … wait, what?!”
Nostalgebraist is saying that you move from 34% to 52% by moving from 1x to 144*6x, not by moving from 144x to 144*6x. That is, if you increase compute by about 3 OOMs, you increase the probability from 34% to 52%.
Similarly, if you increase compute from 144*6x to 144*60000x, or about 4 OOMs, you increase the probability from 52% to 78%.
So 3 OOMs is 18% and 4 OOMs is 26%, roughly proportional, as you’d expect given the nature of the model. The report basically distributes TAI over 20 OOMs, so a 3 OOM increase covers about 3/20ths of the range.
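To make the arithmetic explicit, here is a minimal sketch (my own, in Python), using only the multipliers and probabilities quoted in this thread:

```python
import math

# From 1x to 144*6x compute: about 3 orders of magnitude.
ooms_first = math.log10(144 * 6)        # ~2.94 OOMs
points_first = 52 - 34                  # 18 percentage points

# From 144*6x to 144*60000x compute: exactly 4 orders of magnitude.
ooms_second = math.log10(60000 / 6)     # 4.0 OOMs
points_second = 78 - 52                 # 26 percentage points

print(points_first / ooms_first)        # ~6.1 points per OOM
print(points_second / ooms_second)      # ~6.5 points per OOM
```

The two per-OOM rates come out nearly equal, which is the “roughly proportional” behavior described above.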
But if you take a model that produces nonsense results and then pick an arbitrary input to the model that you have no good intuition about to arrive at an 80% chance, you’ve basically picked a random number that happens to be 80%.
If you get a nonsensical number out of a model, I think it’s worth reflecting more on whether there was a misunderstanding.
Aside from this, calling “how far does Moore’s law go” the most important variable seems kind of overstated. The criticism is that 7 orders of magnitude in this parameter leads to a change from 34% to 78%. I agree that’s a significant difference, but 7 orders of magnitude is a lot of uncertainty in this parameter, and I don’t think that’s grounds for saying that it’s the number that drives the whole estimate. And even after 7 OOMs these estimates aren’t even that different in an action-relevant way—in particular this change doesn’t result in a similarly-dramatic change for your 10 year or 20 year timelines, and shifting your 100 year TAI probability from 55% to 78% is not a huge deal.
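As a sanity check on the “7 orders of magnitude” figure (my own back-of-envelope arithmetic, not from the report): 144*60000 is roughly 8.6 million, i.e. about 7 OOMs above 1x, and spreading the 44-point change over that range gives a similar per-OOM rate to the jumps above:

```python
import math

total_ooms = math.log10(144 * 60000)    # ~6.94, i.e. about 7 OOMs above 1x
points_change = 78 - 34                 # 44 percentage points
print(points_change / total_ooms)       # ~6.3 points per OOM
```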
And aside from that, saying that the estimates for Moore’s law are arbitrary isn’t right. I think it’s totally fair that Ajeya isn’t an expert, but that doesn’t mean that things are totally unknown within 7 orders of magnitude. At the upper end things are pretty constrained by basic physics, at the lower end things are pretty constrained by normal technological extrapolation. There’s a ton of uncertainty left but it’s just not a big deal relative to the uncertainty about AI training.
The overall estimate is basically driven by the fact that a broad distribution over horizon lengths in the existing NN extrapolation gives you a similar range of estimates to the entire space from human lifetime to human evolution. So it’s very easy to squint and get a broad distribution with around 5% probability per OOM of compute (which is a couple percent per year right now). The criticism of this that seems most plausible to me is that maybe inside-view you can just eyeball how good AI systems are and how close they are to transformative effects and it’s just not that far. That said, the second most plausible criticism (especially about the 20%+ short-term predictions) is that you can eyeball how good AI systems are and it’s probably not that close.
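For a rough sense of how “around 5% probability per OOM” maps onto “a couple percent per year”, here is a back-of-envelope sketch (the growth-rate conversion is my own illustration, not a figure from the report):

```python
points_per_oom = 5.0    # ~5 percentage points per OOM of compute (from the comment)
points_per_year = 2.0   # "a couple percent per year" (from the comment)

# Implied growth rate of effective training compute needed for these to line up.
implied_ooms_per_year = points_per_year / points_per_oom
print(implied_ooms_per_year)            # 0.4 OOM/year
print(10 ** implied_ooms_per_year)      # i.e. compute growing ~2.5x per year
```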
(Disclaimer: this report is written by my wife and so I may be biased.)
FWIW I’m not married to Ajeya and I agree with you; I was pretty disappointed by Nostalgebraist’s post & how much positive reception it seemed to get. I’ve been thinking about writing up a rebuttal. Most of what I’d say is what you’ve already said here though, so yay.
I pointed out the OOM error on Twitter (citing this comment), and Dan has updated the post with a correction.