My current approximate understanding of fast, medium, and slow takeoff times, as roughly conceptualized by the AI alignment people arguing about them, is:
Fast—a few hours up to a week. Maybe as long as a month.
Medium—a month to about a year and a half
Slow—a year to about 5 years
I personally place most of my probability mass on medium, but I don’t feel like I can rule out either fast or slow.
This is a tricky thing to define, because by some definitions we are already in the 5-year countdown of a slow takeoff. It is important to note that even during a slow takeoff, the pace of development is expected to accelerate throughout the window, so that the end of the period contains a disproportionately large share of the progress. It is also worth noting that there is often a delay before progress is accurately measured, and a further delay before it is reported to the public, so looking at public reports will only ever give you a lagged perception of what is happening. This lag varies, but is often multiple months, which means that even a medium takeoff could potentially complete before the public has realized it has started.
The difficulty of defining the start of ‘true takeoff’ may mean that even in retrospect some people will say, “this was a slow takeoff that had the expected fast bit right at the end,” while others will say, “the true takeoff was just that fast bit at the end, and that bit was short, so the takeoff was fast.”
Some people advocate for using GDP, so that the takeoff begins once you can see the AI signal in the noise (which we can’t yet).
Yes, I personally think that things are going to be moving much too fast for GDP to be a useful measure. GDP requires some sort of integration into the economy. My experience in data science and ML engineering in industry, and also my time in academia, gave me a strong sense of the lag from developing something cool in the lab, to actually managing to publish a paper about it, to people in industry seeing the paper and deciding to reimplement it in production.

So if you have a lab which is testing its products internally, and the output is an improved product within that lab, which can then immediately be used for another cycle of improvement… that is clearly going to move much faster than anything you will see in GDP. So GDP might help you measure the slow early part of a slow takeoff, but it will be useless in the fast end section.
Something jumped out at me here; please consider the below carefully.
What you’re saying is that you have a self-improvement cycle, where
Performance Error = F(test data)
F’ = Learning(Performance Error)
and then each cycle you substitute the improved F’ for F.
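A minimal sketch of that loop, with everything in it a hypothetical stand-in (the names, the numeric "capability" score, and the toy learning rule are illustrative only, not from any real system):

```python
# Hypothetical sketch of the self-improvement cycle described above.
# All names and numbers are illustrative stand-ins.

def evaluate(capability, test_data):
    # Stand-in for "Performance Error = F(test data)":
    # the fraction of test items the current system still fails on.
    return len([x for x in test_data if x > capability]) / len(test_data)

def learn(capability, performance_error):
    # Stand-in for "F' = Learning(Performance Error)":
    # use the measured error to produce an improved system.
    return capability + 20 * performance_error

def self_improve(capability, test_data, cycles):
    for _ in range(cycles):
        error = evaluate(capability, test_data)  # measure F on the test set
        capability = learn(capability, error)    # substitute F' for F and go again
    return capability

# Note the hidden assumption: the same fixed test set is reused every cycle.
print(self_improve(capability=0.0, test_data=list(range(100)), cycles=5))
```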
The assumption you made is that the size of the test data set is constant.
For some domains, like ordinary software today, it’s not constant: you keep having to raise the scale of your test benchmark. That is, once you have found all the bugs that show up in 5 minutes, you need to run your test benches twice as long to catch all the bugs that take 10 minutes, and so on. Your test farm resources need to keep doubling, and this is why there are so many ‘obvious’ bugs that only show up when you release to millions of users.
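As a back-of-the-envelope illustration of that doubling (the tier sizes below are made up; only the shape matters):

```python
# Toy illustration: each "tier" of rarer bugs needs twice the bench time of the last.
minutes_per_tier = [5 * 2**k for k in range(10)]   # 5, 10, 20, ... minutes

# The last tier alone costs about as much as all previous tiers combined.
print(minutes_per_tier[-1], sum(minutes_per_tier))
```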
Note also @Richard_Ngo’s concept of an “n-second AGI”. Once you have a 10-second AGI, how much testing time is it going to take to self-improve to a 20-second AGI? A 40-second AGI?
It keeps doubling, right? So you’re going to need 86,400 times as much training data to process to reach a 24-hour AGI (86,400 seconds) if you start from a 1-second AGI.
It may actually be worse than that because longer operation times have more degrees of freedom in the I/O.
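A quick worked version of that scaling, under the stated assumption that the data you must process grows in proportion to the task horizon (so it doubles with each doubling of horizon):

```python
import math

horizon_start = 1            # seconds: a "1-second AGI"
horizon_target = 24 * 3600   # seconds: a "24-hour AGI"

doublings = math.log2(horizon_target / horizon_start)
scale = horizon_target // horizon_start

print(f"{doublings:.1f} horizon doublings")   # about 16.4 doublings
print(f"{scale:,}x the test data, if data scales linearly with horizon")
```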
This is also true for other kinds of processes; it is measured empirically as experience curve effects ( https://en.wikipedia.org/wiki/Experience_curve_effects ). The reason is slightly different: to improve, you are sampling a stochastic function from reality, and to keep gaining knowledge at a constant rate you have to sample it in ever larger volumes.
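For reference, the standard empirical form of that experience curve is a power law in cumulative volume, where each doubling of volume cuts unit cost by a constant fraction; the specific numbers below are just illustrative:

```python
# Experience curve (Wright's law): cost of the n-th unit falls as a power law in
# cumulative volume n. An 80% "progress ratio" means each doubling of cumulative
# volume leaves costs at 80% of their previous level.
import math

def unit_cost(n, first_unit_cost=100.0, progress_ratio=0.80):
    b = -math.log2(progress_ratio)      # learning exponent
    return first_unit_cost * n ** (-b)

for n in (1, 2, 4, 8, 16):
    print(n, round(unit_cost(n), 1))    # 100.0, 80.0, 64.0, 51.2, 41.0
```

Read the other way around, each further constant-percentage improvement requires another doubling of cumulative volume, which is the “larger volumes” point above.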
Anyway, this nonlinear scaling of self-improvement could mean that at later stages of AI development, the sheer volumes of compute and robotics required show up materially in GDP, and that successful AI companies end up working like chip fabrication plants, needing customers to buy their prototypes to fund the next round of development.