Dangerous Argument 2: We should avoid capability overhangs, so that people are not surprised. To do so, we should extract as many capabilities as possible from existing AI systems.
I’m saying that faster AI progress now tends to lead to slower AI progress later. I think this is a really strong bet, and the question is just: (i) quantitatively how large is the effect, (ii) how valuable is time now relative to time later. And on balance I’m not saying this makes progress net positive, just that it claws back a lot of the apparent costs.
For example, I think a very plausible view is that accelerating how good AI looks right now by 1 year will accelerate overall timelines by 0.5 years. It accelerates by less than 1 because there are other processes driving progress: gradually scaling up investment, compute progress, people training up in the field, other kinds of AI progress, people actually building products. Those processes can be accelerated by AI looking better, but they are obviously accelerated by less than 1 year (since they actually take time in addition to occurring at an increasing rate as AI improves).
I think that time later is significantly more valuable than time now (and time now is much more valuable than time in the old days). Safety investment and other kinds of adaptation increase greatly as the risks become more immediate (capabilities investment also increases, but that’s already included); safety research gets way more useful (I think most of the safety community’s work is 10x+ less valuable than work done closer to catastrophe, even if the average is lower than that). Having a longer period closer to the end seems really really good to me.
If we lose 1 year now and get back 0.5 years later, and if years later are 2x as good as years now, you’d be breaking even.
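To spell out that arithmetic: write p for the fraction of apparent acceleration that passes through to overall timelines and v for the value of a marginal later year relative to a marginal current year (the 0.5 and 2 are just the illustrative numbers above, not estimates). The net effect of 1 year of acceleration is then roughly

\[
\text{net value} \;\approx\; \underbrace{(-1\,\text{yr})\times 1}_{\text{time lost now}} \;+\; \underbrace{(1-p)\,\text{yr}\times v}_{\text{time regained later}} \;=\; -1 + 0.5\times 2 \;=\; 0 \quad \text{for } p = 0.5,\ v = 2,
\]

i.e. break-even, with the sign depending entirely on how p and v actually shake out.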
My view is that accelerating progress probably switched from being net positive to net negative (in expectation) sometime around GPT-3. If we had built GPT-3 in 2010, I think the world’s situation would probably have been better. We’d maybe be at our current capability level in 2018, scaling up further would be going more slowly because the community had already picked low hanging fruit and was doing bigger training runs, the world would have had more time to respond to the looming risk, and we would have done more good safety research.
At this point I think faster progress is probably net negative, but it’s still less bad than you would guess if you didn’t take this into account.
(It’s fairly likely that in retrospect I will say that’s wrong—that really good safety work didn’t start until we had systems where we could actually study safety risks, and that safety work in 2023 made so much less difference than time closer to the end that faster progress in 2023 was actually net beneficial—but I agree that it’s hard to tell, and if catastrophically risky AI is very soon then current progress can be significantly net negative.)
I do think that this effect differs for different kinds of research, and things like “don’t think of RLHF” or “don’t think of chain of thought” are particularly bad reasons for progress to be slow now (because it will get fixed particularly quickly later since it doesn’t have long lead times and in fact is pretty obvious, though you may just skip straight to more sophisticated versions). But I’m happy to just set aside that claim and focus on the part where overhang just generally makes technical progress less bad.
My view is that progress probably switched from being net positive to net negative (in expectation) sometime around GPT-3.
We fully agree on this, and so it seems like we don’t have large disagreements on the externalities of progress. From our point of view, the cutoff point was probably GPT-2 rather than GPT-3, or some similar event that established the current paradigm as the dominant one.
Regarding the rest of your comment and your other comment here, here are some reasons why we disagree. This is mostly high level, as a full answer would take a lot of detailed discussion of models of scientific and technological progress, which we might cover in some future posts.
In general, we think you’re treating the current paradigm as overdetermined. We don’t think that being in the current paradigm (large, generalist, monolithic systems built by scaling up deep-learning language models) is a necessary trajectory of progress rather than a historical contingency.
While the Bitter Lesson might be true, and a powerful driver toward working on single, generalist, monolithic systems over smaller, specialized ones, science doesn’t always (some might say very rarely!) follow the optimal path.
There are many possible paradigms that we could be in, and the current one is among the worse ones for safety. For instance, we could be in a symbolic paradigm, or a paradigm that focuses on factoring problems and using smaller LSTMs to solve them. Of course, there do exist worse paradigms, such as a pure RL non-language based singleton paradigm.
In any case, we think the trajectory of the field was largely determined once GPT-2 and GPT-3 brought scaling into the limelight, and if those hadn’t happened, or the memetics had gone another way, we could be in a very, very different world.
I’m saying that faster AI progress now tends to lead to slower AI progress later.
My best guess is that this is true, but I think there are outside-view reasons to be cautious.
We have some preliminary, unpublished work[1] at AI Impacts trying to distinguish between two kinds of progress dynamics for technology:
There’s an underlying progress trend, which only depends on time, and the technologies we see are sampled from a distribution that evolves according to this trend. A simple version of this might be that the goodness G we see for AI at time t is drawn from a normal distribution centered on G_c(t) = G_0 · exp(A·t). This means that, apart from how it affects our estimate for G_0, A, and the width of the distribution, our best guess for what we’ll see in the non-immediate future does not depend on what we see now.
There’s no underlying trend “guiding” progress. Advances happen at random times and improve the goodness by random amounts. A simple version of this might be a small probability per day that an advance occurs, whose size is then independently sampled from a distribution of sizes. The main distinction here is that seeing a large advance at time t_0 does decrease our estimate for the time at which enough advances have accumulated to reach goodness level G_agi.
(A third hypothesis, of slightly lower crudeness level, is that advances are drawn without replacement from a population. Maybe the probability per unit time depends on the size of the remaining population. This is closer to my best guess at how the world actually works, but we were trying to model progress in data that was not slowing down, so we didn’t look at this.)
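To make the distinction concrete, here is a rough Monte Carlo sketch of the first two dynamics (all parameters, the goodness scale, and the threshold are invented for illustration; this is not AI Impacts’ model or analysis). Under the trend model, runs with an unusually good first year reach the threshold no sooner on average; under the accumulating-advances model, they reach it earlier by roughly the size of their head start.

```python
# Toy Monte Carlo contrasting the two stylized progress dynamics described above.
# Every number (trend rate, noise scale, jump frequency and sizes, the threshold)
# is invented purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

G_AGI = 10.0          # goodness level of interest
RATE = 0.5            # expected goodness gained per year (same in both models)
SIGMA = 1.0           # scatter of observations around the trend (model 1)
P_JUMP = 4 / 12       # model 2: ~4 advances per year, at most one per month
JUMP_MEAN = RATE / 4  # model 2: mean advance size, so expected progress matches RATE
MONTHS = 12 * 80
N_RUNS = 10_000

def run_trend_model():
    """Model 1: observations scatter around an underlying trend RATE * t.
    Returns (best goodness seen in year 1, first year an observation reaches G_AGI)."""
    t = np.arange(1, MONTHS + 1) / 12.0
    obs = RATE * t + rng.normal(0.0, SIGMA, size=MONTHS)
    hits = np.nonzero(obs >= G_AGI)[0]
    return obs[:12].max(), (t[hits[0]] if hits.size else np.inf)

def run_jump_model():
    """Model 2: goodness is a running sum of randomly timed, randomly sized advances.
    Returns (goodness at end of year 1, first year the running sum reaches G_AGI)."""
    t = np.arange(1, MONTHS + 1) / 12.0
    jumps = rng.binomial(1, P_JUMP, size=MONTHS) * rng.exponential(JUMP_MEAN, size=MONTHS)
    level = np.cumsum(jumps)
    hits = np.nonzero(level >= G_AGI)[0]
    return level[11], (t[hits[0]] if hits.size else np.inf)

def summarize(samples, label):
    """Compare mean time to threshold for runs with fast vs. slow first years."""
    year1, t_hit = (np.array(x) for x in zip(*samples))
    fast = year1 >= np.quantile(year1, 0.75)
    slow = year1 <= np.quantile(year1, 0.25)
    print(f"{label}: mean years to threshold after a fast first year {t_hit[fast].mean():.1f}, "
          f"after a slow first year {t_hit[slow].mean():.1f}")

summarize([run_trend_model() for _ in range(N_RUNS)], "Model 1 (underlying trend)")
summarize([run_jump_model() for _ in range(N_RUNS)], "Model 2 (independent advances)")
```

The conditional difference (or lack of one) in those hitting times is essentially the property the data check below is looking for.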
Obviously neither of these models describes reality, but we might be able to find evidence about which one is less of a departure from reality.
When we looked at data for advances in AI and other technologies, we did not find evidence that the fractional size of an advance depended on the time since the start of the trend or the time since the last advance. In other words, it seems to be the case that a large advance at time t_0 has no effect on the (fractional) rate of progress at later times.
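For concreteness, here is a sketch of the kind of check this involves, on made-up numbers (hypothetical data and a plain correlation, not the AI Impacts dataset or their actual procedure):

```python
# Does the fractional size of an advance depend on how long it has been since the
# previous advance? Hypothetical records of (year of advance, metric value afterwards).
import numpy as np
from scipy.stats import pearsonr

advances = [
    (1998.0, 1.0), (2001.5, 1.6), (2003.0, 1.9), (2007.2, 3.5),
    (2009.1, 4.1), (2013.4, 7.8), (2015.0, 9.0), (2018.6, 16.0),
]
years = np.array([y for y, _ in advances])
values = np.array([v for _, v in advances])

gaps = np.diff(years)                        # time since the previous advance
frac_sizes = np.diff(values) / values[:-1]   # fractional improvement of each advance

r, p = pearsonr(gaps, frac_sizes)
print(f"gap length vs. fractional advance size: r = {r:.2f}, p = {p:.2f}")
# A trend-following picture would lead you to expect longer gaps to be followed by
# larger catch-up advances (positive correlation); little or no relationship is more
# consistent with advances arriving independently of what came before.
```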
Some caveats:
This work is super preliminary, our dataset is limited in size and probably incomplete, and we did not do any remotely rigorous statistics.
This was motivated by progress trends that mostly tracked an exponential, so progress that approaches the inflection point of an S-curve might behave differently.
These hypotheses were not chosen in any way more principled than “it seems like many people have implicit models like this” and “this seems relatively easy to check, given the data we have”
Also, I asked Bing Chat about this yesterday and it gave me some economics papers that, at a glance, seem much better than what I’ve been able to find previously. So my views on this might change.
It’s unpublished because it’s super preliminary and I haven’t been putting more work into it because my impression was that this wasn’t cruxy enough to be worth the effort. I’d be interested to know if this seems important to others.
We’d maybe be at our current capability level in 2018, [...] the world would have had more time to respond to the looming risk, and we would have done more good safety research.
It’s pretty hard to predict the outcome of “raising awareness of problem X” ahead of time. While it might be net good right now because we’re in a pretty bad spot, we have plenty of examples from the past where greater awareness of AI risk has arguably led to strongly negative outcomes down the line, due to people channeling their interest in the problem into somehow pushing capabilities even faster and harder.