Short timelines and slow, continuous takeoff as the safest path to AGI
There are various possible timelines/takeoff dynamics scenarios, illustrated in Figure 1.
In this post, we’re going to set out our understanding of the case for short timelines and slow, continuous takeoff as the safest path to AGI.
A few reasons why laying out this case seems worth doing:
OpenAI’s strategy assumes that short timelines and slow, continuous takeoff are the safest path to AGI.[1] Whether or not this is the safest path, the fact that OpenAI thinks it’s true and is one of the leading AI labs makes it a path we’re likely to take. Humanity successfully navigating the transition to extremely powerful AI might therefore require successfully navigating a scenario with short timelines and slow, continuous takeoff.
Short timelines and slow, continuous takeoff might indeed be the safest path to AGI. After spending some time trying to state the strongest version of the case, we think it’s stronger than we initially expected (and than many in the AI safety community seem to assume).
If it’s not true that short timelines and slow, continuous takeoff are the safest path to AGI, it might be really important to convince people at OpenAI that this is the case. It seems unlikely that it’s possible to do this without deeply understanding their current view.
The arguments we’re going to lay out are:
Slow, continuous takeoff is safer than fast, discontinuous takeoff
Slow, continuous takeoff is more likely given short timelines than long ones
Time during takeoff is more valuable than time before, so it’s worth trading time now for time later
It’s possible to safely navigate short timelines and slow, continuous takeoff
Note that Sam Altman usually refers simply to slow takeoff rather than slow, continuous takeoff. In this piece we discuss slow, continuous takeoff, because we think it’s the strongest way of making the case and likely what Altman means.
Disclaimers:
Neither of us works at OpenAI or is in touch with OpenAI leadership. This post represents our best-guess charitable interpretation of the case. We’ve done a lot of filling in the blanks on our side in order to make the arguments as coherent as possible. Our ideal in writing this post is to pass an ideological Turing Test, where Sam Altman could read this post and think ‘yes, exactly, that’s why I think that’. We probably fall short of this, as we haven’t engaged with Sam directly.
Neither of us fully buys that short timelines and slow, continuous takeoff are the safest path to AGI. We’re writing this post to improve our (and hopefully other people’s) understanding of the position. In this post, we don’t give our overall evaluation of the case—just present what seem to us like the strongest arguments for it.
We hope this post will serve as a jumping off point for critiques which engage more deeply with OpenAI’s strategy. (And where the post doesn’t faithfully capture OpenAI views, we’d love to get comments pointing this out.)
Slow, continuous takeoff is safer than fast, discontinuous takeoff
To successfully navigate the transition to extremely powerful AI, we want AI safety and governance efforts to keep pace with AI capabilities (or ideally, to exceed them). When compared to fast discontinuous takeoff, slow continuous takeoff seems much safer from this perspective:
Fast takeoff doesn’t give society time to figure out how to respond to advanced capabilities and the destabilisation which could follow from them. It also doesn’t give us time to coordinate once the danger is clearer and more imminent.
Discontinuous takeoff entails jumps where capabilities could suddenly exceed existing levels of safety/governance.
Figuring out how to make systems safe will likely depend on bootstrapping up safety using the last generation (e.g. using GPT-5 to align GPT-6). Even if takeoff is (relatively) fast, this makes continuity preferable to discontinuity, as continuity makes it more likely that bootstrapping alignment works.
While slow, continuous takeoff likely means more actors are involved and greater proliferation of powerful models (in comparison to one lab suddenly jumping far ahead of any others), this is partially mitigated if timelines are short. Short timelines make it more likely that leading actors remain concentrated in the West and among the current crop of top companies. These companies may be more likely to be able to coordinate with one another on safety than the group of companies/state actors who would replace them should the current leaders slow down.
Slow, continuous takeoff is more likely given short timelines than long ones
The two slow, continuous takeoff scenarios in Figure 1 above were slow, continuous takeoff with short timelines, and slow, continuous takeoff with long timelines.
The long timelines scenario looks safer: takeoff is just as continuous, we get just as much time during takeoff, and we also get more time before takeoff. It seems to strictly dominate the short timelines version.
In the abstract, we think this is true.
The argument for short timelines and slow, continuous takeoff as the safest path to AGI is that slow, continuous takeoff is more likely given short timelines—rather than the argument being that short timelines would be preferable to long timelines if you could actually get slow, continuous takeoff in both scenarios.
The strongest arguments that slow, continuous takeoff is more likely given short timelines are:
Coordination seems likely to be easier now than later. Coordination allows us to move more slowly through takeoff, all else equal.[2]
Right now there are only a handful of frontier labs. Over time, the number of labs might increase, for instance if:
Commercial incentives get stronger.
Barriers to entry lower.
More labs catch up.
More labs splinter from existing labs.
Relatedly, currently the frontier labs are all Western. Coordination between frontier labs in China and the West seems likely to be harder than coordination between Western labs, for cultural and geopolitical/natsec reasons.
Most of the frontier labs recognise that AI risk is a thing and explicitly support AI safety work. It seems at least plausible that current labs are unusually cautious compared to labs we might see in future.[3]
AI development is compute-intensive, so it’s obvious who is doing it. Over sufficiently long timeframes, this might no longer hold.
It’s plausible that compute overhang is low now relative to later, and this tends towards slower, more continuous takeoff.[4]
There might be discontinuous increases in compute supply in future, via:
Advances in computing technologies.
Discontinuous increases in the capital available for compute.
Future regulation might create a compute overhang. E.g. a moratorium on large training runs directly translates to a larger compute overhang.
Time during takeoff is more valuable than time before, so it’s worth trading time now for time later
Both of the points we’ve made so far (that slow, continuous takeoff is safer; and that it’s more likely given short timelines) could be true without making short timelines and slow, continuous takeoff the safest path to AGI.
It could be the case that the safety gains from slow, continuous takeoff are dominated by larger safety gains from longer timelines. In other words, the expected gains from slow, continuous takeoff need to outweigh the expected costs of absolutely shorter timelines, for short timelines and slow, continuous takeoff to be the safest path to AGI.
So an important additional part of this argument is that time during takeoff is more valuable than time before takeoff, for various important kinds of work:[5]
AI safety.
During a slow, continuous takeoff, we’d be able to do empirical work on very powerful systems, such that most of the useful safety work might end up being done in this period.[6]
Regulation.
Regulation is often responsive, and governments and international organisations need time to draft, approve and implement new measures. Time after the deployment of very powerful systems, but before the deployment of AGI, is most useful from this perspective.
Human adaptation to AI capabilities.
The resilience of the human population might depend in part on humans adapting to AI systems, learning to make positive use of them, and learning to combat negative effects of AI. You can think of this as a kind of coevolution, or as the number of shots humanity gets at deploying AI safely.
If you buy this argument, then it can be worth trading time now for time later (i.e. reducing the absolute number of months to AGI in exchange for more months during takeoff).
The argument has a few possible implications:
Developing AI more slowly might actually be more dangerous than developing it quickly, if slowing creates compute (or other kinds of) overhang.
Compute and other kinds of overhang are really dangerous, as they increase the chances of discontinuities.
For the leading lab in an AI race, it might be better to use up all of the available overhang at the expense of shorter timelines, in order to get more continuous progress and reduce expected takeoff speed.
This is especially likely to be true if you think that using up overhang will only moderately speed up timelines, or that takeoff can be lengthened significantly.
The idea that moving faster now will reduce speed later is a bit counterintuitive. Here’s a drawing illustrating the idea (and see the toy numerical sketch after this list for one way to make it concrete):
Deploying powerful AI systems might be safer than not deploying them (depending on how confident you are that a given system is safe, and to what degree).
Not deploying could just mean that someone else deploys similar things at some time lag.[7] This increases discontinuity in (deployed) AI capabilities, which increases risk (because of the following two bullet points).
Deployment might give us important empirical alignment information.
Deployment might further regulation or human adaptation.
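To make the ‘faster now, slower later’ intuition concrete, here is a minimal toy sketch of our own (in Python, with entirely made-up numbers and functional forms; it isn’t anything from OpenAI). Capability is a diminishing-returns function of cumulative compute used, compute supply grows exponentially, and a lab either uses compute as it becomes available or pauses and then spends the accumulated overhang all at once.

```python
# Toy model (our own illustration; all numbers are made up). Capability is a
# diminishing-returns function of cumulative compute used; compute supply grows
# exponentially. We compare "use compute as it arrives" with "pause, build up
# an overhang, then spend it all at once".

import math

STEPS_PER_YEAR = 12
YEARS = 20
POWERFUL = 60.0   # capability level for "very powerful systems" (takeoff begins)
AGI = 100.0       # capability level for "AGI"

def supply(t):
    """Compute supply that becomes available at time t (in years)."""
    return math.exp(0.5 * t)

def capability(cumulative_compute):
    """Capability as a diminishing-returns function of cumulative compute used."""
    return 25.0 * math.log1p(cumulative_compute)

def run(pause_years=0.0):
    """Simulate monthly; during the pause, new compute accumulates as overhang
    and is all spent in the first month after the pause ends.
    Returns (time takeoff begins, time of AGI, largest one-month capability jump)."""
    used, overhang = 0.0, 0.0
    t_powerful = t_agi = None
    prev_cap, max_jump = 0.0, 0.0
    for step in range(YEARS * STEPS_PER_YEAR):
        t = step / STEPS_PER_YEAR
        new = supply(t) / STEPS_PER_YEAR
        if t < pause_years:
            overhang += new              # holding back builds overhang
        else:
            used += new + overhang       # spend this month's supply plus any overhang
            overhang = 0.0
        cap = capability(used)
        max_jump = max(max_jump, cap - prev_cap)
        prev_cap = cap
        if t_powerful is None and cap >= POWERFUL:
            t_powerful = t
        if t_agi is None and cap >= AGI:
            t_agi = t
    return t_powerful, t_agi, max_jump

for label, pause in [("use compute as it arrives", 0.0),
                     ("pause 5 years, then release overhang", 5.0)]:
    tp, ta, jump = run(pause)
    print(f"{label}: takeoff begins at {tp:.1f}y, AGI at {ta:.1f}y, "
          f"takeoff window {ta - tp:.1f}y, largest monthly jump {jump:.1f}")
```

In this crude setup the AGI date ends up roughly the same either way, because all of the compute is eventually used; what the pause changes is that time spent in a gradual takeoff is traded for a single large capability jump when the overhang is released. The numbers illustrate the shape of the argument, not a prediction.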
It’s possible to safely navigate short timelines and slow, continuous takeoff
It might be the case that short timelines and slow, continuous takeoff are the safest path to AGI, but still incredibly dangerous, such that there aren’t viable theories of victory even assuming this path.
This is the part of the argument that we understand least well, as it mostly boils down to how hard alignment is, and neither of us has technical expertise. We’re going to try to cite what we think people’s arguments here are, but our understanding is shallow and we can’t properly unpack the claims. We’d love to see better versions of this claim in the comments.
We think that the main arguments that there are indeed safe paths to AGI assuming short timelines and slow, continuous takeoff are:
Leading actors can coordinate once it’s clear that systems are dangerous (e.g. an eval reveals dangerous capabilities) and slow down before any truly dangerous systems are released. The time bought by this slowdown is enough for us either to coordinate on a longer slowdown or to solve alignment well enough to deploy AGI.
The empirical evidence on alignment so far is positive. See Jan Leike here.
As long as progress is continuous, we should be able to bootstrap our way to alignment using AI alignment assistants (see the toy sketch after this list). The case here is something like:
Solving alignment is much harder than building AIs to help us solve alignment.
Evaluating alignment research is much easier than doing it.
Alignment research will only require narrow AI.
As long as progress is continuous, we don’t need to worry too much about alignment assistants themselves being dangerous. LLMs can already help with alignment research; if you think they don’t pose an existential threat, then they should also be able to help us build something better than themselves at alignment, which also doesn’t pose an existential threat, and so on.
In any given case, you’re using older models to align newer ones. So developing AI alignment assistants doesn’t push forward the development frontier much.
See ‘Training AI systems to do alignment research’ in Our approach to alignment research, and also Jan Leike’s post here.
LLMs might be unusually easy to align, relative to other systems.[8]
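One way to make the structure of this bootstrapping case explicit is a toy sketch (again entirely our own illustration, not OpenAI’s method; the numbers and the ‘oversight margin’ are assumptions we’ve invented for the example): each trusted model helps verify a slightly more capable successor, so many small steps can be chained together, while a single large jump breaks the chain.

```python
# Toy sketch of the bootstrapping argument (our own illustration, not OpenAI's
# method; the numbers and the "oversight margin" are assumptions for illustration).

def bootstrap(start_capability, jumps, oversight_margin=1.3):
    """Chain alignment assistants: each verified model becomes the assistant
    that helps align (and verify) the next, slightly more capable one.

    jumps: capability increase of each successive generation.
    oversight_margin: assumed limit on how much more capable a model can be
      than the assistant (plus humans) checking it, while still being verifiable.
    Returns the capability of the most advanced model we end up trusting.
    """
    trusted = start_capability
    for jump in jumps:
        candidate = trusted + jump
        if candidate > trusted * oversight_margin:
            return trusted   # the jump outran our ability to verify; stop here
        trusted = candidate  # verified; it becomes the next alignment assistant
    return trusted

continuous    = [2, 2, 3, 3, 4, 5, 6, 7]   # many small steps
discontinuous = [2, 2, 20, 3, 4, 5, 6, 7]  # same total progress, one big jump

print("continuous progress    -> trusted capability:", bootstrap(10, continuous))
print("discontinuous progress -> trusted capability:", bootstrap(10, discontinuous))
```

The only point of the sketch is that, under the assumed limit on how big a capability gap overseers can verify across, continuity is what keeps the chain going; it says nothing about whether that assumption actually holds.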
Possible cruxes
There are many possible disagreements with the case we’ve tried to make above. Cruxes which we expect to be most important here are:
How hard alignment is.
How feasible it is to slow AI development without increasing compute overhang.
How likely it is that there are unavoidable discontinuities in AI capabilities and/or compute supply.
How cautious you expect current labs to be relative to future labs.
Whether you expect compute overhang to increase or decrease over time.
How feasible you think coordination is, and what is needed in order to make meaningful coordination happen (e.g. would a capability demonstration revealed by a safety evaluation tip the scales toward meaningful coordination being feasible?).
“AGI could happen soon or far in the future; the takeoff speed from the initial AGI to more powerful successor systems could be slow or fast. Many of us think the safest quadrant in this two-by-two matrix [of short/long timelines & slow/fast takeoff] is short timelines and slow takeoff speeds; shorter timelines seem more amenable to coordination and more likely to lead to a slower takeoff due to less of a compute overhang, and a slower takeoff gives us more time to figure out empirically how to solve the safety problem and how to adapt.” Sam Altman, Planning for AGI and beyond.
On the need for coordination: “we need some degree of coordination among the leading development efforts to ensure that the development of superintelligence occurs in a manner that allows us to both maintain safety and help smooth integration of these systems with society. There are many ways this could be implemented; major governments around the world could set up a project that many current efforts become part of, or we could collectively agree (with the backing power of a new organization like the one suggested below) that the rate of growth in AI capability at the frontier is limited to a certain rate per year.” Altman, Brockman and Sutskever, Governance of superintelligence.
Though this might not be the case, if awareness and understanding of AI risk and AI safety increases dramatically over time.
Though this might not be the case if race dynamics intensify.
See e.g. Zach Stein-Perlman here.
See e.g. Paul Christiano’s response to “RLHF (and other forms of short-term “alignment” progress) make AI systems more useful and profitable, hastening progress towards dangerous capabilities” here. Also a paraphrase of this position in Eli Tyre’s post here.
Either because of information leakage, or because of independent progress.
“A few years ago it looked like the path to AGI was by training deep RL agents from scratch in a wide range of games and multi-agent environments. These agents would be aligned to maximizing simple score functions such as survival and winning games and wouldn’t know much about human values. Aligning the resulting agents would be a lot of effort: not only do we have to create a human-aligned objective function from scratch, we’d likely also need to instill actually new capabilities into the agents like understanding human society, what humans care about, and how humans think.
Large language models (LLMs) make this a lot easier: they come preloaded with a lot of humanity’s knowledge, including detailed knowledge about human preferences and values. Out of the box they aren’t agents who are trying to pursue their own goals in the world and their objective functions are quite malleable.” Jan Leike here.