I could be wrong, but I’d guess convincing the ten most relevant leaders of AI labs that this is a massive deal, worth prioritizing, actually gets you a decent slow-down. I don’t have much evidence for this.
…
Delay is probably finite by default
Convincing OpenAI, Anthropic, and Google seems moderately hard—they’ve all clearly already considered the matter carefully, and it’s fairly apparent from their actions what conclusions they’ve reached about pausing now. Then convincing Meta, Mistral, Cohere, and the next dozen-or-so would-be OpenAI replacements seems harder. Then also doing so for the Chinese and UAE equivalents seems like it might be harder still, not so much because talking to their AI researchers is hard, but because you also have to persuade members of their governments, which are top-down power structures headed by people used to getting their way. Then you need to convince a16z, EleutherAI, Hugging Face, and the entire open-source AI community — that seems harder still, mostly because there are so many of them. After that we’re looking at people like the Russians and the North Koreans, then the academic AI research community, and after that probably things like criminal organizations and warlords.
Somewhere in that sequence, you’re going to find a nut you can’t crack. Having multiple major governments on your side would help a lot; they might well be able to achieve everything up to North Korea and the Russians. At the current pace of Moore’s-Law improvements in GPUs, algorithmic progress, and monetary investment in AI, each rung on that ladder probably buys you something like 6 months of delay (this is a rough estimate; I’m sure there are people in the AI governance community who could give better ones). If you got several rungs down, you might even be able to somewhat slow the growth of training compute and algorithmic progress, which might stretch that to more like 12 months per rung. So with massive effort backed by major governments, we might be able to slow things down by up to about 4 years. If we’re all going to die at the end, that’s ~4 years of extra life for ~8 billion people, which is clearly a huge positive achievement. If those ~4 years also produced a large improvement in Alignment fieldbuilding and progress, that could make an even bigger difference. But the downside is that if someone builds AGI in secret during those ~4 years, it’s going to be a criminal organization or a failed state, and near the end of it you’re in a race with people like the North Koreans. We’d also have built up some overhang: on my progress-slowing assumption, only about 2 years’ worth at current rates, since the world would actually be behind the unslowed track, but still, some overhang. Overhang presumably creates the potential for rapid capability growth, which is extremely bad for safety.
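Purely to make the arithmetic explicit, here’s a minimal sketch of that estimate. The per-rung delays are the rough figures from the paragraph above; the number of rungs covered and the fraction by which background progress slows are my own illustrative fill-ins, chosen only so the totals match the ~4 years of delay and ~2 years of overhang I guessed at.

```python
# Back-of-envelope version of the delay estimate above.
# Per-rung delays are the rough guesses from the text; the other numbers are
# illustrative fill-ins, not data.

rungs_achievable = 4             # illustrative: rungs a major-government-backed effort might cover
months_per_rung_fast = 6         # delay per rung at the current pace of compute/algorithmic progress
months_per_rung_slowed = 12      # delay per rung if the pause also slows compute and algorithmic growth

delay_years_fast = rungs_achievable * months_per_rung_fast / 12
delay_years_slowed = rungs_achievable * months_per_rung_slowed / 12
print(f"Delay without slowing background progress: ~{delay_years_fast:.0f} years")   # ~2 years
print(f"Delay if background progress also slows:   ~{delay_years_slowed:.0f} years") # ~4 years

# Overhang: during the pause the underlying inputs (hardware, algorithms, spend)
# keep improving, but more slowly than on the unslowed track, so the catch-up
# potential at the end is smaller than the full length of the pause.
slowed_progress_fraction = 0.5   # illustrative: background progress roughly halved during the pause
overhang_years = delay_years_slowed * slowed_progress_fraction
print(f"Overhang built up: ~{overhang_years:.0f} years' worth of pent-up progress")  # ~2 years
```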
The obvious next question is: how far away is AGI? My personal current best guess, based on rough parallels to human synapse counts, is about 2-3 orders of magnitude in parameter count (which is ~2-3 GPT generations, so ~GPT-6 or GPT-7, and that sounds about right based on generational progress so far). Since training tokens scale roughly in proportion to parameter count, that’s ~4-6 orders of magnitude in total training compute cost before any algorithmic and Moore’s Law discounts. At the GPT-1 through GPT-4 rate of progress, that’s around 2.5-4 years. Clearly it’s hard to be sure: this could be only one GPT generation away (about a year), or four (about 5.5 years). So an additional ~4-year delay would be a significant amount of extra time, probably more than doubling what we’d otherwise have, but not an enormous difference.
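To show the conversion from parameter counts to a timeline, here’s a small worked sketch. The 2-3 orders of magnitude in parameters is my guess from above; the doubling to compute assumes Chinchilla-style scaling of tokens with parameters; and the effective-compute growth rate (~1.5 OOM/year, combining hardware, spend, and algorithmic gains) is my own assumed figure chosen to match the GPT-1-through-4 pace, so treat the whole thing as illustrative.

```python
# Rough worked version of the timeline arithmetic above. Inputs are my guesses,
# not measured data.

param_oom_low, param_oom_high = 2, 3            # extra orders of magnitude of parameters to ~AGI

# Under Chinchilla-style scaling, training tokens grow roughly in proportion to
# parameters, so training compute grows roughly as parameters squared:
compute_oom_low = 2 * param_oom_low             # ~4 OOM of raw training compute
compute_oom_high = 2 * param_oom_high           # ~6 OOM of raw training compute

# Assumed effective-compute growth rate (hardware + spend + algorithmic gains)
# over the GPT-1 to GPT-4 period, in OOM per year:
effective_oom_per_year = 1.55

years_low = compute_oom_low / effective_oom_per_year
years_high = compute_oom_high / effective_oom_per_year
print(f"~{years_low:.1f} to {years_high:.1f} years to AGI at that rate")   # roughly 2.5 to 4 years
```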
The next question is: if we’re going to delay, when? Coordinating a delay is probably going to be easiest when the danger is near, tangible, and demonstrable, and, as the OP notes, the gains from an extra up-to-4 years of Alignment research will be biggest with near-AGI systems to work with and more Alignment fieldbuilding done. There will obviously also be a lot more political capital for a pause once significant white-collar job losses start. So there would seem to be reasons to maintain a significant lead until you pause, and then to pause for as long as you can get away with (while closely monitoring anyone still racing).
The recent blitz to get governments involved and set up regulatory frameworks could be interpreted as laying the groundwork for a future government-led pause (and for starting to slow smaller players) without actually initiating one yet. So it looks like the intention is to build models around the GPT-5 level, do some safety analysis, and then see what they look like. At that point, if they were showing really scary capabilities (“High” across the board on OpenAI’s Preparedness framework, for example), then that would put the labs in a good position to go back to the governments and yell “Crisis! Pull the emergency brake!” They would also probably by then have a much better idea of whether they already have AGI on their hands, or whether it’s still 1-3 generations away.