Big fan of this post. One thing worth highlighting IMO: The post assumes that governments will not react in time, so it’s mostly up to the labs (and researchers who can influence the labs) to figure out how to make this go well.
TBC, I think it’s a plausible and reasonable assumption to make. But I think this assumption ends up meaning that “the plan” excludes a lot of the work that could make the USG (a) more likely to get involved or (b) more likely to do good and useful things conditional on them deciding to get involved.
Here’s an alternative frame: I would call the plan described in Marius’s post something like the “short timelines plan assuming that governments do not get involved and assuming that technical tools (namely control/AI-automated AI R&D) are the only/main tools we can use to achieve good outcomes.”
You could imagine an alternative plan described as something like the “short timelines plan assuming that technical tools in the current AGI development race/paradigm are not sufficient and governance tools (namely getting the USG to provide considerably more oversight into AGI development, curb race dynamics, make major improvements to security) are the only/main tools we can use to achieve good outcomes.” This kind of plan would involve a very different focus.
Here are some examples of things that I think would be featured in a “government-focused” short timelines plan:
Demos of dangerous capabilities
Explanations of misalignment risks to senior policymakers. Identifying specific people who would be best-suited to provide those explanations, having those people practice giving explanations and addressing counterarguments, etc.
Plans for what the “trailing labs” should do if the leading lab appears to have an insurmountable lead (e.g., OpenAI develops a model that is automating AI R&D. It becomes clear that DeepMind and Anthropic are substantially behind OpenAI. At this point, do the labs merge and assist? Do they try to do a big, coordinated, costly push to get governments to take AI risks more seriously?)
Emergency preparedness: getting governments to be more likely to detect and appropriately respond to time-sensitive risks.
Preparing plans for what to do if governments become considerably more concerned about risks (e.g., preparing concrete Manhattan Project or CERN-for-AI style proposals, identifying and developing verification methods for domestic or international AI regulation).
One possible counter is that under short timelines, the USG is super unlikely to get involved. Personally, I think we should have a lot of uncertainty about how the USG will react. Examples of factors here: (a) a new Administration, (b) uncertainty over whether AI will produce real-world incidents, (c) uncertainty over how compelling demos will be, (d) ChatGPT being an illustrative example of a big increase in USG involvement that lots of folks didn't see coming, (e) examples of the USG suddenly becoming a lot more interested in a national security domain (e.g., 9/11 leading to the Patriot Act, the recent TikTok ban), and (f) Trump being generally harder to predict than most Presidents (e.g., more likely to form opinions for himself, less likely to trust establishment views in some cases).
(And just to be clear, this isn’t really a critique of Marius’s post. I think it’s great for people to be thinking about what the “plan” should be if the USG doesn’t react in time. Separately, I’d be excited for people to write more about what the short timelines “plan” should look like under different assumptions about USG involvement.)
I would love to see a post laying this out in more detail. I found writing my post a good exercise in prioritization. Maybe writing a similar piece where governance is the main lever would bring out good insights into what to prioritize in governance efforts.
Akash, your comment raises the good point that a short-timelines plan that overlooks governments as a really important lever misses a lot of opportunities for safety. Another piece of the puzzle that comes out when you consider what governance measures we'd want to include in the short timelines plan is the "off-ramps problem" that's touched on in this post.
Basically, our short timelines plan also needs to include measures (mostly governance/policy, though also technical) that get us to a desirable off-ramp from the geopolitical tensions brought about by the economic and military transformation resulting from AGI/ASI.
I don’t think there are good off-ramps that do not route through governments. This is one reason to include more government-focused outreach/measures in our plans.
I think that if government involvement suddenly increases, there will also be a window of opportunity to get an AI safety treaty passed. I feel a government-focused plan should include pushing for this.
(I think heightened public x-risk awareness is also likely in such a scenario, making the treaty more achievable. I also think heightened awareness in both government and the public will make short treaty timelines (a year down to weeks), at least between the US and China, realistic.)
Our treaty proposal (a few other good ones exist): https://time.com/7171432/conditional-ai-safety-treaty-trump/
Also, I think endgames should be made explicit: what are we going to do once we have aligned ASI? I think that's true both for Marius's plan and for a government-focused plan with a Manhattan Project or CERN-for-AI included in it.