About 1 year ago, I wrote up a ready-to-go plan for AI safety focused on current science (what we roughly know how to do right now). It targets reducing catastrophic risks from the point when we have transformatively powerful AIs (e.g. AIs roughly as capable as humans).
I never finished this doc, and it is now considerably out of date relative to how I currently think about what should happen, but I still think it might be helpful to share.
Here is the doc. I don’t particularly want to recommend people read this doc, but it is possible that someone will find it valuable to read.
I plan on trying to think through the best ready-to-go plan roughly once a year. Buck and I have recently started work on a similar effort. Maybe this time we’ll actually put out an overall plan rather than just spinning off various docs.
This seems like a great activity, thank you for doing/sharing it. I disagree with the claim near the end that this seems better than Stop, and in general felt somewhat alarmed throughout at (what seemed to me like) some conflation/conceptual slippage between arguments that various strategies were tractable, and that they were meaningfully helpful. Even so, I feel happy that the world contains people sharing things like this; props.
I disagree with the claim near the end that this seems better than Stop
At the start of the doc, I say:
It’s plausible that the optimal approach for the AI lab is to delay training the model and wait for additional safety progress. However, we’ll assume the situation is roughly: there is a large amount of institutional will to implement this plan, but we can only tolerate so much delay. In practice, it’s unclear if there will be sufficient institutional will to faithfully implement this proposal.
Towards the end of the doc I say:
This plan requires quite a bit of institutional will, but it seems good to at least know of a concrete achievable ask to fight for other than “shut everything down”. I think careful implementation of this sort of plan is probably better than “shut everything down” for most AI labs, though I might advocate for slower scaling and a bunch of other changes on current margins.
Presumably, you’re objecting to ‘I think careful implementation of this sort of plan is probably better than “shut everything down” for most AI labs’.
My current view is something like:
If there were broad, strong, and durable political will and buy-in for heavily prioritizing AI takeover risk in the US, I think it would be good if the US government shut down scaling and took strong actions to prevent frontier AI progress while also accelerating huge amounts of plausibly-safety-related research.
You’d need to carefully manage the transition back to scaling to reduce hardware overhang issues. This is part of why I think “durable” political will is important. There are various routes for doing this with different costs.
I’m sympathetic to thinking this doesn’t make sense if you just care about deaths prior to age 120 of currently alive people and widespread cryonics is hopeless (even conditional on this level of political support for mitigating AI takeover risk). Some other views which just care about achieving close-to-normal lifespans for currently alive humans also maybe aren’t into pausing.
Regulations/actions which have the side effect of slowing down scaling but aren’t part of a broad package seem much less good. This is partially due to hardware/algorithmic overhang concerns, but more generally due to follow-through concerns. I also wouldn’t advocate for such regulations (regulations whose main impact is to slow down scaling as a side effect) due to integrity/legitimacy concerns.
Unilateral shutdown is different as advice from “it would be good if everyone shut down” because AI companies might think (correctly or not) that they would be better on safety than other companies. In practice, no AI lab seems to have expressed a view close to “acceleration is bad” except Anthropic (see their core views on AI safety).
We are very, very far from broad, strong, and durable political will for heavily prioritizing AI takeover risk, so weaker and conditional interventions for actors on the margin seem useful.