About a month ago, I wrote a quick take suggesting that an early messaging mistake made by MIRI was: claim there should be a single leading FAI org, but not give specific criteria for selecting that org. That could’ve led to a situation where DeepMind, OpenAI, and Anthropic could all think of themselves as “the best leading FAI org”.
An analogous possible mistake that’s currently being made: Claim that we should “shut it all down”, and also claim that it would be a tragedy if humanity never created AI, but not give specific criteria for when it would be appropriate to actually create AI.
What sort of specific criteria? One idea: a committee of randomly selected alignment researchers is formed to study the design; if at least X% of the committee rates the odds of success at Y% or higher, it gets the thumbs up. These aren’t ideal criteria; they’re just offered for illustration.
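For concreteness, here’s a minimal sketch of that illustrative criterion; the function name, thresholds, and ratings are all made up, not a proposal for actual values:

```python
# Hypothetical committee criterion: thumbs up if at least X% of members
# rate the odds of success at Y% or higher. X and Y here are placeholders.
def committee_approves(ratings, x_fraction=0.8, y_odds=0.9):
    """ratings: each member's estimated probability that the design succeeds."""
    if not ratings:
        return False
    share_confident = sum(r >= y_odds for r in ratings) / len(ratings)
    return share_confident >= x_fraction

# Example: 7 of 10 members rate the odds at 90%+, so an 80% bar is not met.
print(committee_approves([0.95, 0.9, 0.92, 0.91, 0.9, 0.93, 0.9, 0.6, 0.5, 0.4]))  # False
```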
Why would this be valuable?
If we actually get a pause, it’s important to know when to unpause as well. Specific criteria could improve the odds that an unpause happens in a reasonable way.
If you want to build consensus for a pause, advertising some reasonable criteria for when we’ll unpause could get more people on board.
I think Six Dimensions of Operational Adequacy was in this direction; I wish we had been more willing to issue scorecards earlier (e.g. publishing that document in 2017 instead of 2022). The most recent scorecard-ish thing was commentary on the AI Safety Summit responses.
I also have the sense that the time to talk about unpausing is while creating the pause; this is why I generally am in favor of things like RSPs and RDPs. (I think others think that this is a bit premature / too easy to capture, and we are more likely to get a real pause by targeting a halt.)