Thanks for the thoughts. They’ve made me think that I’m likely underestimating how much Control is needed to get useful work out of AIs capable and inclined to scheme. Ideally, this fact would increase the likelihood of other actors implementing AI Control schemes with the stolen model that are at least sufficient for containment and/or make them less likely to steal the model, though I wouldn’t want to put too much weight on this hope.
>This argument isn’t control specific, it applies to any safety scheme with some operating tax or implementation difficulty.[1][2]
Yep, for sure. I’ve changed the title and commented about this at the end.
Ideally, this fact would increase the likelihood of other actors implementing AI Control schemes with the stolen model that are at least sufficient for containment and/or make them less likely to steal the model, though I wouldn’t want to put too much weight on this hope.
Agreed, it might not be clear to all actors that AI control would be in their interests for reducing risks of sabotage and other bad outcomes!
Thanks for the thoughts. They’ve made me think that I’m likely underestimating how much Control is needed to get useful work out of AIs capable and inclined to scheme. Ideally, this fact would increase the likelihood of other actors implementing AI Control schemes with the stolen model that are at least sufficient for containment and/or make them less likely to steal the model, though I wouldn’t want to put too much weight on this hope.
>This argument isn’t control specific, it applies to any safety scheme with some operating tax or implementation difficulty.[1][2]
Yep, for sure. I’ve changed the title and commented about this at the end.
Agreed, it might not be clear to all actors that AI control would be in their interests for reducing risks of sabotage and other bad outcomes!