Rohin Shah comments on Late 2021 MIRI Conversations: AMA / Discussion

Rohin Shah 5 Apr 2022 11:22 UTC
Idk, 95%? Probably I should push that down a bit because I haven’t thought about it very hard.
It’s a bit fuzzy what “deployed” means, but for now I’m going to assume it means that we put inputs into the AI system primarily to get useful outputs, rather than to see what the AI does so that we can make it better.
Any existential catastrophe that didn’t involve a failure of alignment seems like it had to involve a deployed system.
For failures of alignment, I’d expect that before you get an AI system that can break out of the training process and kill you, you get an AI system that can break out of deployment and kill you, because there’s (probably) less monitoring during deployment. You’re also just running much longer during deployment: if an AI system is waiting for the right opportunity, then even if that opportunity is equally likely to arise on any given training input as on any given deployment input (i.e. ignoring the greater monitoring during training), you’d still expect to see it happen at deployment, since >99% of the inputs happen at deployment.
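
To make the last point concrete, here is a rough sketch of the arithmetic. The input counts and the per-input probability are made-up numbers purely for illustration (they are not estimates of anything), and the function names are hypothetical. The only assumption carried over from the argument is that the opportunity is equally likely on each input, whether training or deployment.

```python
def expected_opportunities(p_per_input: float, n_inputs: int) -> float:
    """Expected number of inputs on which the 'right opportunity' shows up."""
    return p_per_input * n_inputs


def deployment_share(p_per_input: float, n_train: int, n_deploy: int) -> float:
    """Fraction of expected opportunities that fall on deployment inputs.

    Assumes the same per-input probability in training and deployment,
    i.e. it ignores the greater monitoring during training.
    """
    train = expected_opportunities(p_per_input, n_train)
    deploy = expected_opportunities(p_per_input, n_deploy)
    return deploy / (train + deploy)


# Illustrative, made-up numbers: 99.5% of all inputs happen at deployment.
N_TRAIN = 1_000_000
N_DEPLOY = 199_000_000

share = deployment_share(1e-9, N_TRAIN, N_DEPLOY)
print(f"share of expected opportunities at deployment: {share:.3f}")
```

The per-input probability cancels out of the ratio, so under this assumption the share of opportunities landing in deployment is just the share of inputs that happen in deployment.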