I think there are ways to reduce misuse risk, but they're not specific to goal agnostic systems, so they're a bit out of scope. Still, it's not a great situation. Misuse is about 75-80% of my p(doom) at the moment (on a total p(doom) of ~30%).
It seems this goes back to needing a maximizer-type force to prevent such misuse from occurring, and then we're back to square one with the classic alignment problem of hitting a narrow target for a maximizing agent.
I'm optimistic about avoiding this specific pit. It does indeed look like something strong would be required, but I don't think 'narrow target for a maximizing agent' is the only useful kind of strong. In other words, I think we'll get enough strength out of something that's close enough to the intuitive version of corrigible, and we'll reach that before we have tons of strong optimizers of the (automatically) doombringing kind lying around.
Thanks!