> Suppose you’ve got a strong goal agnostic system design, but a bunch of competing or bad actors get access to it. How does goal agnosticism stop misuse?
This was the question I was waiting to see answered (since I’m already basically on board with the rest of it), but I was disappointed you didn’t have a more detailed answer. Keeping this out of incompetent/evil hands perpetually seems close to impossible. It seems this goes back to needing a maximizer-type force in order to prevent such misuse from occurring, and then we’re back to square one of the classic alignment problem of hitting a narrow target for a maximizing agent.
Overall, a very good read, well-researched and well-reasoned.
Thanks!

I think there are ways to reduce misuse risk, but they’re not specific to goal agnostic systems, so they’re a bit out of scope… Still, it’s not a great situation: misuse accounts for about 75–80% of my p(doom) at the moment (on a total p(doom) of ~30%).
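(Taken at face value, that decomposition works out to roughly 0.75–0.8 × 0.3 ≈ 0.22–0.24, i.e. misuse accounts for about 22–24 percentage points of the ~30% total, with everything else making up the remaining ~6–8 points.)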
> It seems this goes back to needing a maximizer-type force in order to prevent such misuse from occurring, and then we’re back to square one of the classic alignment problem of hitting a narrow target for a maximizing agent.
I’m optimistic about avoiding this specific pit. It does indeed look like something strong would be required, but I don’t think that strength has to take the form of a ‘narrow target for a maximizing agent.’ In other words, I think we’ll get enough strength out of something close enough to the intuitive version of corrigible, and we’ll reach that point before we have lots of strong optimizers of the (automatically) doom-bringing kind lying around.