> Suppose you’ve got a strong goal agnostic system design, but a bunch of competing or bad actors get access to it. How does goal agnosticism stop misuse?
This was the question I was waiting to see answered (since I’m already basically on board with the rest of it), but I was disappointed you didn’t have a more detailed answer. Keeping this out of incompetent/evil hands perpetually seems close to impossible. It seems this goes back to needing a maximizer-type force in order to prevent such misuse from occurring, and then we’re back to square one of the classic alignment problem of hitting a narrow target for a maximizing agent.
Overall, a very good read, well-researched and well-reasoned.
Thanks!

I think there are ways to reduce misuse risk, but they’re not specific to goal agnostic systems, so they’re a bit out of scope… Still, it’s not a great situation: misuse accounts for about 75–80% of my p(doom) at the moment (on a total p(doom) of ~30%).
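(Taken at face value, that decomposition works out to roughly 0.75–0.8 × 0.3 ≈ 0.22–0.24, i.e. misuse accounts for about 22–24 percentage points of the ~30% total, with everything else making up the remaining ~6–8 points.)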
> It seems this goes back to needing a maximizer-type force in order to prevent such misuse from occurring, and then we’re back to square one of the classic alignment problem of hitting a narrow target for a maximizing agent.
I’m optimistic about avoiding this specific pit. It does indeed look like something strong would be required, but I don’t think that strength has to take the form of a ‘narrow target for a maximizing agent.’ In other words, I think we’ll get enough strength out of something close enough to the intuitive version of corrigible, and we’ll reach that point before we have lots of strong optimizers of the (automatically) doom-bringing kind lying around.