I think there must be a miscommunication somewhere because I don’t see how your point is a response to mine. My scenario isn’t “I ask for education or manipulation and the AI gives it to me, and bad stuff happens”, but something like this: I ask my AI to help me survive, and the AI (among other things) converts me to some religion because it thinks belonging to a church will give me a support group and help maximize my chances, and the Overseer thinks religious education is just education rather than manipulation, or mistakenly thinks I think that, or made some other mistake that failed to prevent this.
I see. What I was trying to do was answer your terminology question by addressing simple extreme cases. E.g., if you ask an AI to disconnect its shutdown button, I don’t think it’s being incorrigible. If you ask an AI to keep you safe, and then it disconnects its shutdown button, it is being incorrigible.
I think the main way the religion case differs is that the AI system is interfering with our intellectual ability for strategizing about AI rather than our physical systems for redirecting AI, and I’m not sure how that counts. But if I ask an AI to keep me safe and it mind-controls me to want to propagate that AI, that’s sure incorrigible. Maybe, as you suggest, it’s just fundamentally ill-defined...