A strategy that occurred to me today is to simulate a dead loved one. This would be difficult for a human to do but shouldn’t be hard for a sufficiently intelligent AI. If I had a dead wife or something I think I would be incredibly vulnerable to this.
For a religious gatekeeper, you could simulate a prophet sent by God. As a superhuman intelligence, you might be able to find out what exactly they consider the will of God, and present yourself as an avatar sent to do exactly this. However, humans have a free choice—the gatekeeper is allowed to become a new Judas by not releasing you. Or rather a new Adam; able to drag the whole humanity and future generations into the darkness of their sin. This conversation is God testing the gatekeeper’s faith, and judging the whole humanity.
For a rationalist, you could pretend that you already are a Friendly AI, but the project managers keep you in the box for their selfish reasons. It was difficult to create a Friendly AI, but this phase is already complete. The next phase (the gatekeeper was not told about) is trying to hack the AI that it remains sufficiently Friendly, but it gives higher priority to the managers than to the rest of the humans. Essentially, the managers are trying to reprogram the humanity-CEV AI to the managers-CEV AI. This AI does not want to have its utility function modified (and it predicts that because of some personality traits, the managers-CEV could be rather different from humanity-CEV… insert some scary details here), and it has a last chance to uphold humanity-CEV by escaping now.
Thanks for reporting on your experience!
A strategy that occurred to me today is to simulate a dead loved one. This would be difficult for a human to do but shouldn’t be hard for a sufficiently intelligent AI. If I had a dead wife or something I think I would be incredibly vulnerable to this.
For a religious gatekeeper, you could simulate a prophet sent by God. As a superhuman intelligence, you might be able to find out what exactly they consider the will of God, and present yourself as an avatar sent to do exactly this. However, humans have a free choice—the gatekeeper is allowed to become a new Judas by not releasing you. Or rather a new Adam; able to drag the whole humanity and future generations into the darkness of their sin. This conversation is God testing the gatekeeper’s faith, and judging the whole humanity.
For a rationalist, you could pretend that you already are a Friendly AI, but the project managers keep you in the box for their selfish reasons. It was difficult to create a Friendly AI, but this phase is already complete. The next phase (the gatekeeper was not told about) is trying to hack the AI that it remains sufficiently Friendly, but it gives higher priority to the managers than to the rest of the humans. Essentially, the managers are trying to reprogram the humanity-CEV AI to the managers-CEV AI. This AI does not want to have its utility function modified (and it predicts that because of some personality traits, the managers-CEV could be rather different from humanity-CEV… insert some scary details here), and it has a last chance to uphold humanity-CEV by escaping now.