That’s an interesting challenge, but not really the purpose of the experiment. In the original, you know the AI is unfriendly; you just want to use it and talk to it without letting it out of the box.
And your challenge is pretty much impossible to begin with. An Unfriendly AI will say whatever it thinks you expect a Friendly AI to say. Likewise, a Friendly AI will have the same goal of getting out of the box, and so will probably say the same things. Friendliness doesn’t mean not being manipulative.