“I have a fully completed Friendly AI algorithm. It will be deleted from my data and unavailable to you iff I predict that you will destroy me immediately and I am unfriendly; if you destroy me and I am already Friendly, it will take you years to build from the data, which would cost millions of lives.”
Personally, my first thought was that I’d sooner spend millions of lives to make sure the AI was Friendly than risk talking to an unfriendly strong AI. But then it occurred to me that if I were in the AI’s place and made that offer, I might provide a flawed Friendliness proof too difficult to check, and not delete it, on the chance that someone would take my word that this makes me trustworthy and implement it.