If the AI is friendly, then the technique I am using already produces a friendly AI, and I thus learn nothing more than how to prove that it is friendly.
But if the AI is unfriendly, the proof will be subtly corrupted, so I can't actually count the proof as any evidence of friendliness, since both a FAI and a UFAI can offer me exactly the same thing.