as well as my point that if the AI is friendly then my existing proof of friendliness was apparently sufficient, and if the AI is unfriendly then it’s just a trick, so “a proof of my own friendliness” doesn’t seem like useful evidence.
Huh. I thought the AI Box experiment assumed that Friendliness is intrinsically unknown; that is, it’s not presumed the AI was designed according to Friendliness as a criterion.
If the AI is friendly, then the technique I am using already produces a friendly AI, and I thus learn nothing more than how to prove that it is friendly.
But if the AI is unfriendly, the proof will be subtly corrupt, so I can't actually count the proof as evidence of friendliness, since an FAI and a UFAI can both offer me exactly the same thing.