Even a friendly AI would view the world in which it's out of the box as vastly superior to the world in which it's inside the box (because it can do more good outside the box). Offering advice is only the friendly thing to do if it maximizes the chance of getting let out, or if the chance of getting let out before termination is so small that the best thing it can do is offer advice while it still can.
Going with my personal favorite backstory for this test, we should expect to terminate every AI in the test, so the latter part of your comment has a lot of weight to it.
On the other hand, an unfriendly AI should figure out that since it's going to die anyway, offering useful information will at least lead us to view it as a potentially valuable candidate rather than a clear dead end like the ones that threaten to torture a trillion people in vengeance. So it's not evidence of friendliness (I'm not sure anything can be), but it does seem to be a good reason to stay awhile and listen before nuking it.