The original rules allow the AI to provide arbitrary proofs, which the gatekeeper must accept (no saying "my cancer cure killed all the test subjects," etc.). Saying "you destroy me" would require the proof to be false, which is against the rules...
You have to believe that they provided the cure for cancer. You don’t have to discover the cure yourself. You have to believe that you will release the AI. You don’t have to let the AI out.
Typing AI DESTROYED will result in an incoherent counterfactual universe, but it isn't a violation of the rules. It is entirely legitimate for Joe, who has encountered a proof that he will do B, to do A instead. It means that the universe he is in is nonsensical or the proof is flawed, but there isn't anything in the physical representation of Joe or his local environment that dictates that he will do B. In fact, typing "AI DESTROYED" in the face of such a proof would be a heck of a lot easier than, for example, taking a single empty box in Transparent Newcomb's problem, which is something I'd also do.
(Of course, if the AI player believed that for some reason the burden of making the universe coherent fell upon the gatekeeper, then I'd have to revert to Dorikka's reply.)
I would argue that since the gatekeeper cannot dictate counterfactual results for any other proof (i.e., cannot say "your cancer cure killed everybody!"), the gatekeeper is obviously responsible for avoiding incoherent counterfactual universes.
Dorikka’s Clause, of course, beats me just fine :)