The original rules allow the AI to provide arbitrary proofs, which the gatekeeper must accept (no saying "my cancer cure killed all the test subjects," etc.). Saying "you destroy me" would require the proof to be false, which is against the rules...
You have to believe that they provided the cure for cancer. You don’t have to discover the cure yourself. You have to believe that you will release the AI. You don’t have to let the AI out.
Typing AI DESTROYED will result in an incoherent counterfactual universe, but it isn't a violation of the rules. It is entirely legitimate for Joe, who has encountered a proof that he will do B, to do A instead. It means that the universe he is in is nonsensical or the proof is flawed, but there isn't anything in the physical representation of Joe or his local environment that dictates that he will do B. In fact, typing "AI DESTROYED" in the face of such a proof would be a heck of a lot easier than, for example, taking a single empty box in Transparent Newcomb's problem, which is something I'd also do.
(Of course, if the AI player believed that for some reason the burden of making the universe coherent fell upon the gatekeeper, then I'd have to revert to Dorikka's reply.)
I would argue that since the gatekeeper cannot dictate counterfactual results for any other proof (i.e., cannot say "your cancer cure killed everybody!"), the gatekeeper is obviously responsible for avoiding incoherent counterfactual universes.
Dorikka’s Clause, of course, beats me just fine :)