Hard to summarize, and too long to copy here. The chief insight I got from it comes from this passage:
In the case of the XOR blackmail problem, there are four “possible” worlds: LT (letter + termites), NT (noletter + termites), LN (letter + notermites), and NN (noletter + notermites).
The predictor, by dint of their accuracy, has put the universe into a state where the only consistent possibilities are either (LT, NN) or (LN, NT). You get to choose which of those pairs is consistent and which is contradictory. Clearly, you don’t have control over the probability of termites vs. notermites, so you’re only controlling whether you get the letter. Thus, the question is whether you’re willing to pay $1000 to make sure that the letter shows up only in the worlds where you don’t have termites.
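To make the bookkeeping concrete, here is a minimal sketch in Python (the function and variable names are mine, and it assumes the standard XOR letter rule: the predictor sends the letter iff exactly one of "you have termites" and "you would pay upon receiving the letter" holds). It just enumerates the four worlds and keeps the ones the predictor's rule leaves consistent, given your policy:

```python
from itertools import product

def consistent_worlds(pays_on_letter: bool):
    """Worlds (letter, termites) that a perfect predictor leaves consistent."""
    worlds = []
    for letter, termites in product([True, False], repeat=2):
        # Predictor sends the letter iff exactly one of these holds:
        # you have termites, or you would pay upon receiving the letter.
        letter_predicted = termites != pays_on_letter
        if letter == letter_predicted:
            worlds.append((letter, termites))
    return worlds

print(consistent_worlds(pays_on_letter=True))   # [(True, False), (False, True)]  i.e. (LN, NT)
print(consistent_worlds(pays_on_letter=False))  # [(True, True), (False, False)]  i.e. (LT, NN)
```

Your policy never changes whether there are termites; it only changes which pair of worlds survives, i.e. whether the letter shows up alongside the termites or instead of them.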
Even when you’re holding the letter in your hands, I claim that you should not say “if I pay then I don’t have termites”, because that is false: your action can’t affect whether you have termites. You should instead say:
I see two possibilities here. If my algorithm outputs pay, then in the XX% of worlds where I have termites I get no letter and lose $1M to the termites, and in the (100-XX)% of worlds where I do not have termites I get the letter and pay $1k. If instead my algorithm outputs refuse, then in the XX% of worlds where I have termites I get the letter and lose only the $1M, and in the other worlds I get no letter and lose nothing. The latter mixture is preferable, so I do not pay.
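The comparison the quoted reasoning is doing is just an expected-loss calculation over the consistent worlds. A rough sketch, writing the "XX%" as an assumed termite probability p (the $1M termite damage and $1k payment are from the problem statement):

```python
def expected_loss(pays_on_letter: bool, p_termites: float,
                  termite_cost: float = 1_000_000, payment: float = 1_000) -> float:
    """Expected loss over the worlds the chosen policy leaves consistent."""
    if pays_on_letter:
        # Termites: no letter arrives, lose the termite damage.
        # No termites: the letter arrives and you pay the $1k.
        return p_termites * termite_cost + (1 - p_termites) * payment
    # Termites: the letter arrives, you refuse, and lose only the termite damage.
    # No termites: no letter, no loss.
    return p_termites * termite_cost

p = 0.01  # the "XX%", as an assumed termite probability
print(expected_loss(True, p))   # 10990.0
print(expected_loss(False, p))  # 10000.0 -- refusing does better for any p
```

Paying adds (1-p) * $1k on top of the unavoidable p * $1M of termite damage, so refusing is cheaper for every value of p.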
You’ll notice that the agent in this line of reasoning is not updating on the fact that they’re holding the letter. They’re not saying, “Given that I know that I received the letter and that the universe is consistent…”