Drescher-Nesov-Dai UDT solves this (that is, goes ahead and punishes the cheater, making the same decision at both times).
TDT can handle Parfit’s Hitchhiker (pay for the ride, making the same decision at both times) because it forms the counterfactual “If I did not pay, I would not have gotten the ride”. But TDT has difficulty with this particular case, which implies that B’s original belief (that A would not cheat if punished) was wrong; after updating on this new information, B may no longer have a motive to punish. (UDT, of course, does not update.) Since B’s payoff can depend on B’s complete strategy tree, including decisions that would be made under other conditions, rather than only on the actual decision made under real conditions, this scenario is outside the realm where TDT is guaranteed to maximize.
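To make the strategy-tree point concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the payoff numbers, the idea that A is "deterrable" with some probability, and the function names are all hypothetical and do not come from the discussion above. The "after update" function is a deliberately naive stand-in for an agent that re-decides once the cheat has been observed; it is not meant as a faithful model of TDT, and the policy-level function is only loosely in the spirit of UDT.

```python
# Hypothetical payoffs for B (illustrative only, not from the original text).
PUNISH_COST = 1    # what B pays to carry out a punishment
CHEATED_LOSS = 5   # what B loses when A cheats


def a_cheats(b_punishes_if_cheated: bool, a_is_deterrable: bool) -> bool:
    """A predicts B's policy; a deterrable A refrains only if punishment is coming."""
    return not (a_is_deterrable and b_punishes_if_cheated)


def b_payoff(b_punishes_if_cheated: bool, a_is_deterrable: bool) -> float:
    """B's payoff as a function of B's whole policy, not just the action taken."""
    payoff = 0.0
    if a_cheats(b_punishes_if_cheated, a_is_deterrable):
        payoff -= CHEATED_LOSS
        if b_punishes_if_cheated:
            payoff -= PUNISH_COST
    return payoff


def policy_value(b_punishes_if_cheated: bool, p_deterrable: float) -> float:
    """Expected payoff of a policy, evaluated before observing anything."""
    return (p_deterrable * b_payoff(b_punishes_if_cheated, True)
            + (1 - p_deterrable) * b_payoff(b_punishes_if_cheated, False))


def best_policy(p_deterrable: float) -> bool:
    """Choose the policy (punish-if-cheated or not) without updating."""
    return max([True, False], key=lambda pol: policy_value(pol, p_deterrable))


def best_action_after_update() -> bool:
    """Naive re-decision after seeing the cheat: the loss is sunk,
    so only the remaining cost of punishing is compared."""
    return max([True, False], key=lambda punish: -PUNISH_COST if punish else 0)


if __name__ == "__main__":
    print("policy chosen in advance punishes:", best_policy(0.9))
    print("action chosen after updating punishes:", best_action_after_update())
```

Under these made-up numbers, the policy chosen in advance is to punish, because the branch where B would punish changes how often A cheats in the first place. The naive re-decision after the cheat declines to punish, because by then that branch's deterrent effect no longer appears in the calculation.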