Stuart, sorry I never replied to this; I wasn’t sure what to say until thinking about it again when replying to Wei Dai just now.
I’d say that what’s going on here is that UDT does not make the best choices given the information available at every point in time. That’s not a defect of UDT; it’s a reflection of the fact that if, at every point in time, you make what would be the best choice according to the info available at that point in time, you become time-inconsistent and end up with a bad payoff.
To bring this out, consider a Counterfactual Mugging scenario where you must pay $1 before the coin flip, plus $100 if the coin comes up tails, to win the $10,000 if the coin comes up heads. According to the info available before the flip, it’s best to pay the $1 now and the $100 on tails. According to the info available once the coin has come up tails, it’s best not to pay up. So an algorithm making both of these choices in the respective situations would be a money-pump: it pays $1 without ever getting anything in return, since an agent that wouldn’t pay on tails doesn’t get the $10,000 on heads either.
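A quick expected-value sketch of the fixed policies, just to make the money-pump concrete; it assumes a fair coin and that Omega only pays the $10,000 on heads if you’re the kind of agent that would pay the $100 on tails:

```python
# Expected value of fixed policies in the modified Counterfactual Mugging above.
# Assumes a fair coin and that Omega pays the $10,000 on heads only if the
# agent would pay the $100 on tails.

P_HEADS = 0.5
P_TAILS = 0.5

def expected_value(takes_deal: bool, would_pay_on_tails: bool) -> float:
    """Expected payoff of a policy fixed before the coin flip."""
    if not takes_deal:
        return 0.0                      # decline the deal entirely
    value = -1.0                        # the $1 paid before the flip
    if would_pay_on_tails:
        value += P_TAILS * (-100.0)     # pay $100 on tails
        value += P_HEADS * 10_000.0     # collect $10,000 on heads
    # An agent that would refuse on tails gets nothing on heads,
    # so the $1 is simply lost.
    return value

print(expected_value(True, True))    # 4949.0: pay in both situations (the precommitment policy)
print(expected_value(True, False))   # -1.0:   pay $1, then refuse on tails (the money-pump)
print(expected_value(False, False))  # 0.0:    never take the deal
```

The point is just that the pay-$1-then-refuse policy is dominated by both consistent policies.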
So my answer to your question would be, what this shows is a divergence between UDT applied to the info before the flip and UDT applied to the info after the flip—and no, you can’t have the best of both worlds...
“you” could be a UDT agent. So does this example show a divergence between UDT and XDT applied to UDT?