This comment is a mishmash of several reactions / comments.
1: Thanks for the interesting post!
2: Tongue in cheek: Using CDT seems like a lot of work. How about just picking whatever action we think of first, thus avoiding the need to model the environment and do an expensive expected utility calculation? So it’s unclear why we really need CDT in the first place, and we should default to picking whatever action we think of first.
3: The counterfactual type of UDT is much closer to UEDT than to UCDT (though “just use logical counterfactuals” is a larger change and a tougher nut to crack than it might appear at first glance). A lot of the reason to use something like UDT is to handle cases where other agents know your reasoning procedure and take actions in anticipation of your future decisions, so it’s not very useful to do something that looks like intervening only on your decision algorithm while changing nothing else about the world. (You could define logical counterfactuals in a different way so that UCDT works, but then I think you’d just be sweeping small problems under the rug of a bigger problem.)
4: I’m still trying to figure out a simple way to argue that UDT(1) is mandated by Savage’s theorem. (Savage’s theorem is the one where you make some assumptions about “rational behavior” and then get out both probabilistic reasoning and expected utility maximization.)
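(For concreteness, here is a rough gloss of the representation the axioms buy you, my paraphrase rather than anything from the comment itself: there exist a probability P over states and a utility u over consequences such that, for acts f, g mapping states to consequences,

$$f \succeq g \iff \sum_{s \in S} P(s)\, u(f(s)) \;\ge\; \sum_{s \in S} P(s)\, u(g(s)),$$

with integrals in place of sums in the general case.)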
Savage’s theorem talks about “actions,” “states,” and “consequences,” but really those are just labels for mathematical objects with certain properties. My suspicion is that games where you need UDT(1.0) are ones where some sleight of hand has been played: the things the game calls “actions”/“states” aren’t actually “actions”/“states” in Savage’s sense, but your policy still fulfills Savage’s conditions to be an “action.”
E.g. one Savage postulate is “For all actions a, b, x, and y, and any set of states E, you prefer (a if E else x) to (b if E else x) if and only if you prefer (a if E else y) to (b if E else y).” First, note that this sort of independence of alternatives might not hold in cases like Newcomb’s problem or the absent-minded driver. Second, note that this implicitly says that states are the sort of thing you can condition actions on (and actions are the sort of thing that can be conditioned on states).
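To make the first point concrete, here is a minimal sketch (mine, not from the comment; the predictor accuracy and prior are assumed illustrative parameters, and the payoffs are the standard Newcomb numbers) of why it matters whether the game’s “state” is really a Savage state: treating the box contents as fixed independently of the act versus as correlated with it through the predictor gives opposite recommendations.

```python
# Toy Newcomb calculation: treating the predictor's box contents as a genuine
# Savage "state" (probability independent of your act) versus as something
# correlated with your act gives opposite recommendations.
# ACCURACY and PRIOR_FULL are assumed illustrative parameters; the payoffs are
# the standard Newcomb numbers.

ACCURACY = 0.99    # assumed predictor accuracy
PRIOR_FULL = 0.5   # assumed act-independent prior that the opaque box is full

# PAYOFF[action][box_is_full]
PAYOFF = {
    "one-box": {True: 1_000_000, False: 0},
    "two-box": {True: 1_001_000, False: 1_000},
}

def eu_state_independent(action):
    """Box contents treated as a Savage state: P(full) ignores the action."""
    return PRIOR_FULL * PAYOFF[action][True] + (1 - PRIOR_FULL) * PAYOFF[action][False]

def eu_state_correlated(action):
    """Box contents correlated with the action via the predictor."""
    p_full = ACCURACY if action == "one-box" else 1 - ACCURACY
    return p_full * PAYOFF[action][True] + (1 - p_full) * PAYOFF[action][False]

for action in ("one-box", "two-box"):
    print(action, eu_state_independent(action), eu_state_correlated(action))

# The Savage-state reading favors two-boxing; the correlated reading favors
# one-boxing. That divergence is exactly the sense in which "box contents"
# fails to behave like a Savage state.
```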
:)
Not sure I get your point. Seems like you’re saying that EU maximization is analogously poorly motivated?
I’m confused. In this post I use “UEDT” to mean “UDT with conditionals”, and “UCDT” to precisely mean “UDT with logical counterfactuals”. I’m not saying that this is necessarily the optimal terminology, but it seems like you’re thinking of UCDT in a different way here? (Perhaps CDT + updatelessness?)
Seems difficult, but I’d be very interested in reading more!