UDT is (roughly) defined as “follow whatever commitments a past version of yourself would have made if they’d thought about your situation”.
This seems substantially different from UDT, which does not really have or use a notion of “past version of yourself”. For example, imagine a variant of Counterfactual Mugging in which there is no preexisting agent, and instead Omega creates an agent from scratch after flipping the coin and gives it the decision problem. UDT is fine with this, but “follow whatever commitments a past version of yourself would have made if they’d thought about your situation” wouldn’t work, because no past version exists to have made a commitment.
I recall that I described “exceptionless decision theory” or XDT as “do what my creator would want me to do”, which seems closer to your idea. I don’t think I followed up on the idea beyond this, maybe because I realized that humans aren’t running any formal decision theory, so “what my creator would want me to do” is ill-defined. (Although one could say my interest in metaphilosophy is related to this, since what I would want an AI to do is to solve normative decision theory using correct philosophical reasoning, and then do what it recommends.)
Anyway, the upshot is that I think you’re exploring a decision theory approach that’s pretty distinct from UDT, so it’s probably a good idea to call it something else. (However, there may be something similar in the academic literature, or someone may have described something similar on LW that I’m not familiar with or have forgotten.)
This seems substantially different from UDT, which does not really have or use a notion of “past version of yourself”.
My terminology here was sloppy, apologies. When I say “past versions of yourself” I am also including (as Nesov phrases it below) “the idealized past agent (which doesn’t physically exist)”. E.g. in the Counterfactual Mugging case you describe, I am thinking about the precommitments that a hypothetical past version of yourself, from before the coin was flipped, would have made.
I find it a more intuitive way to think about UDT, though I realize it’s a somewhat different framing from yours. Do you still think this is substantially different?
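To make the “idealized past agent” framing concrete, here is a rough sketch of how I picture it for Counterfactual Mugging. The payoffs (a 10,000 prize on heads, a 100 cost on tails), the fair coin, and all function names are assumptions for illustration, not anything taken from the thread or from a formal statement of UDT. The sketch only compares scoring a policy from before the coin flip with scoring it after seeing tails; it does not model how Omega predicts the agent.

```python
from fractions import Fraction

# Illustrative numbers only (assumptions, not from the discussion above).
PRIZE = 10_000            # Omega pays this on heads, but only if the policy pays on tails
COST = 100                # what Omega asks the agent to hand over on tails
P_HEADS = Fraction(1, 2)  # fair coin


def prior_expected_utility(pays_on_tails: bool) -> Fraction:
    """Score a policy from the standpoint before the coin flip.

    No agent needs to have physically existed at that point; all that is
    used is a prior over the coin and the payoff rule.
    """
    heads_payoff = PRIZE if pays_on_tails else 0
    tails_payoff = -COST if pays_on_tails else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff


def utility_after_seeing_tails(pays_on_tails: bool) -> int:
    """Score a policy after updating on the observation that the coin came up tails."""
    return -COST if pays_on_tails else 0


def best_policy(score) -> bool:
    """Return whichever policy (pay / don't pay) scores higher under `score`."""
    return max([True, False], key=score)


if __name__ == "__main__":
    print("Prior view prefers paying:", best_policy(prior_expected_utility))
    print("Post-update view prefers paying:", best_policy(utility_after_seeing_tails))
```

Under these assumed numbers the prior evaluation favors paying and the post-update evaluation favors refusing, which is the gap the “hypothetical past version” language is meant to point at.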