Fine. How about this: “Have $1000 if you would have two-boxed in Newcomb’s problem.”
The optimal solution to that naturally depends on the relative probabilities of that deal being offered vs. Newcomb’s itself.
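To put numbers on that dependence, here is a minimal sketch, assuming the standard $1000000/$1000 Newcomb payoffs (the thread never fixes exact numbers) and a prior probability p of facing Newcomb’s rather than the anti-Newcomb deal:

# Commit to a counterfactual policy before knowing which deal you face:
# Newcomb's problem with probability p, the "$1000 if you would have
# two-boxed" deal with probability 1 - p. Payoffs are assumptions.

def expected_value(policy, p):
    if policy == "one-box":
        return p * 1_000_000 + (1 - p) * 0  # full opaque box; the $1000 deal pays nothing
    else:  # "two-box"
        return p * 1_000 + (1 - p) * 1_000  # empty opaque box; the $1000 deal pays

def best_policy(p):
    return max(["one-box", "two-box"], key=lambda pol: expected_value(pol, p))

# Break-even at p = 1/1000: one-boxing wins whenever Newcomb's is at least
# one-in-a-thousand as likely as the anti-Newcomb deal.
print(best_policy(0.0001), best_policy(0.5))  # two-box one-box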
OK. Fine. I will grant you this:
UDT is provably optimal if it has correct priors over possible universes and the universe can read its mind only through determining its behavior in hypothetical situations (because UDT basically just finds the behavior pattern that optimizes expected utility and implements it).
On the other hand, SMCDT is provably optimal in situations where it has an accurate posterior probability distribution, and where the universe can read its mind but not its initial state (because it just instantly self-modifies to the optimally performing program).
I don’t see why the former set of restrictions is any more reasonable than the latter, and at least for SMCDT you can figure out what it would do in a given situation without first specifying a prior over possible universes.
I’m also not convinced that it is even worth spending so much effort trying to determine the optimal decision theory for situations where the universe can read your mind. That is not a realistic model to begin with.
Actually, I take it back. Depending on how you define things, UDT can still lose. Consider the following game:
I will clone you. One of the clones I paint red and the other I paint blue. The red clone I give $1000000 and the blue clone I fine $1000000. UDT clearly gets expectation 0 out of this. SMCDT, however, can replace its code with the following:
If you are painted blue: wipe your hard drive
If you are painted red: change your code back to standard SMCDT
Thus, SMCDT never actually has to play blue in this game, while UDT does.
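A minimal sketch of the accounting behind this claim, assuming (my reading of “never actually has to play blue”) that only clones surviving to settle up are counted; that assumption is exactly what the reply below disputes:

# The clone game from above: one clone painted red (+$1000000), one painted
# blue (-$1000000). A program decides whether a given clone survives to
# settle up; wiped clones simply never appear on the books.

def play(survives):
    payoffs = {"red": 1_000_000, "blue": -1_000_000}
    outcomes = [payoffs[color] for color in ("red", "blue") if survives(color)]
    return sum(outcomes) / max(len(outcomes), 1)

udt = lambda color: True               # both clones play on: expectation 0
smcdt = lambda color: color == "red"   # the blue clone wipes its hard drive first

print(play(udt), play(smcdt))          # 0.0 1000000.0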
You seem to be comparing SMCDT to a UDT agent that can’t self-modify (or commit suicide). The self-modifying part is the only reason SMCDT wins here.
The ability to self-modify is clearly beneficial (if you have correct beliefs and act first), but it seems separate from the question of which decision theory to use.
Which is actually one of the annoying things about UDT. Your strategy cannot depend simply on your posterior probability distribution; it has to depend on your prior probability distribution. How you would even determine your priors for Newcomb vs. anti-Newcomb in practice is really beyond me.
But in any case, assuming that one is more common, UDT does lose this game.
No one said that winning was easy. This problem isn’t specific to UDT. It’s just that CDT sweeps the problem under the rug by “setting its priors to a delta function” at the point where it gets to decide. CDT can win this scenario if it self-modifies beforehand (knowing the correct frequencies of Newcomb vs. anti-Newcomb, so that it knows how to self-modify), but SMCDT is not a panacea, simply because you don’t necessarily get a chance to self-modify beforehand.
CDT does not avoid this issue by “setting its priors to a delta function”. CDT deals with this issue by being a theory where your course of action depends only on your posterior distribution. You can base your actions only on what the universe actually looks like rather than having to pay attention to all possible universes. Given that it’s basically impossible to determine anything about what Kolmogorov priors actually say, being able to totally ignore parts of probability space that you have ruled out is a big deal.
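A type-level sketch of that difference; the function names and the encoding of beliefs as lists of (probability, world) pairs are mine:

# CDT needs only beliefs about the world it is actually in; UDT needs a
# prior over every world it might counterfactually have been in.

def cdt_act(posterior, actions, utility):
    # Pick the act with the best expected utility under the posterior;
    # worlds that have been ruled out never enter the computation.
    return max(actions, key=lambda a: sum(p * utility(w, a) for p, w in posterior))

def udt_act(prior, situation, policies, utility):
    # Pick the best complete policy under the prior (including the weight on
    # worlds you have since ruled out), then look up the current situation.
    best = max(policies, key=lambda pol: sum(p * utility(w, pol) for p, w in prior))
    return best[situation]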
… And then there is this whole issue of not being able to self-modify beforehand. This only matters if your initial code affects the rest of the universe. To be more precise, it is only an issue if the problem is phrased in such a way that the universe you have to deal with depends on the code you are running. If we instantiate Newcomb’s problem in the middle of the decision, UDT faces a world with the first box full while CDT faces a world with the first box empty. UDT wins because the scenario is in its favor before you even start the game.
If you really think that this is a big deal, you should figure out which decision theories are only ever created by universes that want to be nice to them, and try using one of those.
Actually, thinking about it this way, I have seen the light. CDT makes the faulty assumption that your initial state is uncorrelated with the universe that you find yourself in (who knows, you might wake up in the middle of Newcomb’s problem and find that whether or not you get $1000000 depends on whether your code is such that you would one-box in Newcomb’s problem). UDT goes some way toward correcting this issue, but it doesn’t go far enough.
I would like to propose a new, more optimal decision theory. Call it ADT for Anthropic Decision Theory. Actually, it depends on a prior, so assume that you’ve picked out one of those. Given your prior, ADT is the decision theory D that maximizes the expected (given your prior) lifetime utility of all agents using D as their decision theory. Note how agents using ADT do provably better than agents using any other decision theory.
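In symbols (my notation, not anything from the thread): given a prior π over universes, ADT is the D* = argmax_D E_π[sum of the lifetime utilities of all agents using D]. The self-reference is the catch: which agents count in the sum depends on the very D being evaluated.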
Note that I have absolutely no idea what ADT does in, well, any situation, but that shouldn’t stop you from adopting it. It is optimal after all.
Why does UDT lose this game? If it knows anti-Newcomb is much more likely, it will two-box on Newcomb and do just as well as CDT. If Newcomb is more common, UDT one-boxes and does better than CDT.
I guess my point is that it is nonsensical to ask “what does UDT do in situation X” without also specifying the prior over possible universes that this particular UDT is using. Given that this is the case, what exactly do you mean by “losing game X”?
Well, you can talk about “what does decision theory W do in situation X” without specifying the likelihood of other situations, by assuming that all agents start with a prior that sets P(X) = 1. In that case UDT clearly wins the anti-Newcomb scenario, because it knows that actual Newcomb’s “never happens” and therefore it (counterfactually) two-boxes.
The only problem with this treatment is that in real life P(anti-Newcomb) = 1 is an unrealistic model of the world, and you really should have a prior for P(anti-Newcomb) vs. P(Newcomb). A decision theory that solves the restricted problem is not necessarily a good one for solving real-life problems in general.
Well, perhaps. I think that the bigger problem is that under reasonable priors P(Newcomb) and P(anti-Newcomb) are both so incredibly small that I would have trouble finding a meaningful way to approximate their ratio.
How confident are you that UDT actually one-boxes?
Also, yeah, if you want a better scenario where UDT loses, see my example of a prisoner’s dilemma against an opponent that is UDT with 99% probability and CDT with 1% probability.