for a wide range of pairs of decision theories X and Y, you could imagine a problem which essentially takes the form “Omega punishes agents who use decision theory X and rewards agents who use decision theory Y,”
Hmm, that is an interesting objection. Would you be willing to sketch out (or point me to) a response to it?
Well, that depends. It could turn out to be the case that, in reality, for some fixed definition of fair, the universe is unfair. If that were the case, I think at least some of the philosophers who study decision theory would maintain a distinction between ideal rational behavior, whatever that means, and the behavior that, in the universe, consistently results in the highest payoffs. But Eliezer / MIRI is solely interested in the latter. So it depends on what your priorities are.
Well, if this is right...
Then we should be able to come up with a Newcomb-like problem that specifically punishes TDT agents (off the top of my head: at the end of the box exercise, Omega gives an additional 10 million to any agent not using TDT). And if we can come up with such a problem, and EY/MIRI can’t respond by calling foul (for the reasons you give), then getting richer on Newcomb isn’t a reason to accept TDT.
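To make the arithmetic of that proposed game concrete, here is a minimal sketch. It assumes the standard Newcomb payoffs (1,000,000 and 1,000, which are not stated in this thread), an accurate predictor, a TDT agent that one-boxes and a non-TDT agent that two-boxes. The `payoff` function and its parameter names are made up for illustration.

```python
BIG = 1_000_000     # assumed opaque-box prize (standard Newcomb statement)
SMALL = 1_000       # assumed transparent-box prize
BONUS = 10_000_000  # the extra payment proposed above for any agent not using TDT

def payoff(one_boxes: bool, uses_tdt: bool) -> int:
    """Payoff against an accurate predictor, plus the anti-TDT bonus."""
    # An accurate predictor fills the opaque box only for one-boxers,
    # so a two-boxer walks away with just the transparent box.
    base = BIG if one_boxes else SMALL
    return base + (0 if uses_tdt else BONUS)

tdt_agent = payoff(one_boxes=True, uses_tdt=True)
non_tdt_agent = payoff(one_boxes=False, uses_tdt=False)
print(tdt_agent, non_tdt_agent)
```

Under these assumptions the two-boxer ends up richer only because of the bonus term, which is the sense in which the game punishes the theory rather than the choice.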
The “practical” question is whether you in fact expect there to be things in the universe that specifically punish TDT agents. Omega in Newcomb’s problem is doing something that plausibly is very general, namely attempting to predict the behavior of other agents: this is plausibly a general thing that agents in the universe do, as opposed to specifically punishing TDT agents.
TDT also isn’t perfect; Eliezer has examples of (presumably, in his eyes, fair) problems where it gives the wrong answer (although I haven’t worked through them myself).
That Omega is doing something plausibly very general seems to be the claim under dispute, and the question of fairness should be distinguished from the question of whether Omega is doing something realistic or unrealistic. I think we agree that Newcomb-like situations are practically possible. But it may be that my unfair game is practically possible too, and that in principle no decision theory can come out maximizing utility in every practically possible game.
One response might be to say Newcomb’s problem is more unfair than the problem of simply choosing between two boxes containing different amounts of money, because Newcomb’s distribution of utility makes mention of the decision. Newcomb’s is unfair because it goes meta on the decider. My TDT-punishing game is much more unfair than Newcomb’s because it goes one ‘meta’ level up from there, making mention of the decision theories.
You could argue that even if no decision theory can maximise in every arbitrarily unfair game, there are degrees of unfairness related to the degree to which the problem ‘goes meta’. We should just prefer the decision theory that can maximise at the highest level of unfairness. This could probably be supported by the observation that while all these unfair games are practically possible, the more unfair a game is the less likely we are to encounter it outside of a philosophy paper. You could probably come up with a formalization of unfairness, though it might be tricky to argue that it’s relevantly exhaustive and linear.
EDIT: (Just a note, you could argue all this without actually granting that my unfair game is practically possible, or that Newcomb’s problem is unfair, since the two-boxer will provide those premises.)
A theory that is incapable of dealing with agents that make decisions based on the projected reactions of other players is worthless in the real world.
However, an agent that makes decisions based on the fact that it perfectly predicts the reactions of other players does not exist in the real world.
Newcomb does not require a perfect predictor.
I know that; the numbers in the canonical case work out to a required predictor accuracy of 0.5005, which is within noise of random.
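The 0.5005 figure can be checked with a short calculation. This is a minimal sketch, assuming the standard payoffs (1,000,000 in the opaque box and 1,000 in the transparent one, neither of which is stated in this thread) and a predictor that is right with probability p whichever way the agent chooses. The function names are made up for illustration.

```python
from fractions import Fraction

BIG = 1_000_000  # assumed opaque-box prize when one-boxing is predicted
SMALL = 1_000    # assumed transparent-box prize

def ev_one_box(p):
    """Expected payoff of one-boxing: the opaque box is full when the predictor is right."""
    return p * BIG

def ev_two_box(p):
    """Expected payoff of two-boxing: the opaque box is full only when the predictor is wrong."""
    return (1 - p) * BIG + SMALL

def break_even_accuracy():
    """Solve p*BIG == (1 - p)*BIG + SMALL for p, giving p = (BIG + SMALL) / (2*BIG)."""
    return Fraction(BIG + SMALL, 2 * BIG)

p_star = break_even_accuracy()
print(float(p_star))  # 0.5005
```

Above that accuracy one-boxing has the higher expected payoff under these assumptions; below it two-boxing does.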
TDT does in fact sketch a fairly detailed model of “what sort of situation is ‘fair’ for the purpose of this paper”, and it explicitly excludes referring to the specific theory that the agent implements. Note that Newcomb did not set out to deliberately punish TDT (which would be hard, considering that Newcomb predates TDT), so your variation shouldn’t either.
I think an easy way to judge between fair and unfair problems is whether you need to label the decision theory. Without a little label saying “TDT” or “CDT”, Omega can still punish two-boxers based on the outcome (factual or counterfactual) of their decision theory, regardless of what decision theory they used.
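One way to picture the label test is as a check on the payoff function: hold the action fixed, vary the theory label, and see whether the payoff moves. The sketch below is only an illustration under assumed payoffs (1,000,000 and 1,000, with the 10 million bonus from the variant proposed earlier in the thread) and an accurate predictor. `newcomb`, `anti_tdt` and `needs_label` are hypothetical names, not anything from the TDT paper.

```python
LABELS = ["TDT", "CDT"]
ACTIONS = ["one-box", "two-box"]

def newcomb(label: str, action: str) -> int:
    """Accurate-predictor Newcomb: the payoff depends only on the action taken."""
    return 1_000_000 if action == "one-box" else 1_000

def anti_tdt(label: str, action: str) -> int:
    """Variant that also pays a bonus to any agent whose label is not TDT."""
    return newcomb(label, action) + (0 if label == "TDT" else 10_000_000)

def needs_label(problem, labels, actions) -> bool:
    """True if some fixed action earns different payoffs under different labels."""
    return any(len({problem(lab, act) for lab in labels}) > 1 for act in actions)

print(needs_label(newcomb, LABELS, ACTIONS))   # False
print(needs_label(anti_tdt, LABELS, ACTIONS))  # True
```

On this picture Newcomb passes the test because two agents making the same choice always get the same money, while the bonus variant fails it.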
How do you penalize TDT, without actually having to say “I’ll penalize TDT”, based solely on the expected results of the decision theory?
You penalise based on the counterfactual outcome: if they were in Newcomb’s problem, this person would choose one box.
Typically by withholding information about the actual payoffs that will be experienced, e.g. tell the agents they are playing Newcomb’s problem but don’t mention that all millionaires are going to be murdered...