First, to be clear, I am referring to things such as this description of the prisoner’s dilemma and EY’s claim that TDT endorses cooperation. The published material has been careful to only say that these decision theories endorse cooperation among identical copies running the same source code, but as far as I can tell some researchers at MIRI still believe this stronger claim and this claim has been a major part of the public perception of these decision theories (example here; see section II).
The problem is that when two FDT agent with a different utility functions and different prior knowledge are facing a prisoner’s dilemma with each other, then their decisions are actually two different logical variables X0 and X1. The argument for cooperating is that X0 and X1 are sufficiently similar to one another that in the counterfactual where X0=C we also have X1=C. However, you could just as easily take the opposite premise, where X0 and X1 are sufficiently dissimilar that counterfactually changing X0 will have no effect on X1. Then you are left with the usual CDT analysis of the game. Given the vagueness of logical counterfactuals it is impossible to distinguish these two situations.
Here’s a related question: What does FDT say about the centipede game? There’s no symmetry between the players so I can’t just plug in the formalism. I don’t see how you can give an answer that’s in the spirit of cooperating in the prisoner’s dilemma without reaching the conclusion that FDT involves altruism among all FDT agents through some kind of veil of ignorance argument. And taking that conclusion is counter to the affine-transformation-invariance of utility functions.
First, to be clear, I am referring to things such as this description of the prisoner’s dilemma and EY’s claim that TDT endorses cooperation. The published material has been careful to only say that these decision theories endorse cooperation among identical copies running the same source code, but as far as I can tell some researchers at MIRI still believe this stronger claim and this claim has been a major part of the public perception of these decision theories (example here; see section II).
The problem is that when two FDT agent with a different utility functions and different prior knowledge are facing a prisoner’s dilemma with each other, then their decisions are actually two different logical variables X0 and X1. The argument for cooperating is that X0 and X1 are sufficiently similar to one another that in the counterfactual where X0=C we also have X1=C. However, you could just as easily take the opposite premise, where X0 and X1 are sufficiently dissimilar that counterfactually changing X0 will have no effect on X1. Then you are left with the usual CDT analysis of the game. Given the vagueness of logical counterfactuals it is impossible to distinguish these two situations.
Here’s a related question: What does FDT say about the centipede game? There’s no symmetry between the players so I can’t just plug in the formalism. I don’t see how you can give an answer that’s in the spirit of cooperating in the prisoner’s dilemma without reaching the conclusion that FDT involves altruism among all FDT agents through some kind of veil of ignorance argument. And taking that conclusion is counter to the affine-transformation-invariance of utility functions.