My other comment might have come across as unnecessarily hostile. For the sake of a productive discussion, can you please point out (or provide a more specific reference for) how TDT can possibly succeed at achieving mutual cooperation in the standard, one-shot, no-communication prisoner's dilemma? Because, in general, that doesn't seem to be possible.
I mean, if you are certain (or highly confident) that you are playing against a mental clone (a stronger condition than just using the same decision theory), then you can safely cooperate. But, in most scenarios, it's foolish to have a prior like that: other agents that aren't mental clones of yours will shamelessly exploit you. Even if you start with a large population of mental clones playing anonymous PD against each other, if there are mutations (random or designed), then as soon as defectors appear they will start exploiting the clones that blindly cooperate, at least until the clones have updated their beliefs and switched to defecting, which yields the Nash equilibrium. Blindly trusting that you are playing against mental clones is a very unstable strategy.
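To see how quickly the invasion happens, here is a toy replicator-style simulation of my scenario (my own illustrative sketch, with conventional PD payoffs T=5, R=3, P=1, S=0; the numbers and update rule are assumptions, not from any specific model):

```python
# A population where a fraction frac_coop blindly cooperates and the
# rest defect; strategies grow in proportion to relative payoff.
def step(frac_coop):
    # Expected payoff against a random member of the population.
    pay_c = 3 * frac_coop + 0 * (1 - frac_coop)   # cooperator
    pay_d = 5 * frac_coop + 1 * (1 - frac_coop)   # defector
    # Replicator-style update: shares scale with payoff vs. average.
    avg = frac_coop * pay_c + (1 - frac_coop) * pay_d
    return frac_coop * pay_c / avg

frac = 0.99  # a 1% mutant defector minority appears
for generation in range(30):
    frac = step(frac)
print(round(frac, 3))  # cooperators are driven nearly extinct
```

Since the defector's payoff strictly dominates at every population mix, the cooperating clones lose share every generation; blind cooperation is not evolutionarily stable.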
In the program-swap version of the one-shot prisoner's dilemma, strategies like the CliqueBots (and generalizations) can achieve stable mutual cooperation because they can actually check, for each game, whether the other party is a mental clone (there is the problem of coordinating on a single clique, but once the clique is chosen, there is no incentive to deviate from it). But achieving mutual cooperation in the standard one-shot PD seems impossible without making unrealistic assumptions about the other players. I don't think that even Yudkowsky or other MIRI people argued that TDT can achieve that.
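The CliqueBot mechanism can be sketched in a few lines (a toy model, not actual MIRI code: programs are modeled as a (source, behavior) pair, and each player gets to read the opponent's source before moving):

```python
def make_clique_bot():
    source = "CLIQUE-V1"  # the clique's agreed-upon source text
    def play(opponent_source):
        # Cooperate only against an exact copy of our own source.
        return "C" if opponent_source == source else "D"
    return source, play

def make_defect_bot():
    source = "DEFECT"
    def play(opponent_source):
        return "D"  # unconditional defection
    return source, play

# Standard PD payoffs: T=5, R=3, P=1, S=0.
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def match(p1, p2):
    s1, f1 = p1
    s2, f2 = p2
    return PAYOFFS[(f1(s2), f2(s1))]

print(match(make_clique_bot(), make_clique_bot()))  # (3, 3): mutual cooperation
print(match(make_clique_bot(), make_defect_bot()))  # (1, 1): no exploitation
```

Because the source check fails against anything outside the clique, a defector mutant gets (1, 1) instead of the (5, 0) it would extract from a blind cooperator, which is why this works where unconditional cooperation doesn't.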