You’re right, if the opponent is a TDT agent. I was assuming that the opponent was simply a prediction => mixed-strategy mapper. (In fact, I always thought that the strategy “51% one-box, 49% two-box” would game the system, assuming that Omega just predicts whichever outcome is most likely.)
If the opponent is a TDT agent, then it becomes more complex, as in the OP. Just as above, you have to take the argmax over all possible y -> x mappings, rather than simply taking the argmax over all outputs.
Putting it in that perspective: essentially, in this case we are adding all possible mixed strategies to the space of possible outputs. Hmmm… that’s a better way of putting it than anything else I said.
In any case, two TDT agents will both note that the program which cooperates iff the opponent cooperates with probability 1 dominates all other mixed strategies against such an opponent.
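A minimal sketch of both points, the argmax over mappings rather than outputs, and the dominance of the cooperate-iff-certain program (my own toy model; the PD payoffs CC=3, CD=0, DC=5, DD=1 and the responder are assumptions, not from the thread):

```python
# Toy one-shot PD; row player's payoffs (assumed numbers):
# (my_move, their_move) -> payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def expected_payoff(p_me, p_them):
    """Expected payoff when I cooperate with prob p_me and they with p_them."""
    return sum(PAYOFF[(m, t)]
               * (p_me if m == "C" else 1.0 - p_me)
               * (p_them if t == "C" else 1.0 - p_them)
               for m in "CD" for t in "CD")

# A prediction => mixed-strategy mapper that cooperates iff it predicts
# I cooperate with probability 1 (the cooperate-iff-certain program).
def opponent_response(p_me):
    return 1.0 if p_me == 1.0 else 0.0

# Naive argmax over outputs treats the opponent's play as fixed, so defecting
# always looks better.  Argmax over mappings scores each of my strategies p
# together with the response it induces.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=lambda p: expected_payoff(p, opponent_response(p)))
print(best_p, expected_payoff(best_p, opponent_response(best_p)))  # 1.0 3.0
```

Against this responder, every mixed strategy with p < 1 collapses to near-certain mutual defection, which is the dominance claim in numbers.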
So, to answer the original question: yes, it will defect against blind mixed strategies. No, it will not necessarily defect against simple (prediction => mixed strategy) mappers. N/A against another TDT agent, as neither will ever play a mixed strategy, so asking whether it would cooperate with a mixed-strategy TDT agent is counterfactual.
EDIT: Thinking some more, I realize that TDT agents will consider this sort of 99% rigging against each other, and will find that it is better than the cooperate-IFF strategy. However, this is where the “sanity check” becomes important. The TDT agent will realize that although such a pure agent would do better against a TDT opponent, the opponent knows that you are a TDT agent as well, and thus will not fall for the trap.
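To put assumed numbers on the rigging (same toy payoffs as in the sketch above; the 99% figure is from the thread, the payoffs are mine):

```python
# A 99% rigger cooperates with prob 0.99 iff it predicts its opponent
# cooperates outright, and defects otherwise.  Payoffs CC=3, CD=0, DC=5,
# DD=1 are assumed.
caver   = 0.99 * 3 + 0.01 * 0  # giving in still beats mutual defection (1)
rigger  = 0.99 * 3 + 0.01 * 5  # rigging beats plain mutual cooperation (3)
refuser = 1.0                  # mutual defection if the would-be victim refuses
print(round(caver, 2), round(rigger, 2), refuser)  # 2.97 3.02 1.0
```

These are also the numbers behind the regret fork in the second conclusion below: give in, and you net 3.00 rather than 3.02 against a fellow TDT agent; refuse, and you net 1 rather than 2.97 against a hard-coded rigger.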
Out of this I’ve reached two conclusions:
1. The sanity check outlined above is not broad enough: it only sanity-checks the best agents, but even if the best possible agent fails the sanity check, there could still be an improvement over the Nash equilibrium which passes.
2. Eliezer’s previous claim that a TDT agent will never regret being a TDT agent given full information is wrong (hey, I thought it was right too). Either it gives in to a pure 99% rigger or it does not. If it does, then it regrets not being able to 99%-rig another TDT agent. If it does not, then it regrets not being a simple hard-coded cooperator when facing a 99% rigger. This could probably be formalized a bit more, but I’m wondering if Eliezer et al. have considered this?
EDIT2: I realize I was a bit confused before; feeling a bit stupid. Eliezer never claimed that a TDT agent won’t regret being a TDT agent (which is obviously possible: just consider a clique-bot opponent), but rather that a TDT agent will never regret being given information.
> (In fact, I always thought that the strategy “51% one-box, 49% two-box” would game the system, assuming that Omega just predicts whichever outcome is most likely.)
Incidentally, my preferred version of Newcomb is that if the Predictor decides that your chance of one-boxing is p, it puts (one million times p) dollars in the big box. Presumably, you know that the Predictor is both extremely well-calibrated and shockingly accurate (it usually winds up with p near 0 or near 1).
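A quick check (my own toy arithmetic, not the Predictor’s stated mechanism) that this variant defuses the 51/49 trick quoted above:

```python
# Hypothetical payoff model for this Newcomb variant: if the Predictor
# assigns you one-boxing probability p, the big box holds 1_000_000 * p.
def newcomb_value(p):
    big, small = 1_000_000 * p, 1_000
    # One-box with prob p (take the big box only); two-box with prob 1 - p.
    return p * big + (1 - p) * (big + small)
    # Simplifies to 1_000_000 * p + 1_000 * (1 - p), strictly increasing
    # in p, so pure one-boxing (p = 1) is optimal.

print(newcomb_value(0.51))  # ~510,490: the 51/49 trick buys little
print(newcomb_value(1.00))  # 1,000,000
```

Because the expected value is linear and increasing in p, no mixed strategy beats committing outright.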
> The sanity check outlined above is not broad enough: it only sanity-checks the best agents, but even if the best possible agent fails the sanity check, there could still be an improvement over the Nash equilibrium which passes.
Yup, this is where I’m going in a future post. See the footnote on this post about other variants of TDT; there’s a balance between missing workable deals against genuinely stubborn opponents, and failing to get the best possible deal from clever but flexible opponents. (And, if I haven’t made a mistake in the reasoning I haven’t checked, there is a way to use further cleverness to do still better.)
For now, note that TDT wouldn’t necessarily prefer to be a hard-coded 99% cooperator in general, since those get “screw you” mutual defections from some (stubborn) agents that mutually cooperate with TDT.