A Paradox in Timeless Decision Theory
I’m putting this in the discussion section because I’m not sure whether something like this has already been thought of, and I don’t want to repeat things in a top-level post.
Anyway, consider a Prisoner’s-Dilemma-like situation with the following payoff matrix:
You defect, opponent defects: 0 utils
You defect, opponent cooperates: 3 utils
You cooperate, opponent defects: 1 util
You cooperate, opponent cooperates: 2 utils
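To make the structure concrete, here is a minimal Python sketch of that matrix from your own point of view (the names are just for illustration); the two checks at the end are what make this different from a standard Prisoner's Dilemma, where defecting would be the best reply no matter what the opponent does.

```python
# A minimal sketch of the payoff matrix above, from your own point of view.
# Keys are (your_move, opponent_move); values are your payoff in utils.
payoff = {
    ("D", "D"): 0,
    ("D", "C"): 3,
    ("C", "D"): 1,
    ("C", "C"): 2,
}

def best_reply(opponent_move):
    """The move that maximizes your payoff against a fixed opponent move."""
    return max(["C", "D"], key=lambda my_move: payoff[(my_move, opponent_move)])

# Unlike a standard Prisoner's Dilemma, defection is not a dominant strategy here:
assert best_reply("D") == "C"  # against a defector, cooperating (1) beats defecting (0)
assert best_reply("C") == "D"  # against a cooperator, defecting (3) beats cooperating (2)
```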
Assume all players either have full information about their opponents, or are allowed to communicate and will be able to deduce each other's strategies correctly.
Suppose you are a timeless decision theory agent playing this modified Prisoner’s Dilemma with an actor that will always pick “defect” no matter what your strategy is. Clearly, your best move is to cooperate, gaining you 1 util instead of no utility, and giving your opponent his maximum 3 utils instead of the no utility he would get if you defected. Now suppose you are playing against another timeless decision theory agent. Clearly, the best strategy is to be that actor which defects no matter what. If both agents do this, the worst possible result for both of them occurs.
This situation can actually happen in the real world. Suppose there are two rival countries, and one demands some tribute or concession from the other, threatening war if the other country does not agree, even though such a war would be very costly for both countries. The rulers of the threatened country can either pay the less expensive tribute or refuse and accept a more expensive war in the hope that the first country will back off, but the rulers of the first country have thought of that and have committed to not backing down. If the tribute is worth 1 util to each side, and a war costs 2 utils to each side, this is identical to the payoff matrix I described. I’d be pretty surprised if nothing like this has ever happened.
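As a sanity check on that mapping (a quick sketch; the 2-util status-quo baseline is my own assumption, chosen so the numbers line up, since the paragraph above only fixes the tribute and war values):

```python
# Hypothetical calibration for the two-country story: the status-quo value is an
# assumption added here so that the example reproduces the payoff matrix above.
STATUS_QUO = 2  # utils each side gets with no tribute paid and no war fought
TRIBUTE = 1     # utils transferred from the threatened country to the aggressor
WAR_COST = 2    # utils destroyed for each side if war breaks out

# Aggressor demands tribute ("defects") and the threatened country pays ("cooperates"):
assert STATUS_QUO + TRIBUTE == 3   # aggressor's payoff, matching "defect vs. cooperate" = 3
assert STATUS_QUO - TRIBUTE == 1   # threatened country's payoff, matching "cooperate vs. defect" = 1
# The threatened country refuses and war follows (both "defect"):
assert STATUS_QUO - WAR_COST == 0  # each side's payoff, matching "defect vs. defect" = 0
# No demand is made at all (both "cooperate"):
assert STATUS_QUO == 2             # each side's payoff, matching "cooperate vs. cooperate" = 2
```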
Here is what I believe to be the standard explanation.
Unfortunately, you don’t have the option of playing the same strategy as a “perfect defector” since you are currently a hypothetical TDT agent. You can of course play the strategy of being a hypothetical TDT agent that turned itself into a perfect defector. However, from the point of view of your TDT opponent this is a different strategy. In particular, a TDT will cooperate when confronted with a “true” perfect defector but defect§ when faced with an ex-TDT that turned itself into one. Therefore, even though the perfect defector would gain 3 utils, there is no strategy you as a TDT can follow that will mimic the perfect defector, so you might as well act like a true TDT and agree to cooperate.
This does, however, raise interesting questions about why you aren’t winning.
BTW, the standard name for this prisoner’s dilemma variant is chicken.
§ Edit: Actually, after thinking about it I realized that what a TDT would do is cooperate with probability 2/3-ε and defect with probability 1/3+ε. This gives him a higher utility, 2/3-ε instead of 0, and leaves you with a utility of 2-3ε, which is still enough to make you wish you had played a straight TDT strategy and cooperated.
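Spelling that arithmetic out (a quick sketch; eps just stands in for the arbitrarily small ε):

```python
# Expected utilities when a TDT agent mixes against an ex-TDT that always defects.
eps = 0.01                 # stands in for the arbitrarily small epsilon above
p_cooperate = 2/3 - eps    # TDT cooperates with probability 2/3 - eps
p_defect = 1/3 + eps       # ...and defects with probability 1/3 + eps

# The TDT gets 1 util when it cooperates against a defector and 0 when both defect:
tdt_utility = p_cooperate * 1 + p_defect * 0        # = 2/3 - eps, better than 0
# The always-defector gets 3 utils when the TDT cooperates and 0 otherwise:
defector_utility = p_cooperate * 3 + p_defect * 0   # = 2 - 3*eps

assert abs(tdt_utility - (2/3 - eps)) < 1e-9
assert abs(defector_utility - (2 - 3 * eps)) < 1e-9
assert defector_utility < 2  # worse than the 2 utils from playing straight TDT and cooperating
```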
Fair enough, and thanks for supplying the name.
It does not matter what probability of defecting (conditional on expecting the other agent to defect) you precommit to, so long as it is greater than 1/3: if you precommit to defecting with probability > 1/3 in that situation, the probability of that situation ever occurring is exactly 0. Of course, that assumes mutual perfect information about each other's strategies. If beliefs about each other's strategies are merely very well correlated with reality, it may be better to commit to always defecting anyway. If your strategy is to defect with probability only slightly greater than 1/3, and the other agent assigns high probability to that being your strategy but also some probability that you will chicken out and cooperate with probability 1, he might decide that defecting is worthwhile after all. If he does, that indicates your probability of defecting was too low. Of course, a higher chance of defecting conditional on him defecting does hurt you if he does defect, so the best strategy will not necessarily be to always defect; it depends on the kind of uncertainty in the information. But the point is, defecting with probability 1/3+ε is not necessarily always best.
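To see where the 1/3 threshold comes from (same payoffs as above, a rough sketch): a would-be exploiter who always defects gets 3 utils when you cooperate and 0 when you carry out the threat, versus a guaranteed 2 utils from just playing TDT and cooperating.

```python
# Rough sketch of the 1/3 threshold: q is your probability of defecting, given that
# you expect the other agent to defect.
def exploiter_expected_utility(q):
    # The exploiter gets 3 utils when you cooperate (probability 1 - q), 0 when you defect.
    return 3 * (1 - q)

for q in (0.0, 0.25, 0.35, 0.5, 1.0):
    better_than_cooperating = exploiter_expected_utility(q) > 2  # 2 utils from mutual cooperation
    print(f"q = {q:.2f}: exploiter expects {exploiter_expected_utility(q):.2f} utils; "
          f"exploiting {'beats' if better_than_cooperating else 'does not beat'} cooperating")
# Any commitment with q > 1/3 makes defecting against you unprofitable, which is why the
# exact value above 1/3 does not matter under mutual perfect information.
```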
Maybe I’m confused here, but I thought that if two TDT agents play this game (or, for that matter, two UDT agents that can read each other’s source code), the only relevant payoffs are C/C = 2/2, D/D = 0/0, so Cooperate clearly dominates.
Remember, a UDT player that can predict the other’s decision algorithm will precommit to the dominant joint strategy (C/C or D/D) if the other player will respond accordingly.
A TDT player will do so even when the other player’s exact algorithm is opaque, because the other player is presumed to be making a decision with an analogous algorithm and the same information, so e.g. if you cooperate, there’s no reason not to assume the other player will make the same choice given the same incentives.
Which shows that defection was not the best strategy in this situation.
TDT can’t generally analyze two-player games, apart from assuming that the players are “sufficiently similar” and the game can thus be modeled by a causal graph with a single platonic agent-computation controlling both players.
Very raw comment follows; I hope others can make something of it.
In short, yeah. The best response to TDT isn’t necessarily TDT. So for TDT to constitute a “Nash equilibrium in algorithms” against itself, it needs to include precautions against exploiters, e.g. precommit to defecting against the defector in your game even though it looks suboptimal. We haven’t yet solved this problem in general. My current best idea (suggested by user Perplexed) is to use Nash’s 1953 paper to pick a fair outcome and disagreement point, then play accordingly: demand your fair share and punish at all costs if you don’t get it. In your game this would lead to the outcome 2/2. But this has the big drawback that “other players” are treated differently from static parts of the world-program. It may be fixable but I don’t yet know how.
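A rough illustration of how that picks out 2/2 here (a sketch that just maximizes the symmetric Nash product with the mutual-defection point 0/0 as the disagreement point, which sweeps the variable-threat machinery of the 1953 paper under the rug):

```python
# Symmetric Nash bargaining sketch for this game, with mutual defection (0, 0) as the
# disagreement point. The Pareto frontier is the segment between the outcomes (3, 1)
# and (1, 3); maximizing the Nash product along it picks out the symmetric point (2, 2).
def nash_product(u1, u2, disagreement=(0, 0)):
    return (u1 - disagreement[0]) * (u2 - disagreement[1])

frontier = [(1 + 2 * i / 1000, 3 - 2 * i / 1000) for i in range(1001)]  # (x, 4 - x), 1 <= x <= 3
best = max(frontier, key=lambda point: nash_product(*point))
print(best, nash_product(*best))  # (2.0, 2.0) with Nash product 4.0
```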
Also Wei Dai’s workshop post on Jun 15 about “the stupid winner paradox” seems to be relevant, but I can’t copy-paste the whole resulting email exchange here. Maybe ask Wei to repost it to LW?
This is a “Chicken” game.