Okay, so this is what happens with the PD strategy in this comment.
Let’s try to get an optimal counter-strategy (CS) to the probabilistic strategy above (PS). We work backwards. Suppose we’ve worked out CS’s behavior for the last N-1 turns. Then on the Nth-to-last turn, in each of the four possible situations (CC, CD, DC, or DD on the previous turn), we can use the probabilities above, together with what we’ve found for CS’s behavior, to get the expected payouts for the remainder of the match if we cooperate and if we defect. We choose the action that yields the larger expected payout. This is the optimal strategy against this opponent if the goal is to maximize our own score.
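Here is a rough sketch of that backward induction in Python. The cooperation probabilities in it are placeholders I made up, not the PS values from the linked comment, so the policy it prints need not match the one described in this comment. The payoffs 5/3/1/0 are the conventional ones and are also an assumption; only CC = 3 is stated in this thread. The names `PAYOFF`, `PS_COOP`, and `best_responses` are mine.

```python
# Payoff to CS, keyed by (cs_move, ps_move). Assumed conventional values.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Hypothetical PS: probability that PS cooperates, keyed by the previous
# turn's outcome as (ps_move, cs_move). Placeholder numbers only.
PS_COOP = {("C", "C"): 0.9, ("C", "D"): 0.5, ("D", "C"): 0.3, ("D", "D"): 0.1}


def best_responses(ps_coop, payoff, turns):
    """Return policy[k][state]: CS's best move with k turns left (k = 1 is the last turn)."""
    # Expected future points for CS when no turns remain.
    value = {state: 0.0 for state in ps_coop}
    policy = {}
    for k in range(1, turns + 1):
        new_value, moves = {}, {}
        for state, p in ps_coop.items():
            expected = {}
            for mine in "CD":
                # Average over PS's move: points now plus the value of the
                # situation (ps_move, cs_move) we land in for the next turn.
                expected[mine] = sum(
                    prob * (payoff[(mine, theirs)] + value[(theirs, mine)])
                    for theirs, prob in (("C", p), ("D", 1 - p))
                )
            moves[state] = max(expected, key=expected.get)
            new_value[state] = expected[moves[state]]
        value, policy[k] = new_value, moves
    return policy


policy = best_responses(PS_COOP, PAYOFF, turns=5)
for k in sorted(policy):
    print(k, policy[k])
```

Whatever probabilities you plug in, `policy[1]` (the last turn) comes out as defect in every situation; the earlier turns are where the particular numbers matter.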
Note that since PS is stupid and applies the same rule on every turn, including the last one, CS should just defect on the last turn.
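To spell out the last-turn step: write p for the probability that PS cooperates on that turn, and assume the usual Prisoner’s Dilemma payoff ordering T > R > P > S (these symbols are my notation, not the original comment’s). Then

$$E[\text{defect}] = pT + (1-p)P \;>\; pR + (1-p)S = E[\text{cooperate}]$$

for every p between 0 and 1, since T > R and P > S. No later turn remains in which PS could respond, so defecting is strictly better.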
However, after working out the math, it appears that CS is actually a very nice strategy. It defects on the last turn, and also on the next-to-last turn if it finds itself in a “CC” situation; in all other cases, it cooperates.
It’s obvious that PS, which has some probability of defecting, will win the match against CS, because it’s effectively playing against a cooperative rock. In other words, if you play against this strategy and try to maximize your own score, your opponent will have a higher score.
This isn’t as ridiculous as it appears! CS isn’t “losing” in any significant sense, because the goal we gave it wasn’t to win the match; it was to get as many points as possible. In an infinite Prisoner’s Dilemma (which is the situation considered in the paper), this is the only reasonable thing to ask, because there’s no match to be won. So the “extortion” of PS is actually that if you try to maximize your points against it, it will get even more points than you will.
Of course; it’s the same as in a game of chicken where your opponent precommits to defecting.
In infinite IPD:
1. There are lots of probabilistic strategies your opponent can precommit to that prevent you from averaging CC (in this case: 3).
2. If your opponent accepts any probabilistic precommitment from you without precommitting himself, you can maximise your score beyond CC.
3. If you model your opponent as a probabilistic strategy, you accept any probabilistic precommitment from your opponent.
Point 2 may not be obvious, but follows straight from the payoff matrix.
Well, yes; I’m assuming that I know the strategy my opponent is playing, which assumes a precommitment. I’m just trying to explain the reasoning in the paper, without going into determinants and Markov chains and so on.