I didn’t find this paper particularly interesting, mostly because it doesn’t show the strength of extortionate strategies so much as the limits of evolution in the way the paper defines it. These kinds of “evolutionary” strategies have never been empirically shown to be particularly successful in IPD matches of infinite length, so exploiting them is not a “significant mathematical feature” as claimed.
To sum up the paper: In a non-zero-sum game of this kind, strategies that only care about gradually improving their own score cannot fully utilize defection as a punishment or deterrent, and thus are permanently exploited by strategies that can.
Note that the kind of evolution the paper talks about has little to do with actual evolution, and the last paragraph is nothing but an empty phrase.
I’m not really able to evaluate the claims in the paper myself, so thanks for the input. Having said that, do you think the paper specifies the strategies with enough detail for us to code them up and test their mettle in a Less Wrong IPD tournament?
Here’s the first strategy that is explicitly stated (in the context of the 5/3/1/0 payouts):
If the last outcome was CC, cooperate with probability 11⁄13.
If the last outcome was CD, cooperate with probability 1⁄2.
If the last outcome was DC, cooperate with probability 7⁄26.
If the last outcome was DD, defect.
This supposedly “enforces an extortion factor of 3”, whatever that means.
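For what it's worth, here is a rough sketch of one way to check what the "extortion factor" might mean, assuming it refers to the relation (my score - 1) = 3 × (opponent's score - 1) in the long run, where 1 is the mutual-defection payoff. The code treats a match between two memory-one strategies as a Markov chain over the four outcomes and iterates it until the outcome frequencies settle. The function name, the step count, and the sample opponents are my own placeholders, not anything from the paper, and the sketch assumes the opponent's probabilities are strictly between 0 and 1 so that the chain settles down.

```python
# Payoffs T, R, P, S = 5, 3, 1, 0. States are ordered (CC, CD, DC, DD),
# written as (my last move, opponent's last move).
MY_PAYOFF = (3, 0, 5, 1)
OPP_PAYOFF = (3, 5, 0, 1)
PUNISHMENT = 1  # payoff for mutual defection

# Cooperation probabilities quoted above, one per state.
EXTORT_3 = (11 / 13, 1 / 2, 7 / 26, 0.0)


def long_run_scores(p, q, steps=5000):
    """Approximate long-run average scores when strategy p plays strategy q.

    Both p and q are memory-one strategies written from their own point of
    view, so the opponent reads my CD as its DC and vice versa.
    Assumption: q's entries lie strictly between 0 and 1, so the chain mixes.
    """
    q_seen = (q[0], q[2], q[1], q[3])
    dist = [0.25] * 4  # arbitrary starting distribution over the four states
    for _ in range(steps):
        nxt = [0.0] * 4
        for state, mass in enumerate(dist):
            a, b = p[state], q_seen[state]  # chance that each side cooperates
            nxt[0] += mass * a * b
            nxt[1] += mass * a * (1 - b)
            nxt[2] += mass * (1 - a) * b
            nxt[3] += mass * (1 - a) * (1 - b)
        dist = nxt
    mine = sum(d * s for d, s in zip(dist, MY_PAYOFF))
    theirs = sum(d * s for d, s in zip(dist, OPP_PAYOFF))
    return mine, theirs


if __name__ == "__main__":
    # Hypothetical opponents, chosen only for illustration.
    for q in [(0.5, 0.5, 0.5, 0.5), (0.9, 0.1, 0.8, 0.3), (0.2, 0.7, 0.4, 0.6)]:
        mine, theirs = long_run_scores(EXTORT_3, q)
        ratio = (mine - PUNISHMENT) / (theirs - PUNISHMENT)
        print(q, round(mine, 4), round(theirs, 4), round(ratio, 4))
```

If that reading is right, the last printed number should come out at about 3 for every opponent: whatever the opponent does to raise its own surplus over mutual defection, the extortioner's surplus goes up three times as much.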
Sure, but this extortionate strategy wouldn’t survive long in such a tournament because it would perform poorly against TFT variants as well as against itself.
I know that this isn’t exactly what you’re asking, but: Stewart and Plotkin tested two variants of ZD strategies in a variant of Axelrod’s original tournament; one variant (ZDGTFT-2) had the highest total score, beating TFT.
Here’s another strategy computed from the paper: cooperate with probabilities (0.9, 0.7, 0.2, 0.1) in the case (CC, CD, DC, DD). Supposedly this strategy is one that sets the opponent’s score to an average of 2, regardless of his or her actions. You could come up with a similar strategy to force any outcome in the interval (1,3), excluding the endpoints.
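A quick way to sanity-check the "sets the opponent's score to 2" claim is to simulate long matches against a few very different opponents and look at the opponent's average score. This is only a sketch under my own assumptions: the 5/3/1/0 payoffs from above, an opening round treated as if the previous outcome was CC, and a handful of hypothetical opponents (`always_cooperate`, `always_defect`, `tit_for_tat`, `coin_flip`) that I made up for the test. None of the names come from the paper.

```python
import random

# Payoff to the opponent, keyed by (my move, opponent's move), for the
# 5/3/1/0 payoffs.
OPP_PAYOFF = {("C", "C"): 3, ("C", "D"): 5, ("D", "C"): 0, ("D", "D"): 1}

# My cooperation probabilities after (my last move, opponent's last move).
EQUALIZER = {("C", "C"): 0.9, ("C", "D"): 0.7, ("D", "C"): 0.2, ("D", "D"): 0.1}


def opponent_average(opponent, rounds=100_000, seed=0):
    """Average per-round score of `opponent` in a match against EQUALIZER.

    `opponent` is called as opponent(its_last_move, my_last_move, rng)
    and returns "C" or "D".
    """
    rng = random.Random(seed)
    mine, theirs = "C", "C"  # assumed opening; it washes out in a long match
    total = 0
    for _ in range(rounds):
        my_next = "C" if rng.random() < EQUALIZER[(mine, theirs)] else "D"
        their_next = opponent(theirs, mine, rng)
        mine, theirs = my_next, their_next
        total += OPP_PAYOFF[(mine, theirs)]
    return total / rounds


# Hypothetical opponents, each taking (own last move, my last move, rng).
def always_cooperate(own, other, rng):
    return "C"


def always_defect(own, other, rng):
    return "D"


def tit_for_tat(own, other, rng):
    return other  # copy whatever I did last round


def coin_flip(own, other, rng):
    return rng.choice("CD")


if __name__ == "__main__":
    for opp in (always_cooperate, always_defect, tit_for_tat, coin_flip):
        print(opp.__name__, round(opponent_average(opp), 3))
```

If the claim holds, every line should print something close to 2, up to sampling noise. Being a simulation, this only suggests the result for the opponents tried; it is not a proof that it holds for every strategy.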
I’ve yet to check how this works.
I am now reading the paper with the intent to code these strategies into the current simulator. Wish me luck; reading Dyson is a challenge.
Alright, thank you. As far as the last paragraph goes, I took it, of course, more on the “metaphorical” level. I agree their evolutionary agent might be too restricted to be fully interesting (though it is valuable if its inferiority is demonstrated analytically and not only from simulations).
Since it seems you have lots of experience with IPD, what do you think about the case B)? The paper makes the claim specifically for the ZD strategies, but do you think this “superrationality” result could generalize to any strategy that also has a theory of mind? On the other hand, Hofstadter’s idea was in the context of the one-shot PD, so this might not apply in general at all… I need to learn more about this subject.