lackofcheese comments on 2014 iterated prisoner’s dilemma tournament results

lackofcheese 28 Sep 2014 17:29 UTC
12 points
Since people seem to be unsure about this, I figured it would be good to make it clear that there are many simple strategies that are provably symmetric Nash Equilibria and cooperate 1v1 vs relatively broad varieties of opponents (unlike a CliqueBot in the non-iterated source-sharing version). Here’s one example:

If either you or your opponent has ever defected in the past, Defect.
Otherwise, if it’s the 98th round and you simulate that your opponent will defect on the 99th round if you both cooperate on the 98th round, or if you simulate that your opponent will defect on the 100th round if you both cooperate on the 98th and 99th rounds, or if your simulation attempt times out, Defect.
Otherwise, Cooperate.

When playing against this bot, if you defect on the 98th round or earlier, you will be defected against for the rest of the game, and mutual defection loses you at least 4 points which outweighs your 2-point gain for DC. If you cooperate for 98 rounds and then defect on the 99th, 100th, or both, you will be defected against for the last three rounds and hence you will lose points.

Consequently, the only way to maximise your reward vs this bot is to Cooperate on every round, which gets you 300 points, so this bot is unexploitable in the same way that PrudentBot in the non-iterated source-sharing PD. Also, this bot clearly cooperates for all 100 rounds if it plays against itself.

The basic starting point for this type of bot is to punish defections in past rounds via simple Tit-for-Tat-like methods. This only leaves one basic flaw—Tit-for-Tat can be exploited via defection on the last round. The solution to this problem is to patch this hole up by simulating future rounds (with the same matchup) and punishing them in advance. Additionally, in any matchup between two bots that use this general principle, the recursive simulation process is guaranteed to eventually hit a base case.

I think there are quite a large number of distinguishable cooperative symmetric equilibrium strategies, but I think they would all follow this general principle of punishing past defections as well as defections detected in simulated future rounds.

I didn’t go for something quite this simple because the tournament setup means the point of the game is not simply to maximise score, and because I wanted to perform optimally vs some simple bots like DefectBot, CooperateBot and RandomBot.