In August, I announced an iterated prisoner’s dilemma tournament in which bots can simulate each other before making a move. Eleven bots were submitted to the tournament. Today, I am pleased to announce the final standings and release the source code and full results.
All of the source code submitted by the competitors and the full results for each match are available here. See here for the full set of rules and tournament code.
Before we get to the final results, here’s a quick rundown of the bots that competed:
AnderBot
AnderBot follows a simple tit-for-tat-like algorithm that eschews simulation:
On the first turn, Cooperate.
For the next 10 turns, play tit-for-tat.
For the rest of the game, Defect with 10% probability or Defect if the opposing bot has defected more times than AnderBot.
CheeseBot
CheeseBot tries several simulation-based strategies, and uses the first one that applies to the current situation.
If the opponent defected on the previous round, Defect.
If the opponent does not defect against defectBot, Defect.
If defecting on this round would lead to CheeseBot being punished for it in future rounds, Cooperate. CheeseBot checks this by simulating future rounds with a simulated history in which it defects on the current round.
If the opponent is a mirror-like bot, Cooperate. To test whether a bot is mirror-like, CheeseBot simulates the opponent and checks if it defects against DefectBot and cooperates with a bot that plays tit-for-tat but defects against CooperateBot and DefectBot.
DavidMonRoBot (DMRB) takes a more cautious approach to simulating: It spends a few hundred milliseconds simulating its opponent to figure out what the rest of the round will look like given a Cooperate or Defect on the current round, and then picks the outcome that leads to the highest total number of points for DMRB.
This allows DMRB to gauge whether its opponent is “dumb,” i.e. does not punish defectors. If the opponent is dumb, DMRB reasons that the best move is to defect; otherwise, if DMRB thinks that its opponent will punish defection, it simply plays tit-for-tat. DMRB spends only a small amount of time simulating so that other simulation-based bots will be less likely to have their simulations time out while simulating DMRB.
Pip
This one is a behemoth. At almost 500 very dense lines of Haskell (including comments containing quotes from “The Quantum Thief”), Pip uses a complex series of simulations to classify the opponent into a large set of defined behaviors, such as “CooperateOnLastInTheFaceOfCooperation”, “Uncompromising” and “Extortionist.” Then, it builds out a decision tree and selects the outcome that leads to the highest score at the end of the match. If I’m being vague here, it’s because the inner workings of Pip are still mostly a mystery to me.
SimHatingTitForTat
SimHatingTitForTat, as the name implies, plays tit-for-tat and attempts to punish bots that use simulation to exploit it, namely by defecting against bots that deviated from tit-for-tat on any previous round. Its strategy is as follows:
On the first round, Cooperate.
On every subsequent round, if the opponent played tit-for-tat on all previous rounds, play tit-for-tat. Otherwise, Defect.
SimpleTFTseerBot
SimpleTFTseer uses a modified version of tit-for-tat that ruthlessly punishes defectors and uses one simulation per round to look at what its opponent will do on the last round.
If either player defected in a previous round, Defect.
If it is the last round, Cooperate.
Otherwise, simulate my opponent playing against me on the last round of this match, assuming that we both Cooperate from the current round until the second-to-last round, and do whatever my opponent does in that scenario. If the simulation does not terminate, Defect.
SwitchBot
SwitchBot also uses a modified tit-for-tat algorithm:
On the first turn, Cooperate.
On the second turn, if my opponent Defected on the previous turn, simulate my opponent playing against mirrorBot, and do whatever my opponent would do in that scenario.
Otherwise, play tit-for-tat.
TatForTit
TatForTit follows a complex simulation-based strategy:
On the first round, do whatever the opponent would do against TatForTit on the second round, assuming both bots cooperated on the first round.
On the second round, if my opponent defected on the previous round, Defect. Otherwise, do whatever my opponent would do against me, assuming they cooperated on the first round.
On all subsequent turns, if my previous move was not the same as my opponents move two turns ago, Defect. Otherwise, do whatever my opponent would do against me one turn in the future, assuming TatForTit repeats its previous move on the next turn and the opposing bot cooperated on the next turn.
TwoFacedTit
TwoFacedTit simulates its opponent playing against mirrorBot; if it takes more than 10 milliseconds to respond, TwoFacedTit plays Cooperate. Otherwise, TwoFacedTit plays tit-for-tat and then Defects on the last round.
VOFB
Like SimpleTFTseerBot, VeryOpportunisticFarseerBot (VOFB) uses a very aggressive defection-punishment strategy: If either player defected in a previous round, Defect. Otherwise, VOFB simulates the next round to determine what the opponent will do given a Cooperate or Defect on the current round. If the opponent does not punish defection or the simulation does not terminate, VOFB Defects. On the final round, VOFB uses additional simulations to detect whether its opponent defects against backstabbers, and if not, plays Defect.
Tournament results
After 1000 round-robin elimination matches, the final standings are:
1st place (tied): CheeseBot and DavidMonRoBot 3rd place: VeryOpportunisticFarseerBot 4th place: TatForTit
The win frequencies for each bot:
If it were a sporting event, this tournament would not be particularly exciting to watch, as most games were nearly identical from start to finish. The graph below shows each bot’s frequency of surviving the first round-robin round:
In other words, the same half of the field consistently swept the first round; AnderBot, DefectBot, Pip, and SimpleTFTseer never survived to see a second round. In general, this is because high-scoring bots almost always cooperated with each other (with the occasional backstab at the end of the round), and defected against AnderBot, DefectBot, Pop, and SimpleTFTseerBot, as these bots either did not consistently retaliate when defected against or pre-emptively defected, triggering a chain of mutual defections. Interestingly, the bots that continued on to the next round did not do so by a large margin:
However, the variance in these scores was very low, primarily due to the repeated matchups of mostly-deterministic strategies consistently resulting in the same outcomes.
In addition, all of the matches progressed in one of the following 5 ways (hat tip lackofcheese):
(Only TatForTit and DMRB made it into the final round; both bots cooperated until the second-to-last turns of each matchup, where TatForTit played Defect while the DMRB played Cooperate, resulting in a TatForTit victory)
(Only CheeseBot and DMRB made it into the final round; both bots cooperated with each other for the entire round, resulting in a two-way tie)
This suggests that this game does have some kind of equilibrium, because these top three bots use very similar strategies: Simulate my opponent and figure out if defections will be punished; if so, Cooperate, and otherwise, defect. This allows bots following this strategy to always cooperate with each other, consistently providing them with a large number of points in every round, ensuring that they outcompete backstabbing or other aggressive strategies. In this tournament, this allowed the top three bots to add a guaranteed 600 points per round, more than enough to consistently keep them from being eliminated.
The tournament was slightly more interesting (and far more varied) on a matchup-by-matchup basis. Last-round and second-to-last round defections after mutual cooperation were common. TatForTit frequently used this technique against VOFB, CheeseBot, DMRB, and vice versa; this tactic allowed it to steal 32 wins in the final round. Other bots, particularly AnderBot and Pip, behaved very differently between matches. Pip, in particular, sometimes cooperated and sometimes defected for long stretches, and AnderBot’s randomness also led to erratic behavior. Ultimately, though, this did not net these bots a large number of points, as their opponents generally defected as soon as they stopped cooperating.
For those interested in the gritty details, I’ve formatted the output of each match to be human-readable, so you can easily read through the play-by-play of each match (and hopefully get some enjoyment out of it, as well). See the github repo for the full logs.
Postscript
In the course of running the tournament, I received a number of suggestions and idea for how things could be improved; some of these ideas include:
Random-length matches instead of fixed-length matches.
Straight elimination rather than round-robin elimination.
More “cannon-fodder” bots included in the tournament by default, such as copies of cooperateBot, defectBot, and tit-for-tat.
A QuickCheck-based test suite that allows bot writers to more easily test properties of their bot while developing, such as “cooperates with cooperateBot” or “cooperates if no one has defected so far.”
If anyone would like me to run this tournament again at some unspecified time in the future, with or without modifications, feel free to let me know in the comments. If you would like to fork the project and run it on your own, you are more than welcome to do so.
2014 iterated prisoner’s dilemma tournament results
Followup to: Announcing the 2014 program equilibrium iterated PD tournament
In August, I announced an iterated prisoner’s dilemma tournament in which bots can simulate each other before making a move. Eleven bots were submitted to the tournament. Today, I am pleased to announce the final standings and release the source code and full results.
All of the source code submitted by the competitors and the full results for each match are available here. See here for the full set of rules and tournament code.
Before we get to the final results, here’s a quick rundown of the bots that competed:
AnderBot
AnderBot follows a simple tit-for-tat-like algorithm that eschews simulation:
On the first turn, Cooperate.
For the next 10 turns, play tit-for-tat.
For the rest of the game, Defect with 10% probability or Defect if the opposing bot has defected more times than AnderBot.
CheeseBot
CheeseBot tries several simulation-based strategies, and uses the first one that applies to the current situation.
If the opponent defected on the previous round, Defect.
If the opponent does not defect against defectBot, Defect.
If defecting on this round would lead to CheeseBot being punished for it in future rounds, Cooperate. CheeseBot checks this by simulating future rounds with a simulated history in which it defects on the current round.
If the opponent is a mirror-like bot, Cooperate. To test whether a bot is mirror-like, CheeseBot simulates the opponent and checks if it defects against DefectBot and cooperates with a bot that plays tit-for-tat but defects against CooperateBot and DefectBot.
If it is the last round, Cooperate.
Defect.
DefectBot
Defect!
This bot was submitted publicly by James Miller.
DMRB
DavidMonRoBot (DMRB) takes a more cautious approach to simulating: It spends a few hundred milliseconds simulating its opponent to figure out what the rest of the round will look like given a Cooperate or Defect on the current round, and then picks the outcome that leads to the highest total number of points for DMRB.
This allows DMRB to gauge whether its opponent is “dumb,” i.e. does not punish defectors. If the opponent is dumb, DMRB reasons that the best move is to defect; otherwise, if DMRB thinks that its opponent will punish defection, it simply plays tit-for-tat. DMRB spends only a small amount of time simulating so that other simulation-based bots will be less likely to have their simulations time out while simulating DMRB.
Pip
This one is a behemoth. At almost 500 very dense lines of Haskell (including comments containing quotes from “The Quantum Thief”), Pip uses a complex series of simulations to classify the opponent into a large set of defined behaviors, such as “CooperateOnLastInTheFaceOfCooperation”, “Uncompromising” and “Extortionist.” Then, it builds out a decision tree and selects the outcome that leads to the highest score at the end of the match. If I’m being vague here, it’s because the inner workings of Pip are still mostly a mystery to me.
SimHatingTitForTat
SimHatingTitForTat, as the name implies, plays tit-for-tat and attempts to punish bots that use simulation to exploit it, namely by defecting against bots that deviated from tit-for-tat on any previous round. Its strategy is as follows:
On the first round, Cooperate.
On every subsequent round, if the opponent played tit-for-tat on all previous rounds, play tit-for-tat. Otherwise, Defect.
SimpleTFTseerBot
SimpleTFTseer uses a modified version of tit-for-tat that ruthlessly punishes defectors and uses one simulation per round to look at what its opponent will do on the last round.
If either player defected in a previous round, Defect.
If it is the last round, Cooperate.
Otherwise, simulate my opponent playing against me on the last round of this match, assuming that we both Cooperate from the current round until the second-to-last round, and do whatever my opponent does in that scenario. If the simulation does not terminate, Defect.
SwitchBot
SwitchBot also uses a modified tit-for-tat algorithm:
On the first turn, Cooperate.
On the second turn, if my opponent Defected on the previous turn, simulate my opponent playing against mirrorBot, and do whatever my opponent would do in that scenario.
Otherwise, play tit-for-tat.
TatForTit
TatForTit follows a complex simulation-based strategy:
On the first round, do whatever the opponent would do against TatForTit on the second round, assuming both bots cooperated on the first round.
On the second round, if my opponent defected on the previous round, Defect. Otherwise, do whatever my opponent would do against me, assuming they cooperated on the first round.
On all subsequent turns, if my previous move was not the same as my opponents move two turns ago, Defect. Otherwise, do whatever my opponent would do against me one turn in the future, assuming TatForTit repeats its previous move on the next turn and the opposing bot cooperated on the next turn.
TwoFacedTit
TwoFacedTit simulates its opponent playing against mirrorBot; if it takes more than 10 milliseconds to respond, TwoFacedTit plays Cooperate. Otherwise, TwoFacedTit plays tit-for-tat and then Defects on the last round.
VOFB
Like SimpleTFTseerBot, VeryOpportunisticFarseerBot (VOFB) uses a very aggressive defection-punishment strategy: If either player defected in a previous round, Defect. Otherwise, VOFB simulates the next round to determine what the opponent will do given a Cooperate or Defect on the current round. If the opponent does not punish defection or the simulation does not terminate, VOFB Defects. On the final round, VOFB uses additional simulations to detect whether its opponent defects against backstabbers, and if not, plays Defect.
Tournament results
After 1000 round-robin elimination matches, the final standings are:
1st place (tied): CheeseBot and DavidMonRoBot
3rd place: VeryOpportunisticFarseerBot
4th place: TatForTit
The win frequencies for each bot:
If it were a sporting event, this tournament would not be particularly exciting to watch, as most games were nearly identical from start to finish. The graph below shows each bot’s frequency of surviving the first round-robin round:
In other words, the same half of the field consistently swept the first round; AnderBot, DefectBot, Pip, and SimpleTFTseer never survived to see a second round. In general, this is because high-scoring bots almost always cooperated with each other (with the occasional backstab at the end of the round), and defected against AnderBot, DefectBot, Pop, and SimpleTFTseerBot, as these bots either did not consistently retaliate when defected against or pre-emptively defected, triggering a chain of mutual defections. Interestingly, the bots that continued on to the next round did not do so by a large margin:
However, the variance in these scores was very low, primarily due to the repeated matchups of mostly-deterministic strategies consistently resulting in the same outcomes.
In addition, all of the matches progressed in one of the following 5 ways (hat tip lackofcheese):
ALL->[Cheese,DMRB,SimHatingTFT,Switch,TwoFacedTit,VOFB]->[Cheese,DMRB,VOFB] (963 matches)
(Only CheeseBot, DMRB, and VOFB made it into the final round; all three cooperated with each other for the entire round, resulting in a three-way tie)
ALL->[Cheese,DMRB,SimHatingTFT,Switch,TatForTit,TwoFacedTit]->[Cheese,DMRB,TatForTit]->[DMRB,TatForTit]->[TatForTit] (32 matches)
(Only TatForTit and DMRB made it into the final round; both bots cooperated until the second-to-last turns of each matchup, where TatForTit played Defect while the DMRB played Cooperate, resulting in a TatForTit victory)
ALL->[Cheese,DMRB,SimHatingTFT,Switch,TatForTit,TwoFacedTit,VOFB]->[Cheese,DMRB,SimHatingTFT,TwoFacedTit]->[Cheese,DMRB] (3 matches)
(Only CheeseBot and DMRB made it into the final round; both bots cooperated with each other for the entire round, resulting in a two-way tie)
ALL->[Cheese,DMRB,SimHatingTFT,Switch,TatForTit,VOFB]->[Cheese,DMRB,SimHatingTFT]->[Cheese,DMRB] (1 match)
(Only CheeseBot and DMRB made it into the final round; both bots cooperated with each other for the entire round, resulting in a two-way tie)
ALL->[Cheese,DMRB,SimHatingTFT,Switch,TwoFacedTit,VOFB]->[Cheese,DMRB,SimHatingTFT]->[Cheese,DMRB] (1 match)
(Only CheeseBot and DMRB made it into the final round; both bots cooperated with each other for the entire round, resulting in a two-way tie)
This suggests that this game does have some kind of equilibrium, because these top three bots use very similar strategies: Simulate my opponent and figure out if defections will be punished; if so, Cooperate, and otherwise, defect. This allows bots following this strategy to always cooperate with each other, consistently providing them with a large number of points in every round, ensuring that they outcompete backstabbing or other aggressive strategies. In this tournament, this allowed the top three bots to add a guaranteed 600 points per round, more than enough to consistently keep them from being eliminated.
The tournament was slightly more interesting (and far more varied) on a matchup-by-matchup basis. Last-round and second-to-last round defections after mutual cooperation were common. TatForTit frequently used this technique against VOFB, CheeseBot, DMRB, and vice versa; this tactic allowed it to steal 32 wins in the final round. Other bots, particularly AnderBot and Pip, behaved very differently between matches. Pip, in particular, sometimes cooperated and sometimes defected for long stretches, and AnderBot’s randomness also led to erratic behavior. Ultimately, though, this did not net these bots a large number of points, as their opponents generally defected as soon as they stopped cooperating.
For those interested in the gritty details, I’ve formatted the output of each match to be human-readable, so you can easily read through the play-by-play of each match (and hopefully get some enjoyment out of it, as well). See the github repo for the full logs.
Postscript
In the course of running the tournament, I received a number of suggestions and idea for how things could be improved; some of these ideas include:
Random-length matches instead of fixed-length matches.
Straight elimination rather than round-robin elimination.
More “cannon-fodder” bots included in the tournament by default, such as copies of cooperateBot, defectBot, and tit-for-tat.
A QuickCheck-based test suite that allows bot writers to more easily test properties of their bot while developing, such as “cooperates with cooperateBot” or “cooperates if no one has defected so far.”
If anyone would like me to run this tournament again at some unspecified time in the future, with or without modifications, feel free to let me know in the comments. If you would like to fork the project and run it on your own, you are more than welcome to do so.
Many thanks to everyone who participated!