I’ve written a pretty good program to complete variant 3 but I’m such a lurker on this site i lack the nessesary karma to submit it as a article. So here it is as a comment instead:
Inspired by prase’s contest, (results here) and Eliezer_Yudkowsky’s comment, I’ve written a program that plays iterated games of prisoners dilemma, with the cravat that each player can simulate his opponent. I’m now running my own contest. You can enter your onwn strategy by sending it to me in a message. Deadline to enter is Sunday September 25. The contest itself will run shortly there after.
Each strategy plays 100 iterations against each other strategy, with cumulative points. Each specific iteration is called a turn, each set of 100 iterations is called a match. (thanks prase for the terminology). Scoring is CC: (4,4) DC: (7,0) CD: (0,7) DD: (2,2).
Every turn, each strategy receives a record of all the moves it and its opponent made this match, a lump of time to work and its opponents code. Strategies can’t review enemy code directly but they can run it through any simulations they want before deciding their move, within the limits of the amount of time they have to work.
Note that strategies can not pass information to themselves between iterations or otherwise store it, other than the record of decisions. They start each turn anew. This way any strategy can be simulated with any arbitrary record of moves without running into issue.
Strategies in simulations need a enemy strategy passed into them. To avoid infinite recursion of simulations, they are forbidden from passing themselves. They can have as many secondary strategies as need however. This creates a string of simulations where strategy(1) simulates enemy(1), passing in strategy(2) which enemy(1) simulates and passes in enemy(2) which strategy(2) simulates and passes in strategy(3). The final sub-strategy passed in by such a chain must be a strategy which performs NO simulations.
for example, the first sub-strategy could be an exact duplicate of main strategy other than that it passed the tit_for_tat program instead. so main_strategy ⇒ sub_strategy1 ⇒ tit_for_tat.
You can of course use as many different sub_strategies as you need, the programs are limited by processing time, not memory. Strategies can run their simulated opponent on any history they devise, playing which ever side they choose.
Strategies can’t read the name of their opponent, see the number of strategies in the game, watch any other matches or see any scores outside their current match.
Strategies are not cut off if they run out of time, but both will receive 0 points for the turn. The decisions of the turn will be recorded as if normal.
I never figured out a good way to keep strategies from realizing they were being simulated simply by looking at how much time they were given. Not knowing how much time they have would make it prohibitively difficult to avoid timing out. My hack solution is to not give a definitive amount of time for the main contest but instead a range: from 0.01 seconds to 0.1 seconds per turn, with the true time to be know only by me. This is far from ideal, and if anyone has a better suggestion I’m all ears.
To give reference: a strategy that runs 2 single turn simulations of tit_for_tat against tit_for_tat takes, on average 3.510^-4 seconds. Running only 1 single turn simulations took only 1.510^-4 seconds. tit_for_tat by itself takes about 2.5*10^-5 seconds to run a single turn. Unfortunately, do to factor outside of my control, matlab, the software I’m running will for unknown reasons take 3 to 5 times as long as normal about 1 out of a 100 times. Leave yourself some buffer.
Strategies are NOT allowed a random number generator. This is different from prase’s contest but I would like to see strategies for dealing with enemy intelligence trying to figure them out without resorting to unpredictability.
I’ve come up with a couple of simple example strategies that will be performing in the contest.
Careful_cheater:
simulates its opponent against tit_for_tat to see its next move. If enemy defects, Careful_cheater defects. If enemy cooperates, Careful_cheater simulates its next move after that with a record showing Careful_cheater defecting this turn, passing tit_for_tat. If the opponent would have defected on the next turn, Careful_cheater cooperates, but if the opponent would have cooperated despite Careful_cheaters defection, it goes ahead and defects.
simulations show it doing evenly against tit_for_tat and its usual variations, but Careful_cheater eats forgiving programs like tit_for_2_tats alive.
Everyman:
simulates 10 turn matchs with its opponent against several possible basic strategies, including tit_for_tat, tit_for_tat_optimist (cooperates first two turns), and several others, then compares the scores of each match and plays as the highest scoring
Copy_cat:
Switches player numbers and simulates its opponent playing Copy_cats record while passing its opponent and then performs that action. That is to say, it sees what its opponent would do if it were in Copy_cats position and up against itself.
this is basically DuncanS strategy from the comments. DuncanS, your free to enter another one sense I stole yours as an example
and of course tit_for_tat and strategy “I” from prase’s contest will be competing as well.
one strategy per person, all you need for a strategy is a complete written description, though I may message back for clarification. I reserve the right to veto any strategy I deem prohibitively difficult to implement.
So what happens if copy_cat goes actually does go up against itself, or a semantically similar bot? Infinite recursion and zero points for both? Or am I missing something.
Actually, it seems that crashes it. thanks for the catch. I hadn’t tested copy_cat against itself and should have. Forbidding strategies to pass their opponent fixes that, but it does indicate my program may not be as stable as I thought. I’m going to have to spend a few more days checking for bugs sense I missed one that big. thanks eugman.
No problem. I’ve been thinking about it and one should be able to recursively prove if a strategy bottoms’s out. But that should fix it as long the user passes terminating strategies. A warning about everyman, if it goes against itself, it will run n^2 matches where n is the number of strategies tested.Time consumed is an interesting issue because if you take a long time it’s like being a defectbot against smartbots, you punish bots that try to simulate you but you lose points too.
I didn’t get into it earlier, but everyman’s a little more complicated. it runs through each match test one at a time from most likely to least, and checks after each match how much time it has left, if it starts to run out, it just leaves with the best strategy its figured out. By controlling how much time you pass in, strategies can avoid n^2 processing problems. Its why I thought it was so necessary to include even if it does give hints if strategies are being simulated or not. Everyman was built as a sort of upper bound. its one of the least efficient strategies one might implement.
Here’s a very fancy cliquebot (with a couple of other characteristics) that I came up with.
The bot is in one of 4 “modes”—SELF, HOSTILE, ALLY, PASSWORD.
Before turn1:
Simulate the enemy for 20 turns against DCCCDDCDDD Cs thereafter, If the enemy responds 10Ds, CCDCDDDCDC then change to mode SELF. (These are pretty much random strings—the only requirement is that the first begins with D)
Simulate the enemy for 10 turns against DefectBot. If the enemy cooperates in all 10 turns, change to mode HOSTILE.
Else be in mode ALLY.
In any given turn,
If in mode SELF, cooperate always.
If in mode HOSTILE, defect always.
If in mode ALLY, simulate the enemy against TFT for the next 2 turns. If the enemy defects on either of these turns, or defected on the last turn, switch to mode HOSTILE and defect. Exception if it is move 1 and the enemy will DC, then switch to mode PASSWORD and defect. Defect on the last move.
If in mode PASSWORD, change to mode HOSTILE if the enemy’s moves have varied from DCCCDDCDDD Cs thereafter beginning from move 1. If so, switch to mode HOSTILE and defect. Else defect on moves 1-10, on moves 11-20 CCDCDDDCDC respectively and defect thereafter.
I’ve written a pretty good program to complete variant 3 but I’m such a lurker on this site i lack the nessesary karma to submit it as a article. So here it is as a comment instead:
Inspired by prase’s contest, (results here) and Eliezer_Yudkowsky’s comment, I’ve written a program that plays iterated games of prisoners dilemma, with the cravat that each player can simulate his opponent. I’m now running my own contest. You can enter your onwn strategy by sending it to me in a message. Deadline to enter is Sunday September 25. The contest itself will run shortly there after.
Each strategy plays 100 iterations against each other strategy, with cumulative points. Each specific iteration is called a turn, each set of 100 iterations is called a match. (thanks prase for the terminology). Scoring is CC: (4,4) DC: (7,0) CD: (0,7) DD: (2,2).
Every turn, each strategy receives a record of all the moves it and its opponent made this match, a lump of time to work and its opponents code. Strategies can’t review enemy code directly but they can run it through any simulations they want before deciding their move, within the limits of the amount of time they have to work.
Note that strategies can not pass information to themselves between iterations or otherwise store it, other than the record of decisions. They start each turn anew. This way any strategy can be simulated with any arbitrary record of moves without running into issue.
Strategies in simulations need a enemy strategy passed into them. To avoid infinite recursion of simulations, they are forbidden from passing themselves. They can have as many secondary strategies as need however. This creates a string of simulations where strategy(1) simulates enemy(1), passing in strategy(2) which enemy(1) simulates and passes in enemy(2) which strategy(2) simulates and passes in strategy(3). The final sub-strategy passed in by such a chain must be a strategy which performs NO simulations.
for example, the first sub-strategy could be an exact duplicate of main strategy other than that it passed the tit_for_tat program instead. so main_strategy ⇒ sub_strategy1 ⇒ tit_for_tat.
You can of course use as many different sub_strategies as you need, the programs are limited by processing time, not memory. Strategies can run their simulated opponent on any history they devise, playing which ever side they choose.
Strategies can’t read the name of their opponent, see the number of strategies in the game, watch any other matches or see any scores outside their current match.
Strategies are not cut off if they run out of time, but both will receive 0 points for the turn. The decisions of the turn will be recorded as if normal.
I never figured out a good way to keep strategies from realizing they were being simulated simply by looking at how much time they were given. Not knowing how much time they have would make it prohibitively difficult to avoid timing out. My hack solution is to not give a definitive amount of time for the main contest but instead a range: from 0.01 seconds to 0.1 seconds per turn, with the true time to be know only by me. This is far from ideal, and if anyone has a better suggestion I’m all ears.
To give reference: a strategy that runs 2 single turn simulations of tit_for_tat against tit_for_tat takes, on average 3.510^-4 seconds. Running only 1 single turn simulations took only 1.510^-4 seconds. tit_for_tat by itself takes about 2.5*10^-5 seconds to run a single turn. Unfortunately, do to factor outside of my control, matlab, the software I’m running will for unknown reasons take 3 to 5 times as long as normal about 1 out of a 100 times. Leave yourself some buffer.
Strategies are NOT allowed a random number generator. This is different from prase’s contest but I would like to see strategies for dealing with enemy intelligence trying to figure them out without resorting to unpredictability.
I’ve come up with a couple of simple example strategies that will be performing in the contest.
Careful_cheater:
simulates its opponent against tit_for_tat to see its next move. If enemy defects, Careful_cheater defects. If enemy cooperates, Careful_cheater simulates its next move after that with a record showing Careful_cheater defecting this turn, passing tit_for_tat. If the opponent would have defected on the next turn, Careful_cheater cooperates, but if the opponent would have cooperated despite Careful_cheaters defection, it goes ahead and defects.
simulations show it doing evenly against tit_for_tat and its usual variations, but Careful_cheater eats forgiving programs like tit_for_2_tats alive.
Everyman:
simulates 10 turn matchs with its opponent against several possible basic strategies, including tit_for_tat, tit_for_tat_optimist (cooperates first two turns), and several others, then compares the scores of each match and plays as the highest scoring
Copy_cat:
Switches player numbers and simulates its opponent playing Copy_cats record while passing its opponent and then performs that action. That is to say, it sees what its opponent would do if it were in Copy_cats position and up against itself.
this is basically DuncanS strategy from the comments. DuncanS, your free to enter another one sense I stole yours as an example
and of course tit_for_tat and strategy “I” from prase’s contest will be competing as well.
one strategy per person, all you need for a strategy is a complete written description, though I may message back for clarification. I reserve the right to veto any strategy I deem prohibitively difficult to implement.
So what happens if copy_cat goes actually does go up against itself, or a semantically similar bot? Infinite recursion and zero points for both? Or am I missing something.
Actually, it seems that crashes it. thanks for the catch. I hadn’t tested copy_cat against itself and should have. Forbidding strategies to pass their opponent fixes that, but it does indicate my program may not be as stable as I thought. I’m going to have to spend a few more days checking for bugs sense I missed one that big. thanks eugman.
No problem. I’ve been thinking about it and one should be able to recursively prove if a strategy bottoms’s out. But that should fix it as long the user passes terminating strategies. A warning about everyman, if it goes against itself, it will run n^2 matches where n is the number of strategies tested.Time consumed is an interesting issue because if you take a long time it’s like being a defectbot against smartbots, you punish bots that try to simulate you but you lose points too.
I didn’t get into it earlier, but everyman’s a little more complicated. it runs through each match test one at a time from most likely to least, and checks after each match how much time it has left, if it starts to run out, it just leaves with the best strategy its figured out. By controlling how much time you pass in, strategies can avoid n^2 processing problems. Its why I thought it was so necessary to include even if it does give hints if strategies are being simulated or not. Everyman was built as a sort of upper bound. its one of the least efficient strategies one might implement.
Here’s a very fancy cliquebot (with a couple of other characteristics) that I came up with. The bot is in one of 4 “modes”—SELF, HOSTILE, ALLY, PASSWORD.
Before turn1:
Simulate the enemy for 20 turns against DCCCDDCDDD Cs thereafter, If the enemy responds 10Ds, CCDCDDDCDC then change to mode SELF. (These are pretty much random strings—the only requirement is that the first begins with D)
Simulate the enemy for 10 turns against DefectBot. If the enemy cooperates in all 10 turns, change to mode HOSTILE. Else be in mode ALLY.
In any given turn,
If in mode SELF, cooperate always.
If in mode HOSTILE, defect always.
If in mode ALLY, simulate the enemy against TFT for the next 2 turns. If the enemy defects on either of these turns, or defected on the last turn, switch to mode HOSTILE and defect. Exception if it is move 1 and the enemy will DC, then switch to mode PASSWORD and defect. Defect on the last move.
If in mode PASSWORD, change to mode HOSTILE if the enemy’s moves have varied from DCCCDDCDDD Cs thereafter beginning from move 1. If so, switch to mode HOSTILE and defect. Else defect on moves 1-10, on moves 11-20 CCDCDDDCDC respectively and defect thereafter.
Essentially designed to dominate in the endgame.