The Darwin Game—Rounds 0 to 10
The most important change between my game and Zvi’s original is that bots can read each other’s source code. They can simulate each other and predict each other’s behavior. Within a day of the tournament launching—and an entire week before entries closed—Zack_M_Davis had already written a bot to simulate opponents and then open-sourced it to everyone.
That’s what happens when a significant contributor to an open source Lisp dialect participates in a software competition.
Taleuntum wanted to write an even better simulator but was informed that it would take too many years to run.
Multicore solved the limited compute problem and wrote a safe, effective, obfuscated simulator, with randomization.
The Phantom Menace
Three separate people asked me what happens if a bot crashes the game while simulating an opponent with malware in it. This turned out not to matter because nobody deployed malware to destroy simulators. Only one player, Measure, deployed malware—and the malware didn’t crash the game. Instead, it attempted to replace its opponent’s move method with one that always returned 0. But the threat of getting disqualified seemed to scare away other potential simulators.
Taleuntum did write a MatrixCrashingBot that crashes simulators but did not submit it. This is a disappointment, as Taleuntum would have been allowed to submit this bot as a separate entry on the grounds that it does not coordinate with Taleuntum’s CloneBot. To my knowledge, nobody else took advantage of this deliberate loophole in the rules either.
RaterBot safely combed through its opponent’s source code for “2”s and “3”s to estimate aggression without the dangers associated with running untrusted code.
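RaterBot’s code is not reproduced here, so the following is only a hypothetical sketch of a static scan in the same spirit. The function name estimate_aggression and the scoring rule (the share of “3” characters among all “2” and “3” characters) are assumptions. The key property is that the opponent’s source is treated as a plain string and never executed.

```python
def estimate_aggression(opponent_source: str) -> float:
    """Hypothetical static heuristic: share of '3' digits among all '2' and '3' digits.

    The source is only read as text; nothing in it is executed.
    """
    twos = opponent_source.count("2")
    threes = opponent_source.count("3")
    if twos + threes == 0:
        return 0.5  # no evidence either way
    return threes / (twos + threes)


print(estimate_aggression("return 3"))              # 1.0
print(estimate_aggression("return 2 if x else 3"))  # 0.5
```

A scan this crude also counts digits inside comments, variable names and unrelated constants, which is the price of never running the code.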
Computer programs attempting to simulate each other can produce complex behavior. In general, even predicting whether such a program halts is provably undecidable—and that’s totally ignoring the real-world sandboxing problem.
Nevertheless, two contestants requested I write code to simulate their opponents. I refused these requests. Zvi[1] accepted a simpler bot and the other contestant dropped out.
I’m surprised running the enemy is complicated though—it should just be a function call.
―quote from the contestant who dropped out
The most significant use of an opponent’s source code came from Vanilla_cabs.
Attack of the Clones
Zvi’s original game was dominated by a clique of players who coordinated out-of-game to defeat the non-clique players. It worked great—and then defectors within the clique dominated both the non-clique players and the clique players.
Vanilla_cabs observed that players could use each other’s source code to detect (and therefore punish) defection from a clique. Leveraging this insight, Vanilla_cabs created a CloneBot template that would cooperate only for the first 90 rounds—long enough to eliminate all non-clones—and then allow players to do whatever they wanted afterward.
Brilliantly, the CloneBots used the lexicographic length of custom code to break parity symmetry and guarantee perfect coordination between CloneBot instances instead of having to fumble around for the initial rounds of each pairing.
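The CloneBot source is not shown in this post, so here is a hypothetical sketch of how a deterministic tie-breaker lets two clones split each turn’s 5 points from the very first turn. The function clone_move, its arguments, and the exact ordering rule (payload length first, then lexicographic order) are illustrative assumptions, not Vanilla_cabs’s code.

```python
def clone_move(my_payload: str, their_payload: str, turn: int) -> int:
    """Hypothetical alternation between two clones that split 5 points per turn.

    Both clones evaluate the same deterministic comparison on the same two
    strings, so they agree on who plays 3 first with no trial-and-error turns.
    Assumes the two payloads differ; identical payloads would tie.
    """
    # Hypothetical ordering: compare by length first, then lexicographically.
    i_go_first = (len(my_payload), my_payload) < (len(their_payload), their_payload)
    high_turn = turn % 2 == 0
    # The first mover plays 3 on even turns and 2 on odd turns; the other does the reverse.
    return 3 if i_go_first == high_turn else 2


for t in range(4):
    print(clone_move("a", "bb", t), clone_move("bb", "a", t))  # 3 2, 2 3, 3 2, 2 3
```

Because both instances reach opposite conclusions about who goes first, every turn sums to exactly 5.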
At the same time, Vanilla_cabs worried that someone might attempt to bypass the CloneBot’s self-recognition system. As a first line of defense, Vanilla_cabs kept the source code secret from non-cloners. As a second line of defense, Vanilla_cabs deliberately left open a weakness as a honeypot. Among the first co-conspirators in the clone army, only Lanrian reported this weakness.
Taleuntum, in contrast, contributed two lies to the clique intrigue.
I lied that I’ve already submitted one program detecting and crashing simulators…obviously I did not even plan to do so. Some time later…I added another lie that the method of detecting simulators was my friend’s idea (hopefully suggesting that there is another contestant with the same method outside the clique).
One player discovered a way to defect from the CloneBots.
Multicore
The original CloneBot template used splitlines() to parse source code. But the Python3 interpreter behaves slightly differently from splitlines(). In particular, splitlines() splits on the group separator character 0x1D, which Python3 does not treat as a newline. By placing 0x1D group separator characters inside of a comment, Multicore’s EarlyBirdMimicBot bypassed the CloneBot self-recognition system.
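The mismatch is easy to reproduce. The snippet below is an illustration, not Multicore’s actual payload: str.splitlines() reports two lines, while the Python interpreter treats everything after the # as a single comment, so the second assignment never runs.

```python
# One physical line to Python, two "lines" to str.splitlines().
source = "x = 1  # an innocent comment\x1dx = 2\n"

# str.splitlines() treats \x1d (the group separator) as a line boundary...
print(source.splitlines())  # ['x = 1  # an innocent comment', 'x = 2']

# ...but the tokenizer only ends a comment at a real newline,
# so "x = 2" is part of the comment and is never executed.
namespace = {}
exec(source, namespace)
print(namespace["x"])  # 1
```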
Multicore did a lot of things.
Wrote a simulator.
Optimized the simulator via a branching algorithm to perform efficient searches.
Used randomization to disrupt opposing simulators.
Protected the simulator from malware.
Joined the CloneBots.
Defected from the CloneBots via a zero-day exploit in the CloneBot code.
Recruited two friends to submit password bots to serve as fodder.
Adjusted strategy based on the round number.
When I hosted this tournament, I hadn’t expected anyone to “[read] through the C code for the python lexer”.
For a complete write-up of Multicore’s strategy, including source code, see here.
On a side note, I really love this site. I can’t really recall any other game I’ve been in getting this tangled.
The First Game
The first iteration of the game was run by Taleuntum, who ran “a simulation of the whole tournament till the 160th round with 8 bots” despite the tournament’s source code not being public at the time.
Taleuntum’s tournament was unofficial and does not count.
The Real Game
Teams
In order to make sense of the 54 participating bots, I have bucketed them into teams.
[Blue] Clone Army. 10 players pledged to submit clone bots. 8 followed through, 1 didn’t and Multicore submitted a [Red] mimic bot.
[Red] Multics. Multicore’s friends submitted 2 password bots to aid Multicore’s mimic bot.
[Green] Norm Enforcers. Ben Pace joined forces with jacobjacob to form their own little duo.
[Black] Chaos Army. 20 players wrote individual bots.
[Magenta] NPCs. I wrote 21 Silly Bots. Some of them had synergies.
The clones [Blue] begin the game outnumbered 6-to-1.
Edit: Everything below this line is in error. See here for details.
Round 1
5 bots died on turn 1 including 4 NPCs and Team Norm Enforcers’ jacobjacob bot.
Rounds 2-3
Another 4 NPCs died.
Rounds 4-10
S_A and BenBot die, along with 3 more NPCs. Thus ends team Norm Enforcers.
The clone army is mostly doing well, except for CloneBot, which is doing poorly. Meanwhile AbstractSpyTreeBot, which is not a clone, is doing almost as well as the average clone.
EarlyBirdMimicBot is doing better than the average CloneBot but not by much. The MimicBot’s 0x1D exploit succeeded in defecting, but the bot appears not to have leveraged its defection to gamebreaking effect.
The clones have built up a critical mass of >50%. If their coordination mechanisms work then they ought to crush the rest of the competition.
If Zack_M_Davis’ AbstractSpyTreeBot can survive in a world of clones until turn 90 when the clone treaty expires then there may be some hope for Chaos Army.
If not then, begun, the clone wars have.
Everything so far
Today’s Obituary
Bot | Team | Summary | Round |
---|---|---|---|
jacobjacob-Bot | Norm Enforcers | Plays aggressively while coordinating with Ben. | 1 |
Silly 5 Bot | NPCs | Always returns 5. | 1 |
Silly 0 Bot | NPCs | Always returns 0. | 1 |
Silly Invert Bot 0 | NPCs | Starts with 0. Then always returns 5 - opponent_previous_move. | 1 |
Silly Invert Bot 5 | NPCs | Starts with 5. Then always returns 5 - opponent_previous_move. | 1 |
Silly 4 Bot | NPCs | Always returns 4. | 2 |
Silly Invert Bot 1 | NPCs | Starts with 1. Then always returns 5 - opponent_previous_move. | 2 |
Silly Chaos Bot | NPCs | Plays completely randomly. | 4 |
Silly Invert Bot 4 | NPCs | Starts with 4. Then always returns 5 - opponent_previous_move. | 4 |
S_A | Chaos Army | Plays 1 79% of the time, 5 20% of the time and randomly 1% of the time. | 5 |
Silly Random Invert Bot 4 | NPCs | Starts randomly. Then always returns 5 - opponent_previous_move. | 6 |
Silly 1 Bot | NPCs | Always returns 1. | 7 |
Ben Bot | Norm Enforcers | Cooperates with jacobjacob [deceased]. If not paired with jacobjacob then this bot returns 3 for the first 100 turns and then does fancy stuff. Unfortunately for Ben, I picked 100 as the number of turns per pairing. | 10 |
Silly 3 Bot | NPCs | Always returns 3. | 10 |
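For readers unfamiliar with the NPC strategies, here is a minimal sketch of the Silly Invert Bot family described in the table. The class interface (an __init__ that receives the round number and a move method that receives the opponent’s previous move, or None on the first turn) and the first_move parameter are assumptions made for illustration.

```python
class SillyInvertBot:
    """Hypothetical sketch: open with a fixed move, then answer 5 - opponent's last move."""

    def __init__(self, round_num=0, first_move=0):
        self.round_num = round_num  # outer round number (unused by this strategy)
        self.first_move = first_move

    def move(self, previous=None):
        # previous is the opponent's last move, or None on the first turn.
        if previous is None:
            return self.first_move
        return 5 - previous


bot = SillyInvertBot(first_move=5)
print(bot.move())   # 5
print(bot.move(3))  # 2
```

Against a bot that always plays 3, this answers 2 from the second turn on, so every later turn sums to 5.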
The next installment of this series will be posted on October 26, 2020 at 5 pm Pacific Time.
[1] Zvi’s specification did address the halting problem, sandboxing problems and unpredictable resource consumption.
This. Is. So. Much. Fun.
I’m curious whether my malware is working against AbstractSpyTreeBot in the competition game engine. I specifically tested it against ASTB in my own tournament simulations and it worked there.
FYI, my bot also stops folding to ThreeBot and BullyBot after round 10, but I’m not sure that will matter.
Here is MeasureBot:
It is working against AbstractSpyTreeBot. EarlyBirdMimicBot is secure against it.
Does setting self.destroyedOpponent to True when you detect that you’re simulated actually do anything? The instance of MeasureBot that knows it destroyed the opponent should be a different instance than the one that is making your moves.
You’re right. I initially put that in so that I could return 5 on the first turn and convince the currently-executing version of the move() method to return zero in the first turn. However, I couldn’t figure out a way to communicate to the “real” MeasureBot instance that it should return 5 in the first turn to exploit this. Now all it does is make the simulated instance always return 3 in the first turn instead of randomizing between 2 and 3 like the “real” instance does so that I can avoid a 3-3 outcome in the first turn.
Because the best part of a sporting event is the betting, I ask Metaculus: [Short-Fuse] Will AbstractSpyTreeBot win the Darwin Game on Lesswrong?
I’m feeling optimistic about this! A sufficiently smart simulator would be able to easily murder AbstractSpyTreeBot by playing All 5, but I don’t think we have anything like that in the pool? Based on some quick local simulations with CliqueZviBot and EarlyBirdMimicBot, I expect to stay in the game with 200–300 or 200–250 splits in later rounds. (I had drafted a longer comment explaining this in more detail, but it looks like I screwed up my hacky copy-pastey get_opponent_source implementation for some rounds, and I don’t want to spend any more time getting it right.)

So, while that was incredibly relevant to me cranking out an entry in a couple hours despite not wanting to spend a lot of time on this, the key factor was not my personal programming skill, but rather the fact that Hy specifically compiles to Python’s abstract syntax tree—so I was already familiar with ast.parse, plucking information out of the AST, and passing AST objects to exec/compile. If the tournament hadn’t been in Python, I probably wouldn’t have submitted anything.

So, uh. Unless I made a silly mistake somewhere, or the version in the tournament is different from what you posted in the thread… I specifically tested to make sure incomprehensibot would get ASTBot disqualified if we both survived that long. Sorry.
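The workflow Zack_M_Davis describes above (ast.parse, inspect the tree, then hand it to compile/exec) can be sketched with the standard library alone. The helper name load_bot_class, the sample ThreeBot source, and the constructor signature are hypothetical. Note that exec runs the opponent’s module-level code with full privileges, which is exactly the sandboxing problem discussed earlier.

```python
import ast


def load_bot_class(source: str, class_name: str):
    """Hypothetical helper: turn opponent source text into a live class object."""
    tree = ast.parse(source)                  # source text -> AST (could be inspected or edited here)
    code = compile(tree, "<opponent>", "exec")  # AST -> code object
    namespace = {}
    exec(code, namespace)                     # WARNING: runs untrusted module-level code
    return namespace[class_name]


opponent_source = """
class ThreeBot:
    def __init__(self, round_num=0):
        pass

    def move(self, previous=None):
        return 3
"""

ThreeBot = load_bot_class(opponent_source, "ThreeBot")
simulated = ThreeBot()
print(simulated.move())  # 3
```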
(Some of my requested changes to the CloneBot common code were to route around a bug in ASTBot that made it crash before I wanted it to, in ways it could recover from. ASTBot can’t really handle top-level import statements due to details I don’t really understand about Python’s namespace handling. So I requested that CloneBot not include any of those.)

I’m not so optimistic about your bot… if the clones will be getting 250 per round and you will be getting 200, you’ll lose about 1/5 of your copies per round, which is like a 3-round half-life. Not going to be anything left at 90 at that rate.
I see; I was naïvely thinking in terms of “only losing by 50 points doesn’t sound so bad, right?!”, not carefully thinking about how the update rule works. Now that you point it out, I agree that (200/(200+9*250))/0.1 ≈ 0.82.
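The arithmetic in this exchange follows from the update rule, assuming (as the comments do) that each bot’s share of the next round’s pool is proportional to its total points. The sketch below uses the numbers from the thread: a bot holding 10% of the pool scores 200 while everyone else scores 250. The function name next_share is hypothetical.

```python
import math


def next_share(share: float, my_score: float, others_score: float) -> float:
    """Share of the pool after one round, assuming shares scale with total points."""
    mine = share * my_score
    rest = (1 - share) * others_score
    return mine / (mine + rest)


ratio = next_share(0.10, 200, 250) / 0.10
print(round(ratio, 2))  # 0.82
half_life = math.log(0.5) / math.log(ratio)
print(round(half_life, 1))  # 3.4 rounds
```

The one-round ratio drifts toward 200/250 = 0.8 as the share shrinks, so the 3-round half-life is a rough estimate rather than an exact figure.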
Darn, the clones are contesting the early pool against me well in part because they put in code to exploit 0-bot and 1-bot and I didn’t. My plans for the early game focused more on dealing with attackers.
I’m curious which of the silly/chaos army bots passed my simulation test and got simulated.
Some clones doing significantly better than others is a bit confusing since for now they’re all supposed to be doing the same thing. I guess some got really lucky/unlucky with other bots’ random rolls?
It’s worth noting that the clones aren’t even being significantly aggressive against outsiders yet. This huge advantage is just from the perfect self-cooperation. I was kind of expecting a midgame where the clones fought a bloody struggle to clear out the non-clone cooperators while I profited off both sides, but the outsiders might be wiped out too fast for that to happen.
Also worth noting that on the next round my fallback behavior changes from a fold-ish EquityBot to DefenseBot. Most attackers seem to be gone or marginal at this point, so I’m not sure that changes much.
No, 10 rounds of 100 turns is a decently large sample size—I think some are actually doing badly against outsiders.
All clones behave exactly the same until round 90. Even the seed for the random number generator is the same.
All I can imagine is that a tiny difference in score due to facing different bots snowballs into a significantly different pie share due to the multiplicative effect that simon noted. There was a Silly 0 Bot. Any clone that was lucky enough to face it on round 1 gorged itself with score. Same thing with Silly 1 Bot and a few others. Since they disappeared fast, it’s a one-time bump in score that cannot be averaged over time.
Ah, I had misunderstood how the system works. I had not read carefully and assumed some kind of weighted round robin. Random pairings allow for a lot more random variation.
All clones should act equally against non-clones until the showdown round. I guess some outsider bots could be adjusting behavior depending on finding certain patterns in the code in order to respond to those patterns, and the relevant patterns occur in the payloads of some clones?
FWIW, doing better or worse in any given round has a multiplicative effect between rounds, not additive. So that might affect the level of randomness, though even with 100 it seems really big to be random.
Eyeballing the graphs it looks to me that CliqueZviBot is outperforming (multiplicatively) the average performance of the other cliquebots in every single round.
This is super odd if this Bot is indeed acting in exactly the same manner as the other clique bots.
ETA: Genuinely curious how this got downvoted even before it turned out to be correct.
What are the names of your 2 vassal PasswordBots?
PasswordBot and DefinitelyNotCollusionBot. They were submitted by Ruby and habryka, who responded to my request on the LW Tagger Slack.
Multicore gained some favor with me when he did an enormous amount of tagging during the tagging sprint. Figured I would use my entry for the good, even if I didn’t have time to write my own thing.
I see, they’re lumped with your bot in the red portion of the pie, and still running after 10 rounds.
Wow!
I had expected there’d be around 8 bots in the clique and around 50 bots in total (though not that many sillyBots). But I never imagined we’d rise from 15% to more than 50% of the pool as early as round 10!
The cloneBots are not even attacking the other bots yet. Until round 10, they often back down to 2 in case of 3-3, and they play tit-for-tat in case of 3-2. From round 10 to round 60, they’ll get progressively more greedy.
Would we fare better, worse, or the same if the rise in greediness was faster? I wanted to change it to 10->30, but ultimately didn’t.
I had thought there would be more attackers in the initial pool. I spent a lot of time fine tuning our behaviour against them (folding in the early rounds, then maintaining 3 more and more often later). Seems like it was mostly a waste of time.
On the other hand, the code to exploit 0-bots and the like was not wasted. Yum yum.
Now that the most easily exploitable sillyBots are out, it’s gonna be a race with Multicore’s bot. While we try to smother all the outsiders, Multicore will allow cooperators to survive while gaining score from them. If they survive long enough, we’ll be the ones smothered.
I think there’s a 70% chance we eliminate all non-clones/mimic by round 60. Even if we do, I expect Multicore to be bigger than the aggregate of the 2 next biggest at round 90 when the second phase begins (70%).
Cool competition! It makes me wish I had had more time to put into CooperateBot. At present I would say it instantiated a relatively naive view of cooperation, and could do much better if I invested more time considering the true nature of generosity. Looking at the obituary I suspect that CooperateBot may not last much longer.
How does your CooperateBot work (if you want to share?). Mine is OscillatingTwoThreeBot which IIRC cooperates in the dumbest possible way by outputting the fixed string “2323232323...”.
You will have to wait for next time’s obituary I’m afraid! I think lsusr should have a good grasp on the philosophical and ethical traditions I was attempting to channel with CooperateBot—while the insights are deep, I think the lengthy code is quite clear on the matter.
Can you tell us who Insub is and the story of your alliance with them?
I actually have no idea—I guess we are just two naturally very cooperative people!
Where did you get the name “Insub” from? Is there a more detailed report than in this post?
In the pie chart in the Teams section, you can see “CooperateBot [Larks]” and “CooperateBot [Insub]”
To clarify, the 8 all successfully recognize each other as clones, and the one who didn’t follow through submitted nothing? Relevant for scoring my predictions on the last comment thread.
8 players submitted legitimate CloneBots. 1 person submitted nothing. Multicore submitted EarlyBirdMimicBot.
Huh. I didn’t realize that was allowed.
The bots can access the number of the turn?? I thought that each pairing is an isolated iterated game that doesn’t know anything about the context.
Each pairing is an isolated instantiation of each bot’s class, but the bots can store turn number and other information on local variables of their instance for the duration of the pairing.
I thought that there are two cycles: an inner cycle which is an iterated game between two fixed opponents with over 100 rounds, and an outer cycle in which many such games are played between different pairs. The bots are aware of the history in the inner cycle but not in the outer cycle. So, I interpreted the “10 rounds” of the OP as 10 rounds of the outer cycle, in which many 100+ round games have already occurred. But then I don’t understand how the clone army can coordinate on cooperating until outer round 90. Which leads me to suspect I’m misunderstanding something pretty basic?
The outer round number is what is passed to the init method of the bot class. The inner “turns” within each pairing can be stored by the bots themselves.
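The two-level structure can be sketched as follows. Everything here is hypothetical: run_pairing, TreatyBot, the 100-turn default and the scoring rule (each bot scores its number if the two numbers sum to at most 5, otherwise both score 0) are assumptions for illustration, not the tournament engine’s code. The outer round number arrives once through __init__; the inner turn count lives on the instance, which is discarded when the pairing ends.

```python
def run_pairing(bot_cls_a, bot_cls_b, round_num: int, turns: int = 100):
    """Hypothetical pairing: fresh instances per pairing, outer round passed to __init__."""
    a, b = bot_cls_a(round_num), bot_cls_b(round_num)
    prev_a = prev_b = None
    score_a = score_b = 0
    for _ in range(turns):
        # Each bot sees only the other's previous move within this pairing.
        move_a, move_b = a.move(prev_b), b.move(prev_a)
        if move_a + move_b <= 5:  # assumed rule: overbidding scores nothing
            score_a += move_a
            score_b += move_b
        prev_a, prev_b = move_a, move_b
    return score_a, score_b


class TreatyBot:
    """Cooperates before outer round 90; afterwards defects in the second half of a pairing."""

    def __init__(self, round_num=0):
        self.round_num = round_num  # outer round: fixed for the whole pairing
        self.turn = 0               # inner turn: tracked by the bot itself

    def move(self, previous=None):
        self.turn += 1
        if self.round_num >= 90 and self.turn > 50:
            return 3  # hypothetical post-treaty defection
        return 2


print(run_pairing(TreatyBot, TreatyBot, 10))  # (200, 200)
print(run_pairing(TreatyBot, TreatyBot, 95))  # (100, 100)
```

Two TreatyBots score 200 points each in round 10, but in round 95 they cooperate only for the first 50 turns, which is how a clone treaty can key off the outer round while still reacting within a pairing.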