Multicore comments on The Darwin Game

Multicore 19 Oct 2020 18:13 UTC
43 points
tl;dr I betrayed the CloneBot clique and submitted MimicBot. My submission is in this Github repository.
My Process in chronological order:
There were three important things I noted in the original post
- Self cooperation is free, so an early lead is a big advantage.
- There are guaranteed to be “silly” bots in the early pool.
- You can vary your strategy depending on the round.
It seemed to me that the best strategy was to get as many points as possible early by exploiting silly bots, cooperating with anyone willing to cooperate, and folding somewhat to attackers. Then later on just play a non-exploitable cooperative strategy and hope for your early advantage to snowball.
Simulation
After seeing AbstractSpyTreeBot and some of the comments around it, it seemed to me that simulation was perhaps the best way to take advantage of simple bots. There are various approaches you could take, but mine was to simulate every possible sequence of moves I could make for the next N turns, and use the one with the best outcome. Since this has exponential time complexity, I only considered the moves 2 and 3 and kept N fairly small, with the option for the game runner to reduce it further to improve performance.
There are several possible pitfalls with simulators:
- The opponent might have randomized behavior, making your simulated predictions useless. In fact, not using randomization seems wrong to me, since you might get stuck in a rut against another deterministic bot, alternating 2 and 3 on the same schedule. Only bad bots would be deterministic and therefore simulatable.
- The opponent might have planted a landmine in their code to disqualify any potential simulator.
- The opponent might be slow, and make my dozens of simulations per turn take too long.
- The opponent might be a BullyBot, who attacks until the opponent retaliates. My short time horizon would make folding look correct, when actually retaliating would give more points in the long run. At the same time, I did want to fold against AttackBot.
My solution to the first three problems was to only run simulations against bots with two or fewer open parentheses “(” in their source. That’s how many you need to define __init__ and move, so it wouldn’t leave any for calls to exec or random or inspect. I suspect it might be possible to sneak a landmine past this requirement, but since I didn’t tell anyone what I was doing it seemed safe enough. Hopefully enough of the deterministic bots fulfill this requirement for simulation to be worthwhile.
For the BullyBot problem, I had the simulation also see what the best strategy was that didn’t let the opponent outscore by more than X but also got at least Y points. If this strategy didn’t exist, such as against AttackBot, I would fold, but otherwise I would play that strategy. I also made the simulation be more willing to be uncompromising against attackers as the rounds went on, in case one of the dumb bots was actually putting up a fight in the overall tournament.
For my backup strategy against non-simulatable bots I kept it simple and pulled from Zvi’s playbook. In the early rounds, I play an EquityBot which folds if the opponent plays 3 ten times in a row. In the later rounds, I play DefenseBot.
I considered a plan of obfuscating my code so that it would look safe to other simulators, and then disqualifying them if they tried to simulate me. I gave up on that because it seemed like other simulators would either be careful like I was or someone else would disqualify them for me. Plus, simulators are fun. I’d rather have a fair fight against other simulators. My code uses randomization and has lots of scary keywords in it, so other simulators aren’t going to be exploiting me.
To feed my simulation plan, I found two people who didn’t have time to make a proper submission, and got them to submit PasswordBot, which plays 3 on the first two turns, then if the opponent played 2 on the first two turns it plays 2 for the rest of the game, and otherwise plays 3 for the rest of the game. I figured that only a branching simulator could take advantage of this.
CloneBot
Throughout the week, a couple people publicly announced intent to join the CloneBot clique. I decided that the CloneBots would be a huge threat that I might not be able to compete with if I didn’t do something to address them. Originally I was planning to not contact the clique and use the public source, but when Vanilla_cabs announced the source would not be public, I messaged them indicating willingness to join the clique. I’m not sure how suspicious the timing seemed to them, but I was included anyway. I gave Vanilla_cabs some notes about other parts of the code, but intentionally did not comment on anything related to security.
My plan was to bypass whatever they were using to check that the opponent is a clone, so that I could get perfect cooperation with clones, but also use my simulation strategy to exploit dumb bots and continue to cooperate with outsiders as the clones got more aggressive.
On Friday, the CloneBot security update was released, with the final version of the clone check.
(I can’t figure out how to paste code into here), the source is here.
There are two constraints:
- After the string sanitization, my code before the payload must compare identical with the sanitized version of my opponent’s pre-payload code.
- After the string sanitization, my code after the payload must have at least eight spaces at the beginning of every line.
I got cracking.
I noticed that the checker was splitting the code by “\n”, so I wondered if there was any other character that python would treat as semantically equivalent to a newline. There is—the ASCII Form Feed character \f (0x0C). However, this didn’t work. I hadn’t noticed the earlier sanitization step of using splitlines() and then joining by \n. All my \f’s were just getting turned into \n’s and getting busted.
I then wondered if there was any other way of convincing python that there was a newline, or convincing it to dedent to a lower indentation level, without using a character that would be caught by splitlines or using less than eight spaces of indentation. A string of google searches led me to reading through the C code for the python lexer, which was very educational but convinced me that this route was a dead end.
I noticed in reading about the lexer that it has a somewhat complicated system for handling tabs and spaces in indents. I noticed that the code sanitization was automatically converting tabs to four spaces and wondered if I could use a mixture of tabs and spaces in indentation to manipulate the if / else statements in the clonebot move function to make it run the payload early. Unfortunately python 3 does not allow any mixture of tabs and spaces which makes the meaning of the code depend on how many spaces a tab is worth. I think this attack vector might be valid if we were using python 2.
At this point it was 3 AM and I slept on it. On Saturday morning I thought back to the very thing that had foiled my first attempt, the splitlines() sanitization. Instead of finding a character that would be treated as a newline but not caught by splitlines, what about a character that would be caught by splitlines but not treated as a newline? I found a thread online where people were complaining about splitlines splitting on certain characters, and a dev said the behavior was intentional. One of those characters was 0x1D, the ASCII Group Separator character. I tried it in a test program, and it worked. I tried it in CloneBot, and… it worked!
That’s the technique my final program uses. Starting with the first comment in the move function, I replaced every newline with 0x1D, which is a valid character in a comment. The bottom half of the CloneBot source is a comment and gets skipped over, so the move function actually runs my custom code, which is all indented at least eight spaces and fulfills the second constraint of the clone check. I use the same cooperateWithClone function as the clones to cooperate with them until the showdown round, at which point I just treat them like any other bot.
I’m not too surprised that other people tried and failed to cheat the restrictions, since it took me 8+ hours and some real thinking outside the box. It does make me feel better that I’m not the only would-be betrayer.
Predictions
How do I think I’m going to do? My bot passes the clone check both in Vanilla_cabs’s official checking tool and in my test harness, but I can see some possibility of something introducing enough differences to screw up the check. Vanilla_cabs encouraged people to submit through the form, while I submitted a git repo. Lsusr promised to preserve my special characters, but the real game environment might be different enough to make the check go wrong, though if that was the case maybe all clones would fail the check.
Separately, my bot is pretty complicated. I wouldn’t be surprised if it was the longest submitted, at over 250 lines of code. I’ve done a fair amount of testing, but I am not confident in my ability to make my software robust to anything that might happen. There’s a decent chance something unexpected will happen and I’ll crash and get disqualified.
However, if my bot doesn’t have a crippling bug and no one else figured out how to submit a MimicBot, I don’t really see how I lose. I get lots of points from clones, dumb bots, and cooperators, while everyone else is losing out against at least one of those.
I’ll say 30% my bot crashes or fails the clone test, 10% someone else submitted MimicBot and theirs is better, 10% those things don’t happen but I still lose, 50% I win.
What links here?
- The Darwin Game—Rounds 0 to 10 by lsusr (24 Oct 2020 2:17 UTC; 107 points)
- Zack_M_Davis's comment on The Darwin Game—Rounds 21-500 by lsusr (21 Nov 2020 23:35 UTC; 5 points)
- Lukas Finnveden 19 Oct 2020 22:17 UTC
  3 points
  Parent
  Damn, good job. We should’ve gone with my suggestion that the whole payload needed to fit on one line, separated by ; (though maybe this would’ve caused us to lose so many clique-members out of annoyance that it wouldn’t have been worth it).
  - Vanilla_cabs 20 Oct 2020 17:32 UTC
    1 point
    Parent
    That’s an idea worthy of consideration, but in addition to the risk you raised, I also feared some members would have submitted invalid bots.
    - Lukas Finnveden 20 Oct 2020 18:21 UTC
      2 points
      Parent
      Yeah, if we’d seen the issue, I think we could’ve gotten around it just by not using splitlines, which would’ve been smoother.
      
      Though of course, this exploit updates me towards thinking that there are other vulnerabilities as well.
      - philh 20 Oct 2020 21:02 UTC
        2 points
        Parent
        If we don’t use splitlines we instead need to use something similar, right? Like, even if we don’t need to worry about LF versus CRLF (which was a genuine suggestion I made), we still need to figure out if someone’s got any de-indents after the start of the payload. And I don’t expect us to do better without splitlines than with it.
        Lukas Finnveden 20 Oct 2020 23:03 UTC
        1 point
        Parent
        Using newlines to figure out what happens after “payload” is fine, as far as I can tell. Multicore’s exploit relies on newlines being used when comparing stuff before the payload.
        
        Stuff like CRLF vs LF is a bit awkward, but can maybe be handled explicitly?
        philh 20 Oct 2020 23:44 UTC
        2 points
        Parent
        Oh yeah, that’s true as far as I know. I guess it depends how much we trust ourselves to find all instances of this hole. A priori I would have thought “python sees a newline where splitlines doesn’t” was just as likely as the reverse. (I’m actually not sure why we don’t see it, I looked up what I thought was the source code for the function and it looked like it should only split on \n, \r and \r\n. But that’s not what it does. Maybe there’s a C implementation of it and a python implementation?)
- philh 19 Oct 2020 21:27 UTC
  3 points
  Parent
  Clever! I looked for holes in mostly the same directions as you and didn’t find anything. I think I either didn’t think of “things splitlines will split on but python won’t”, or if I did I dismissed it as being not useful because I didn’t consider comments.
- Taleuntum 19 Oct 2020 21:23 UTC
  3 points
  Parent
  Your betrayal of the clique is very nice, hats off to you. I also liked your idea of getting others not that interested in the game to submit bots helping you, It’s a pity it did not occur to me.
  
  However, I think you are, too, overconfident in you winning. I’ve run a simulation of the whole tournament till the 160th round with 8 bots (MatrixCrashingBot, TitforTatBot, PasswordBot1, PasswordBot2, EmptyCloneBot, earlybird, incomprehensiblebot, CliqueZviBot) and in the final equilibrium state there are three bots: earlybird, incomprehensiblebot and CliqueZviBot with roughly equal populations. While PasswordBots do help you at the start your lead seems to disappear later when the dumb bots and non-clique members die (which is nice, because your bot’s output when simulating is pretty annoying). Sure, it’s just one run of the tournament with a low number of participants (it’s slow to do the tournament on my laptop), but it’s something.
  What links here?
  - The Darwin Game—Rounds 0 to 10 by lsusr (24 Oct 2020 2:17 UTC; 107 points)
  - Multicore 19 Oct 2020 21:40 UTC
    1 point
    Parent
    That does revise down my expectations of winning, but my bot having run thousands of times on someone else’s computer and not crashing (or failing the clone check?) is good to hear.
    Maybe I’m overestimating the snowball effects of an early pool. If the late game has everyone cooperating with everyone else, your matches with others are only giving a tiny bit fewer points than matches against your copies.
- Vanilla_cabs 20 Oct 2020 17:44 UTC
  2 points
  Parent
  Originally I was planning to not contact the clique and use the public source, but when Vanilla_cabs announced the source would not be public, I messaged them indicating willingness to join the clique. I’m not sure how suspicious the timing seemed to them, but I was included anyway.
  I gave an equal chance to an early newcomer and a late newcomer of trying to betray. Maybe I was wrong, and I should be mindful of that in the future.
  Also, I felt like our numbers were dangerously close to the minimum (6), and I expected a couple members to retract before I revealed the names (which di not happen). So in my mind I had no choice but to accept you.
  I tried it in CloneBot, and… it worked!
  Good job! My plan for security was to leave an obvious vulnerability, and entrust members who would report with the task to look for more subtle ones. Only Lanrian reported, late in the week, and I didn’t trust them enough because I was suspicious of their motive when they’d asked me for making our code easier to simulate (which turns out they were honest about).

Multicore comments on The Darwin Game

My Process in chronological order:

Simulation

CloneBot

Predictions