Actually, the tournament format allowing a 1v1 equilibrium strategy to lose is rather a trivial fact—DefectBot is a 1v1 equilibrium strategy and it loses pretty badly indeed. Also, what do you mean by 1v1 equilibrium, specifically? Are you talking about 1v1 where the goal is to maximise your expected score, or 1v1 where the goal is to have a higher score than your opponent?
In the tournament context, a tournament where everyone submits DefectBot is a Nash equilibrium for the entire tournament. More generally, you’re wrong about the existence of Nash equilibria not being guaranteed; as long as there are finitely many players and finitely many strategies, and mixed strategies are allowed, there has to be at least one Nash equilibrium.
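(Matching pennies is the standard illustration of why the mixed-strategy clause is needed; the sketch below is just that textbook example, with payoffs I've assumed, not anything from the tournament.)

```python
# Minimal sketch: matching pennies has no pure-strategy equilibrium, but the
# 50/50 mixture for both players is one, as Nash's theorem guarantees.
# Payoffs are to the row player; the column player gets the negative (zero-sum).
import itertools

payoff = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}

# No pure-strategy profile is an equilibrium: someone always wants to switch.
for a, b in itertools.product("HT", repeat=2):
    row_can_improve = any(payoff[(a2, b)] > payoff[(a, b)] for a2 in "HT")
    col_can_improve = any(-payoff[(a, b2)] > -payoff[(a, b)] for b2 in "HT")
    assert row_can_improve or col_can_improve

# But if both players mix 1/2 H, 1/2 T, every pure response earns an expected
# payoff of exactly 0, so neither player can gain by deviating.
for a in "HT":
    assert sum(0.5 * payoff[(a, b)] for b in "HT") == 0
for b in "HT":
    assert sum(0.5 * -payoff[(a, b)] for a in "HT") == 0
```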
That being said, in this game a strategy is a program, so a mixed strategy would be a probability distribution over programs, not a program with random elements; you would have to do something like send an email to tetronian2 saying “I submit CheeseBot1 with 50% probability and CheeseBot2 with 50% probability”.
It’s also very important to remember that Nash equilibria are made up of strategy profiles, not strategies for individual players. In this situation the game is completely symmetric so each player has the same strategies available, but there can still be equilibria where different players play different strategies, and in general there may not be a pure-strategy equilibrium where everyone uses the same strategy.
As such, if everyone submits an equilibrium strategy it is not necessarily true that the strategies together form an equilibrium, because they might be equilibrium strategies that are from different Nash equilibria. See, for a simple example, the stag hunt.
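For concreteness, here is a minimal sketch of that stag-hunt point; the payoff numbers are just assumed textbook values, not anything from the tournament. Both (Stag, Stag) and (Hare, Hare) are equilibria, but if one player takes their strategy from the first equilibrium and the other from the second, the resulting profile is not an equilibrium.

```python
# Sketch with assumed stag-hunt payoffs: two pure equilibria exist, but a
# profile built from strategies drawn from *different* equilibria need not be one.
payoffs = {
    ("Stag", "Stag"): (4, 4),
    ("Stag", "Hare"): (0, 3),
    ("Hare", "Stag"): (3, 0),
    ("Hare", "Hare"): (3, 3),
}
moves = ["Stag", "Hare"]

def is_equilibrium(a, b):
    """Neither player can gain by unilaterally deviating."""
    p1, p2 = payoffs[(a, b)]
    best1 = max(payoffs[(a2, b)][0] for a2 in moves)
    best2 = max(payoffs[(a, b2)][1] for b2 in moves)
    return p1 == best1 and p2 == best2

assert is_equilibrium("Stag", "Stag")      # the high-payoff equilibrium
assert is_equilibrium("Hare", "Hare")      # the "safe" equilibrium
assert not is_equilibrium("Stag", "Hare")  # one strategy from each: not an equilibrium
```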
In two-player zero-sum games Nash equilibria are, in fact, interchangeable; if (A1, B1) and (A2, B2) are equilibria, then so are (A1, B2) and (A2, B1). However, for more than two players the interchangeability property no longer holds, and so overall you’re correct in saying that a “meta-game” necessarily comes up. Every player can pick an equilibrium strategy and yet the collective result may fail to be an equilibrium.
Thanks very much for the info.
I should have mentioned that I was talking about a 1v1 equilibrium of strategies that maximizes your expected score.
DefectBot is such an equilibrium strategy for the vanilla iterated-PD with fixed time horizon, but not if you allow simulation.
Yes, in fact this was one of the reasons why I was skeptical of the tournament format (and James Miller was as well, IIRC). In the end it turned out Miller was the only one to submit a DefectBot.
Ultimately, I predicted that there would be lots of Tit-for-tat-like bots to cooperate with and exploit in the last rounds, and that some bots would use a strategy similar to mine (play a 1v1 max-payoff equilibrium strategy while attempting to exploit “dumb” bots), so I went for it, hoping for a tie at the top position. It almost worked.
Ok, I recalled incorrectly. :)
DefectBot maximises expected score against DefectBot, because you can’t do better vs DefectBot than defect every round and get 100 points. As such, DefectBot vs DefectBot is still a Nash equilibrium.
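A tiny sketch of that arithmetic, assuming a 100-round match and per-round payoffs of 5/3/1/0 (temptation/reward/punishment/sucker), which is what the 100-point figure suggests: against an unconditional defector you get 1 point per round for defecting and 0 for cooperating, so all-out defection is the unique best response.

```python
# Sketch, assuming 100 rounds and per-round payoffs T, R, P, S = 5, 3, 1, 0.
ROUNDS = 100
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def defect_bot(my_history, their_history):
    """DefectBot ignores all history and defects unconditionally."""
    return "D"

def score_against_defectbot(my_moves):
    """Total payoff for playing the given move sequence against DefectBot."""
    total, mine, theirs = 0, [], []
    for my_move in my_moves:
        their_move = defect_bot(theirs, mine)  # DefectBot sees history, ignores it
        total += PAYOFF[(my_move, their_move)]
        mine.append(my_move)
        theirs.append(their_move)
    return total

assert score_against_defectbot("D" * ROUNDS) == 100  # mutual defection: 1 point/round
assert score_against_defectbot("C" * ROUNDS) == 0    # cooperating only loses points
```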
You need to be more specific as to what “maximizes your expected score” means, because depending on your definition I could come up with some very surprising strategies you might not be expecting.
By “score” I mean the sum of the payoffs that your bot receives during each round in a single 1v1 match.
By “expected” I mean w.r.t. both the internal random source of each bot and the random source you use to choose your bot if you are using a mixed strategy.
DefectBot gets the maximum possible expected score of 100pts vs DefectBot—it’s not possible to do better vs DefectBot.
Yes, but there are other equilibria where both players get a higher score.
Yes, and which one of those equilibria do you pick?
That’s a coordination problem.
Unlike program-swap (I)PD, where these max-payoff equilibria are the CliqueBots equilibria, and there doesn’t seem to be any “natural” clique to pick as a Schelling point, in program-simulation IPD it seems that there is a Schelling point.
The term “max-payoff equilibrium” is ill-defined.
Consider this pair of bots:
1) ExtortionBot, who defects for 80 rounds, and then cooperates with you for 20 rounds if and only if you cooperated for all of the first 80 (otherwise it defects for those rounds as well).
2) WeakBot, who always defects for the last 20 rounds, and cooperates with you for the first 80 if and only if it simulates that you will cooperate for the last 20 rounds iff WeakBot cooperates with you for the first 80 (otherwise it defects for the first 80).
The maximum score you can get vs ExtortionBot is 100 points, which is how many points WeakBot gets.
The maximum score you can get vs WeakBot is 400 points, which is how many points ExtortionBot gets.
Ergo ExtortionBot/WeakBot forms a Nash Equilibrium. Is that a max-payoff equilibrium, or is it not?
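As a check on those two numbers, here is a minimal sketch, again assuming 100 rounds and per-round payoffs 5/3/1/0. WeakBot’s simulation step is not implemented; it is simply granted that its simulation of ExtortionBot succeeds (ExtortionBot does cooperate in the last 20 rounds iff its opponent cooperated in the first 80), which fixes the whole play sequence.

```python
# Sketch of the score arithmetic, with assumed payoffs T, R, P, S = 5, 3, 1, 0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def total(my_moves, their_moves):
    """Sum of per-round payoffs for the first player."""
    return sum(PAYOFF[pair] for pair in zip(my_moves, their_moves))

# First 80 rounds: ExtortionBot defects while WeakBot cooperates (its simulated
# check on ExtortionBot passes). Last 20 rounds: WeakBot always defects, and
# ExtortionBot cooperates because WeakBot cooperated in all of the first 80.
extortion_moves = "D" * 80 + "C" * 20
weak_moves = "C" * 80 + "D" * 20

assert total(extortion_moves, weak_moves) == 400  # 80*5 + 20*0
assert total(weak_moves, extortion_moves) == 100  # 80*0 + 20*5
```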
That means that this game is a symmetric bargaining problem.
According to Wikipedia, proposed solutions are symmetric, Pareto-optimal (i.e. “max-payoff”) equilibria.
It seems to me that VOFB or something similar to it is a strategy leading to one of these equilibria (do other symmetric Pareto-optimal equilibria exist?)
I think there are many such equilibria, but they all rely on the same basic principle. I’ve clarified this in a top-level comment, along with a simple example of a symmetric Pareto-optimal equilibrium strategy.