I mean, “terribly horrible” on what scale? If the criterion is “can it be strictly dominated by another strategy in terms of results if we ignore the cost of making the strategy more complicated,” then, sure, a strategy that reliably allows opponents to costlessly defect on the first of 100 rounds fails that criterion. I’d argue that a more interesting set of criteria are “is the expected utility close to the maximum expected utility generated by any strategy,” “is the standard deviation in expected utility acceptably low,” and “is the strategy simple enough that it can be taught, shared, and implemented with little or no error?” Don’t let the perfect become the enemy of the good.
On a scale from 0 to “a million people died because someone was being irrational”, it would be around “two million people died because someone was being irrational.”
On an unrelated note, the idea of precommitting in non-repeated IPD is silly: if both players precommit simultaneously (before learning of their opponent’s precommitment), it’s the same as no one precommitting, since neither can update their strategy with that knowledge; and otherwise it’s an asymmetrical problem.
The solution to that asymmetrical problem, if you’re the one who has to make the precommitment, is to precommit to simple TFT, I think. (~100% confidence)
I’m not sure what’s silly about it. Just because there’s only one game of IPD doesn’t mean there can’t be multiple rounds of communication before, during, and after each iteration.
As for the asymmetrical problem, if you’re really close to 100% confident, would you like to bet $500 against my $20 that I can’t find hard experimental evidence that there’s a better solution than simple TFT, where “better” means that the alternative solution gets a higher score in an arena with a wide variety of strategies? If I do find an arena like that, and you later claim that my strategy only outperformed simple TFT because of something funky about the distribution of strategies, I’ll let you bid double or nothing to see if changing the distribution in any plausible way you care to suggest changes the result.
I’m not talking about communication in general; only about one-time precommitting. If it is asymmetrical I don’t consider that real IPD anymore, and if it isn’t, I fail to see how it would change anything, unless you would find some way for both players to precommit and take their opponent’s precommitment into consideration.
I didn’t mean to offend you by saying that it was stupid; I just don’t find it an interesting problem from a game theory perspective.
As for the bet, it is easy to set up an arena of strategies where any particular strategy would fare worse than a certain other strategy—in this case, a pool of RandomBot derivatives would be a fine arena for DefectBot to excel in. Also, “a wide variety” is far too arbitrary for any bets.
What I had in mind as the “best” strategy is the strategy that scores highest against the strategy that is designed to score highest against it. TFT scores 198 points (paperclips, million lives saved, whatever) against 99C1D, which is the best way to deal with TFT. However, I should probably just give you the $500, since it turns out 198 is way too low even for a purely deterministic strategy. I’m now ~100% certain (haha!) that the highest possible score is 248; and if you consider non-deterministic strategies, the score increases to 248.5 − ε. And sadly, there’s no way for you to score higher than 101 (or 100.5 + ε, respectively) against this strategy.
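For what it’s worth, the 198 figure is easy to check with a quick simulation, using the payoff scale implied in this thread (3 for a lone defector, 2 each for mutual cooperation, 1 each for mutual defection, 0 for a lone cooperator). A sketch; the function and variable names are my own labels:

```python
# Score a fixed-length IPD match between two move sequences.
# Payoffs per round: mutual cooperation 2/2, mutual defection 1/1,
# lone defector 3, lone cooperator 0 (the scale used in this thread).
def score(moves_a, moves_b):
    payoff = {("C", "C"): (2, 2), ("D", "D"): (1, 1),
              ("C", "D"): (0, 3), ("D", "C"): (3, 0)}
    total_a = total_b = 0
    for a, b in zip(moves_a, moves_b):
        pa, pb = payoff[(a, b)]
        total_a += pa
        total_b += pb
    return total_a, total_b

# TFT against 99C1D (cooperate for 99 rounds, defect on the last).
# Against this opponent TFT's moves are fully determined: it copies
# the opponent's previous move, so it cooperates on every round.
tft = ["C"] * 100
c99d1 = ["C"] * 99 + ["D"]
print(score(tft, c99d1))  # (198, 201)
```

So TFT gets 2 × 99 = 198 and 99C1D gets 198 + 3 = 201, which is indeed the most any strategy can extract from TFT here.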
You seem to have left this as an exercise to the reader, so I’ve worked it out with proof :) If we can publicly precommit and the opponent can’t, we should commit to the strategy: “Defect for the first fifty rounds. If the opponent fails to cooperate even once during that time, defect for the last fifty as well. Otherwise play TFT for the last fifty rounds.”
Taking the other guy’s side for a moment, if we know the opponent is using this strategy, how do we get the most points? If we bow to the implied threat and cooperate for the first fifty rounds, then cooperate for the last fifty as well, we get 100 points, all during the last fifty rounds. We can bump this up to 101 by defecting on the last round. Meanwhile the opponent wins 150 + 98 = 248 points. If we defect at least once during the first fifty, we might as well defect for all 100 rounds, since that is what the opponent will do, and this gets only 100 points, so this strategy is not as good. So as you claimed, this blackmail strategy wins 248 points against the strategy that wins the most points against it.
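As a sanity check, here is a sketch that plays the committed strategy against the best response described above, under the same payoff scale (3/2/1/0); the strategy names are my own:

```python
def play(strategy_a, strategy_b, rounds=100):
    """Run an IPD match; each strategy sees (round, my_history, their_history)."""
    payoff = {("C", "C"): (2, 2), ("D", "D"): (1, 1),
              ("C", "D"): (0, 3), ("D", "C"): (3, 0)}
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for r in range(rounds):
        a = strategy_a(r, hist_a, hist_b)
        b = strategy_b(r, hist_b, hist_a)
        pa, pb = payoff[(a, b)]
        score_a += pa
        score_b += pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

def blackmailer(r, mine, theirs):
    # Defect for the first 50 rounds. If the opponent cooperated
    # throughout, play TFT for the last 50; otherwise keep defecting.
    if r < 50:
        return "D"
    if "D" in theirs[:50]:
        return "D"
    return theirs[-1]  # TFT: copy the opponent's previous move

def best_response(r, mine, theirs):
    # Submit to the threat: cooperate everywhere except the last round.
    return "D" if r == 99 else "C"

print(play(blackmailer, best_response))  # (248, 101)
```

The blackmailer collects 3 × 50 = 150 from the extorted rounds plus 2 × 49 = 98 from the TFT phase (losing the final round), matching the 248 vs. 101 split above.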
I think the following is a proof that 248 is optimal:
Suppose there are A cases of we-cooperate-opponent-defects, B cases of we-cooperate-opponent-cooperates, C cases of we-defect-opponent-defects, and 100 − A − B − C cases of we-defect-opponent-cooperates. Then the opponent wins 3A + 2B + C points. This must be ≥ 100, because otherwise the opponent can win more by defecting all the time instead of doing what we want him to do. Meanwhile we win 2B + C + 3(100 − A − B − C) = 300 − 3A − B − 2C points, and we want to maximize this number subject to the aforementioned constraint. Additional constraints are that A + B + C ≤ 100 and that A, B, C are all ≥ 0. This is a linear programming problem, for which the optimal solution always lies at one of the corners of the allowed domain. The corners of the allowed domain are at:
A = 0; B = 50; C = 0 [50 mutual cooperations, 50 we-defect-opponent-cooperates] - this wins 250 points for us, and the opponent wins 100 points.
A = 33.3; B = 0; C = 0 [We cooperate while allowing the opponent to defect 33.3 times, then defect while the opponent cooperates for 66.6 rounds] - we only win 200 points
A = 100; B = 0; C = 0 [We always cooperate; opponent always defects] - we win nothing.
A = 0; B = 100; C = 0 [Everyone always cooperates] - we only win 200 points.
A = 0; B = 0; C = 100 [everyone always defects] - we only win 100 points
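Since A, B, and C must be whole rounds in an actual game, the linear program can also be verified by brute force over integer values, using the notation above (a sketch, not part of the original argument):

```python
# Maximize our score 300 - 3A - B - 2C subject to the opponent's
# participation constraint 3A + 2B + C >= 100, with A + B + C <= 100
# and A, B, C >= 0 (all integers, since they count rounds).
best = None
for A in range(101):
    for B in range(101 - A):
        for C in range(101 - A - B):
            opponent = 3 * A + 2 * B + C
            if opponent < 100:
                continue  # the opponent would rather defect every round
            ours = 300 - 3 * A - B - 2 * C
            if best is None or ours > best[0]:
                best = (ours, A, B, C)
print(best)  # (250, 0, 50, 0)
```

The maximizer is exactly the corner A = 0, B = 50, C = 0, confirming that 250 is the ceiling before accounting for the opponent’s indifference and the unenforceable last round.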
So the best we can hope for is 250 points. However, we can’t really guarantee getting 250, because in the case where we get 250 points the opponent gets only 100, so he is indifferent between allowing us the 250 and defecting all the time and only allowing us 100. To ensure that the opponent follows our plans, we actually need to give him at least 101 points.
Suppose we defect 49 times while the opponent cooperates, followed by 51 mutual cooperations. This would give us 49×3 + 51×2 = 249 points, and the opponent 51×2 = 102 points. However, we can never force the opponent to cooperate on the last move, since we have no ability to punish him for defecting, so we can’t actually make this happen. The opponent is going to defect on the last move no matter what our strategy is. So 249 points is out of reach as well.
We can win 248 points by force against a rational self-interested opponent using the strategy in the first paragraph, so that is indeed the maximum, as you claimed.
I haven’t thought about nondeterministic strategies though.