orthonormal comments on How do we really escape Prisoners’ Dilemmas?

orthonormal 2 Sep 2012 5:02 UTC
1 point
Firstly, “always defect” loses hard in evolutionary tournaments with fixed but large iteration numbers. See here for a great example; in general (and if populations could grow back from tiny proportions), what you see is a Paper-Rock-Scissors cycle between TFT, “TFT until the last turn, then defect”, “TFT until the second-to-last turn, then defect”, and so on (but not all the way back to the beginning; TFT trumps one of the early-defectors after only a few steps, where the number depends on the payoff matrix).
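
To make the first few steps of that cycle concrete, here is a minimal sketch, not the linked tournament's code. It assumes the standard Axelrod payoffs (T=5, R=3, P=1, S=0) and a 10-round game, neither of which comes from the thread, and the names `play` and `k` are made up for illustration. Here `k` is the number of final turns on which a strategy defects unconditionally, so `k=0` is plain TFT.

```python
# Assumed payoffs (not from the thread): temptation, reward, punishment, sucker.
T, R, P, S = 5, 3, 1, 0
SCORES = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}


def play(k_a, k_b, rounds):
    """Total scores when 'TFT, but defect on the last k turns' meets itself.

    k_a and k_b are the number of final turns on which each player
    defects unconditionally; before that, each plays Tit-for-Tat.
    """
    prev_a = prev_b = "C"  # TFT opens by cooperating
    score_a = score_b = 0
    for turn in range(1, rounds + 1):
        move_a = "D" if turn > rounds - k_a else prev_b
        move_b = "D" if turn > rounds - k_b else prev_a
        gain_a, gain_b = SCORES[(move_a, move_b)]
        score_a += gain_a
        score_b += gain_b
        prev_a, prev_b = move_a, move_b
    return score_a, score_b


if __name__ == "__main__":
    ROUNDS = 10  # assumed game length
    for k in range(3):
        resident, _ = play(k, k, ROUNDS)       # resident vs. resident
        invader, _ = play(k + 1, k, ROUNDS)    # lone invader vs. resident
        print(f"k={k}: resident scores {resident}, "
              f"invader defecting one turn earlier scores {invader}")
```

Under these assumptions, a lone strategy that defects one turn earlier than the residents always scores T - R more than they score against each other, which is what drives each downward step of the cycle.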

Secondly, as an intuition pump, an iterated tournament in which every round has a 1% chance of being the last is one in which it’s hard to beat TFT. And capping it at 1000 rounds only changes the outcome in a minuscule number of cases, so there’s no practical difference between TFT, “TFT until turn 1000, then defect”, “TFT until round 999, then defect”, and so on until we reach strategies that are dominated by TFT again.
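
A rough back-of-the-envelope check of "minuscule", assuming each round independently has a 1% chance of being the last and, purely for illustration, payoffs T=5 and R=3 (the payoffs and the function names are assumptions, not something from the thread):

```python
def prob_reaches(round_number, stop_chance):
    """Probability that the game is still running when `round_number` starts."""
    return (1.0 - stop_chance) ** (round_number - 1)


def expected_length(stop_chance, cap):
    """Expected number of rounds played when the game is cut off at `cap`."""
    return (1.0 - (1.0 - stop_chance) ** cap) / stop_chance


STOP_CHANCE = 0.01   # from the comment: 1% chance that any round is the last
CAP = 1000           # from the comment: hard cap on the number of rounds
T, R = 5, 3          # assumed temptation and reward payoffs

# The cap only matters if the game would otherwise have run past round 1000.
p_cap_binds = prob_reaches(CAP + 1, STOP_CHANCE)

# "TFT until turn 1000, then defect" differs from TFT only if round 1000 is
# actually played, where it gains T - R against a TFT partner.
expected_gain = prob_reaches(CAP, STOP_CHANCE) * (T - R)

if __name__ == "__main__":
    print(f"P(cap binds)           = {p_cap_binds:.2e}")
    print(f"expected game length   = {expected_length(STOP_CHANCE, CAP):.4f}")
    print(f"gain from late defect  = {expected_gain:.2e}")
```

With these numbers the cap binds in roughly 4 games out of 100,000, and the expected gain from defecting on turn 1000 is under a ten-thousandth of a point, against an expected total of about 300 points per game of mutual cooperation.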

Does that make sense?
- drnickbone 2 Sep 2012 10:25 UTC
  0 points
  Parent
  Thanks for the link. One issue with these tournaments is that the strategies are submitted in advance, and then one “wins”, whereas in evolutionary terms you’d have new submissions after the “winner” dominates the population. There is nothing obvious to stop a population of TFT being invaded by TFT-1, which in turn is invaded by TFT-2, which is in turn invaded by TFT-3 and so on.
  
  You argue that TFT-n, for some n is then invaded by TFT, so there is a “rock, scissors, paper” cycle, but how does that work? A solitary TFT will co-operate one more time than the surrounding population of TFT-n, and will meet defection in that final co-operation so it has a strictly lower fitness than TFT-n. So it can’t invade.
  
  One possible solution is for a group of TFTs to show up in a TFT-n population and have most of their interactions with each other, or at least enough interactions with each other to outweigh the lower utility against TFT-n. That is in effect a Group Selection argument (which I discussed in my original article, part 4), and I agree it could work, but I’m a bit concerned about relying on such arguments. The standard treatment of an ESS assumes that the “invader” has almost all of its interactions against the existing strategy.
  
  On the “intuition pump”, I noticed this case in my reply to Unnamed and Randaly earlier, and in my original article, part 1:
  
  This is most plausibly the case where there is a very large upper bound on iterations (such as 100 years), but the upper bound is so rarely (if ever) reached in practice, that strategies which do something different in the final phase just don’t have a selective advantage compared to the cost of the additional complexity. So the replacement of TFT by TFT-1 never happens.
  
  My concern was that we do, in fact, find cases where we know we are in the final round (or the only round) and our behaviour is, in fact, a bit different in such cases (we co-operate less, or have something like a 50% chance of co-operating vs 50% of defecting). But we don’t always defect in that final round. This is an interesting fact that needs explanation.
  
  By the way, other commenters have argued these “known last round” or “known single round” cases are an artefact of current conditions, and wouldn’t have occurred in ancestral conditions, which strikes me as an ad hoc response. It’s not hard to see such interactions happening in an ancestral context too, such as one-off trades with a nomadic clan, passing through. We probably would trade, rather than just steal from the nomads (and risk them staying and fighting, which is strictly irrational from their point of view, but rather likely to happen). Or consider finding a tribal colleague alone in a desert with a very-lethal-looking wound (missing legs, blood everywhere) crying in pain and asking for some water. Very safe to walk away in that case, since very high chance that no-one would ever know. But we wouldn’t do it.
  - orthonormal 6 Sep 2012 2:38 UTC
    1 point
    Parent
    
    You argue that TFT-n, for some n is then invaded by TFT, so there is a “rock, scissors, paper” cycle, but how does that work? A solitary TFT will co-operate one more time than the surrounding population of TFT-n, and will meet defection in that final co-operation so it has a strictly lower fitness than TFT-n. So it can’t invade.
    
    Oops, you’re right: there’s a minimum “foothold” proportion (depending on the payoffs and on n) that’s required. But if foothold-sized cliques of various TFT-n agents are periodically added (i.e. random mutations), then you get that cycle again, and in the right part of the cycle, it is individually beneficial to be TFT rather than TFT-n, since TFT-n never gets cooperation on any of the last (n-1) turns.
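
    Here is a minimal sketch of how that foothold proportion could be computed. It assumes a large, randomly mixing population containing only TFT and TFT-n, the standard Axelrod payoffs (T=5, R=3, P=1, S=0) and a 10-round game; none of those specifics come from this thread, and `play` and `foothold` are made-up names.

    ```python
    from fractions import Fraction

    # Assumed payoffs (not from the thread): temptation, reward, punishment, sucker.
    T, R, P, S = 5, 3, 1, 0
    SCORES = {("C", "C"): (R, R), ("C", "D"): (S, T),
              ("D", "C"): (T, S), ("D", "D"): (P, P)}


    def play(k_a, k_b, rounds):
        """Scores for TFT-k_a against TFT-k_b (TFT-k defects on the last k turns)."""
        prev_a = prev_b = "C"  # TFT opens by cooperating
        score_a = score_b = 0
        for turn in range(1, rounds + 1):
            move_a = "D" if turn > rounds - k_a else prev_b
            move_b = "D" if turn > rounds - k_b else prev_a
            gain_a, gain_b = SCORES[(move_a, move_b)]
            score_a += gain_a
            score_b += gain_b
            prev_a, prev_b = move_a, move_b
        return score_a, score_b


    def foothold(n, rounds):
        """Share of TFT above which TFT outscores TFT-n, or None if it never does.

        With share p of TFT and random matching, the average scores are
            TFT:   p * a + (1 - p) * b
            TFT-n: p * c + (1 - p) * d
        and setting them equal gives p = (d - b) / ((a - c) + (d - b)).
        """
        a, _ = play(0, 0, rounds)   # TFT against TFT
        b, c = play(0, n, rounds)   # TFT (b) against TFT-n (c)
        d, _ = play(n, n, rounds)   # TFT-n against TFT-n
        if a <= c:
            return None             # TFT-n does at least as well at every share
        return Fraction(d - b, (a - c) + (d - b))


    if __name__ == "__main__":
        for n in range(1, 6):
            print(f"n={n}: foothold = {foothold(n, 10)}")
    ```

    Under these assumptions TFT cannot gain on TFT-1 or TFT-2 at any share, needs more than 1/3 of the population against TFT-3, and needs more than 1/7 against TFT-5. A solitary TFT (share close to zero) always does worse than the residents, which is drnickbone's point.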
    
    On your other point, it’s worth noting that organisms are adaptation-executors, not fitness-maximizers; it seems easier to evolve generally altruistic values (combined with a memory of past defectors to avoid getting exploited by them or by similar agents again) than to evolve a full calculator for when defection would be truly without cost.
    - drnickbone 6 Sep 2012 9:01 UTC
      0 points
      Parent
      This “clique” solution has some problems. First, a single mutant can’t form a clique. OK, but maybe the mutant is interacting with nearby individuals, some of whom also share the mutation? That works if the nearby partners are relatives, but the difficulty there is that kin selection would already be favouring co-operation with neighbours, so how does TFT get an advantage? You can juggle with the pay-offs and the “shadow of the future” probability to try and get this to work (i.e. find a set of parameters where co-operation with neighbours via kin selection is not favoured, whereas TFT is), but it all looks a bit shaky.
      
      Andreas Griger below suggests that the TFT mutants preferentially interact with each other rather than the TFT-n (or DefectBots) around them. This is another solution, though it adds to the overhead/complexity of a successful invader. However, it does lead to a nice testable prediction: species which practise reciprocation with non-relatives will also practise partner selection.
      
      The point about adaptation executors not being fitness maximizers was also brought up by Unnamed below, though see my response. The general issue is that citing the link is not an all-purpose excuse for maladaptation (what Richard Dawkins once referred to as the “evolution has bungled again” explanation). In particular, you might want to see a paper by Fehr and Henrich, “Is Strong Reciprocity a Maladaptation?”, which looks at the maladaptation hypothesis in detail and shows that it just doesn’t fit the evidence. Definitely worth a read if you have time.
  - Unnamed 2 Sep 2012 23:02 UTC
    0 points
    Parent
    
    By the way, other commenters have argued these “known last round” or “known single round” cases are an artefact of current conditions, and wouldn’t have occurred in ancestral conditions, which strikes me as an ad hoc response. It’s not hard to see such interactions happening in an ancestral context too, such as one-off trades with a nomadic clan, passing through. We probably would trade, rather than just steal from the nomads (and risk them staying and fighting, which is strictly irrational from their point of view, but rather likely to happen). Or consider finding a tribal colleague alone in a desert with a very-lethal-looking wound (missing legs, blood everywhere) crying in pain and asking for some water. Very safe to walk away in that case, since very high chance that no-one would ever know. But we wouldn’t do it.
    
    The point isn’t that one-shot PDs never arose, or that it was always rational to cooperate. The point is that one-shot PDs were rare, compared to other interactions, and it was often rational to cooperate (because our ancestors evolved adaptations and developed a social structure which turned potential one-shot PDs into something else). And since the mechanisms that influence human behavior (natural selection, emotions, heuristics, reinforcement learning, etc.) don’t have perfect fine-grained control which allows them to optimize every single individual decision, we often wind up with cooperation even in the cases that are true one-shot PDs.