The million-turn game encourages taking as long as you need to figure out what code the opponent is likely running, then figuring out how to exploit it, thus gaining the maximum benefit for the vast majority of the rounds. There will not be a “typical” per-round value.
I think walking you through an example would be easier for me than finding a source. Imagine the matrix is 2,2 when both cooperate, 1,1 when both defect, and 3,0 when one player defects against a cooperator.
Let’s say you repeat the game with probability p each round. We want to determine whether both players playing tit for tat is an equilibrium strategy. So, we will assume that both players play tit for tat, and see if either has any incentive to change.
If we both play tit for tat, our expected payoff is 2 + 2p + 2p^2 + … = 2/(1-p). If we wanted to change our strategy, we would have to do so by defecting at some round; without loss of generality, assume we do so on the first round. If we defect on only the first round, our payoff would be 3 + 0p + 2p^2 + 2p^3 + …, which loses us 2p-1 points. As long as p > 1/2, this is a bad idea. The same is true for every round: every time you add a defection, there is a probability p that in the next round the opponent punishes you for twice as much as you gained. So if p > 1/2, there is no incentive to defect.
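To make that arithmetic concrete, here is a minimal sketch in Python, assuming the matrix above and treating p as the weight on each successive round (the function names are just for illustration):

```python
# Minimal sketch: with payoffs 2,2 / 1,1 / 3,0 and continuation
# probability p, compare always cooperating against a tit-for-tat
# opponent with defecting once and then returning to cooperation.

def cooperate_value(p: float) -> float:
    # Payoff stream 2, 2, 2, ... discounted by p: 2 / (1 - p)
    return 2 / (1 - p)

def defect_once_value(p: float) -> float:
    # Payoff stream 3, 0, 2, 2, ...: 3 + 0*p + 2*p**2 / (1 - p)
    return 3 + 2 * p ** 2 / (1 - p)

for p in (0.4, 0.5, 0.6):
    gain = defect_once_value(p) - cooperate_value(p)
    print(f"p = {p}: gain from a single defection = {gain:+.3f}")
```

The printed gain works out to 1 - 2p: positive below p = 1/2, zero at p = 1/2, negative above it.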
If p<1/2, both players playing tit for tat is not an equilibrium. However, both players playing grim trigger still might be (cooperate until your opponent defects once, then always defect).
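For the general case, here is a rough sketch of the one-shot-deviation cutoffs, using the conventional prisoner’s-dilemma labels T > R > P > S (the labels are my addition; the example matrix has T=3, R=2, P=1, S=0):

```python
# Against tit for tat, a single defection gains T - R now and costs
# R - S in the next round (which arrives with probability p).
def tft_cutoff(T, R, P, S):
    return (T - R) / (R - S)      # deviation unprofitable when p >= this

# Against grim trigger, a defection gains T - R once but costs R - P
# in every future round, i.e. (p / (1 - p)) * (R - P) in expectation.
def grim_cutoff(T, R, P, S):
    return (T - R) / (T - P)      # the same inequality solved for p

print(tft_cutoff(3, 2, 1, 0), grim_cutoff(3, 2, 1, 0))    # 0.5 0.5
print(tft_cutoff(5, 3, 0, -1), grim_cutoff(5, 3, 0, -1))  # 0.5 0.4
```

For the example matrix the two cutoffs happen to coincide at 1⁄2, but when mutual defection is more costly (lower P), grim trigger’s everlasting punishment bites harder and its cutoff drops below tit for tat’s, which is how grim trigger can remain an equilibrium where tit for tat is not.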
Reminds me of the folk theorem in game theory, which looks like it may apply to games that repeat with probability p if p’s high enough. (I’m no game theorist; it’s just a hunch. My thinking here is that p is like a discount factor, and the theorem still works even with a discount factor, if the discount factor’s high enough.) If so, a strong & pervasive enough belief in an afterlife might enable all sorts of equilibria in more complicated games.
p is exactly like a discount factor.
Yes, if everyone believes that they get huge payoff in the afterlife for using strategy X, then everyone using strategy X is an equilibrium. This is exactly how many religions work.
To be sure that I understand it: by having p set to 1⁄2, you’re referring to there being a 50% chance that there’ll be at least one more round of the game?
If so, I’m somewhat surprised that the odds which make tit-for-tat a winning strategy are such a simple number, which implies that I didn’t understand the underlying aspects of PD strategy as well as I should. I’m going to have to think a bit more about them.
Yes. Every round you stop playing with probability 1⁄2 and continue playing with probability 1⁄2.
The answer 1⁄2 is a function of the PD matrix I chose. If you choose a different matrix, you will get a different number.
After a night of thought: if I’m reading this right, then your described method of discounting only considers a single future round’s 50% probability. But if we extend it to a 25% chance of two future rounds, 12.5% for three rounds, and so forth, doesn’t that converge on a total of 100% for all future rounds summed up?
I think you are confused.
All you are saying is that if each round you have a 1⁄2 chance of playing the next round, then the game will last exactly 1 round with probability 1⁄2, exactly 2 rounds with probability 1⁄4, exactly 3 rounds with probability 1⁄8 and so on. Of course this adds up to one, since it is a probability distribution on all possible lengths of the game, and probability distributions always sum to one. The fact that it sums to 1 has nothing to do with game theory.
It’s the more-than-one-round calculation that I’m currently trying to wrap my brain around, rather than the sum of a series of halves adding to one. If there’s a 1⁄3 chance of each round continuing, then that also adds up, with 1⁄9 of the second round’s value, and 1⁄27 of the third’s, and so on—it doesn’t add up to one, but it does add up to more than 1⁄3. Ditto if there’s a 3⁄4 chance of a next round, or a 99% chance.
In the p = 1/3 case, there is a 2⁄3 chance of lasting exactly one round, a 2⁄9 chance of lasting exactly two rounds, and a 2⁄27 chance of lasting exactly three rounds. This does add up to 1. It will always add up to 1.
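A tiny sketch of that point, for any continuation probability (the 500-term cutoff is just a numerical stand-in for the infinite sum):

```python
# Game length is geometric: P(exactly k rounds) = (1 - p) * p**(k - 1).
for p in (1 / 2, 1 / 3, 3 / 4):
    total = sum((1 - p) * p ** (k - 1) for k in range(1, 500))
    print(f"p = {p:.2f}: length probabilities sum to {total:.6f}")
```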
We seem to be talking past each other. Yes, the total odds add up to 100%; but the sum of how important each individual round is can be a different number entirely.
Let’s say that the factor is 2⁄3. Then the first round contributes 2⁄3 of its nominal score to the expected value; the second round contributes (2/3)^2 = 4/9 of its score; and already that adds up to more than 1, meaning that the effects of future rounds are more likely to outweigh the benefits of a defection-based strategy.
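Summing that series out (a one-line check; weighting round k by p^k is just the convention used in the comment above):

```python
# Round k weighted by p**k; the weights sum to p / (1 - p).
p = 2 / 3
print(sum(p ** k for k in range(1, 500)))  # ~2.0, i.e. (2/3) / (1/3)
```

So at a factor of 2⁄3, the discounted future rounds are together worth two full nominal rounds.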
Okay, I understand the issue now, I think. Summing up the effect of all the future rounds in exactly the way you are describing is something you would do to determine whether grim trigger is an equilibrium strategy (if you defect now, you get punished in ALL future rounds). In tit for tat, however, your punishment for defecting only lasts one round, so you don’t have to add all of that up.
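As a closing sketch, here are the two deviation losses side by side for the example matrix, assuming the deviator keeps defecting once grim trigger’s punishment starts (the closed forms fall straight out of the sums above):

```python
# Tit for tat punishes a defection for one round; grim trigger punishes
# it in every future round, so the same per-round loss compounds.

def tft_loss(p):
    return 2 * p - 1              # gain 1 now, lose 2 next round (prob. p)

def grim_loss(p):
    # cooperate forever: 2/(1-p); defect and keep defecting: 3 + p/(1-p)
    return (2 * p - 1) / (1 - p)  # the one-round loss scaled by 1/(1-p)

for p in (0.4, 0.6, 0.9):
    print(f"p = {p}: tit-for-tat loss {tft_loss(p):+.2f}, "
          f"grim-trigger loss {grim_loss(p):+.2f}")
```

Both cross zero at p = 1⁄2 for this particular matrix, but the grim-trigger loss grows without bound as p approaches 1.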