… My line of thought is unchanged if Omega simply learns your decision after you decide but before Omega decides. The game is then no longer symmetrical.
Currently, I have concluded that if the second player’s best move after a cooperation is to cooperate, then it is best for the first player to cooperate against a rational opponent (3 instead of 1). However, the second player does better by cooperating back with only slightly more than a 1/3 chance, and that still leaves the first player with a higher expected result than defecting.
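To spell out where that 1/3 comes from, here is a minimal sketch; the 5/3/1/0 payoffs are my assumption, consistent with the scores quoted above rather than copied from the OP:

```python
# Sketch only: assumes the standard 5/3/1/0 payoffs implied by the scores above.
# Suppose player 2 cooperates back with probability p after a cooperation and
# always defects after a defection.
def p1_coop_ev(p):
    return 3 * p + 0 * (1 - p)   # player 1 cooperates, is reciprocated with prob. p

def p1_defect_ev():
    return 1                     # defection is met with defection

for p in (0.30, 1 / 3, 0.34, 1.0):
    p2_ev = 3 * p + 5 * (1 - p)  # player 2's expected score against a cooperator
    print(f"p={p:.3f}: P1 cooperate EV={p1_coop_ev(p):.2f}, "
          f"defect EV={p1_defect_ev()}, P2 EV={p2_ev:.2f}")
# Cooperating pays for player 1 exactly when 3p > 1, i.e. p > 1/3, and any p just
# above 1/3 leaves player 2 with roughly 4.33 expected points rather than 3.
```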
If, given the choice between 3 points and 5 points, the second player takes the 5, then it is best for the first player to defect (1 instead of 0).
In the end, the first player has 2 possible strategies, and the second player has 4 possible strategies, for a total of 8 possibilities:
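Here is a minimal enumeration of those 8 pairings (again assuming the 5/3/1/0 payoffs, which match the scores above but are not copied from the OP):

```python
# Sketch only: the 5/3/1/0 payoffs are assumed, not copied from the OP.
# A player-2 strategy {c:X;d:Y} plays X after a cooperation and Y after a defection.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

P2_STRATEGIES = {
    'cooperatebot {c:C;d:C}':         {'C': 'C', 'D': 'C'},
    'quid pro quo {c:C;d:D}':         {'C': 'C', 'D': 'D'},
    'reverse quid pro quo {c:D;d:C}': {'C': 'D', 'D': 'C'},
    'defectbot {c:D;d:D}':            {'C': 'D', 'D': 'D'},
}

for p1_move in ('C', 'D'):
    for name, strat in P2_STRATEGIES.items():
        p2_move = strat[p1_move]          # player 2 reacts to player 1's move
        p1_score, p2_score = PAYOFF[(p1_move, p2_move)]
        print(f"Player 1 {p1_move} vs {name}: "
              f"player 1 scores {p1_score}, player 2 scores {p2_score}")
```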
My problem is that if quid pro quo {c:C;d:D} is the optimal strategy, two optimal players end up cooperating. But quid pro quo is dominated by defectbot {c:D;d:D}: defectbot never scores less, and scores strictly more when player 1 cooperates. However, if defectbot is the best strategy for player 2, then the best strategy for player 1 is to defect; if quid pro quo is the best strategy for player 2, then the best strategy for player 1 is to cooperate.
I have trouble understanding how the optimal strategy can be dominated by a competing strategy.
If quid pro quo is optimal, then two optimal players score 3 points each. But if quid pro quo is the optimal strategy for player 2, then defectbot scores more against an optimal (cooperating) player 1, so defectbot must be the better strategy after all; in that case the optimal player 1 strategy is to defect, and optimal players score 1 point each.
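To make that loop concrete, here is a small sketch under the same assumed payoffs: let player 1 best-respond to whichever player-2 strategy is supposedly optimal, then check whether that strategy is still player 2’s best reply to the move it induces.

```python
# Sketch only: the 5/3/1/0 payoffs are assumed.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
QPQ = {'C': 'C', 'D': 'D'}        # quid pro quo
DEFECTBOT = {'C': 'D', 'D': 'D'}  # defectbot

def consistency_check(name, strat):
    # Player 1's best reply to the supposedly optimal player-2 strategy.
    p1 = max('CD', key=lambda m: PAYOFF[(m, strat[m])][0])
    # Player 2's actual best reply once player 1's move is known.
    best_reply = max('CD', key=lambda m: PAYOFF[(p1, m)][1])
    scores = PAYOFF[(p1, strat[p1])]
    print(f"If {name} is 'optimal': player 1 plays {p1}; {name} replies {strat[p1]} "
          f"(best available reply: {best_reply}); scores {scores}")

consistency_check("quid pro quo", QPQ)     # player 1 cooperates, but defecting would score 5 > 3
consistency_check("defectbot", DEFECTBOT)  # player 1 defects, and defecting back is still best: (1, 1)
```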
Please stop using the words “rational” and “optimal”, and give me some sign that you’ve read the linked post on counterfactuals rather than asking counterfactual questions whose assumptions you refuse to spell out.
The only difficult question here concerns the imbalance in knowledge between Omega and a human, per comment by shminux. Because of this, I don’t actually know what TDT does here (much less ‘rationality’).
Assumptions: The game uses the payoff matrix described in the OP, and the second player learns of the first player’s move before making his move. Both players know that both players are trying to win, and neither will use a strategy that does not result in winning.
My conclusion is that both players defect. My problem is that it would be better for player 2 if player 2 did not have the option to defect if player 1 cooperated.
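A backward-induction sketch of that claim, again under the assumed 5/3/1/0 payoffs and with a second player who simply takes the higher score once the first player’s move is known (which sets aside the Omega-prediction issue raised elsewhere in this thread):

```python
# Sketch only: 5/3/1/0 payoffs assumed; player 2 just takes the higher score
# at each node, and player 1 anticipates that.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def play(p2_options_after_C=('C', 'D')):
    # Player 2's reply at each node, optionally restricted after a cooperation.
    reply = {'C': max(p2_options_after_C, key=lambda m: PAYOFF[('C', m)][1]),
             'D': max('CD',               key=lambda m: PAYOFF[('D', m)][1])}
    # Player 1 anticipates those replies and picks the better move.
    p1 = max('CD', key=lambda m: PAYOFF[(m, reply[m])][0])
    return PAYOFF[(p1, reply[p1])]

print(play())                        # both options available: (1, 1), mutual defection
print(play(p2_options_after_C='C'))  # defection-after-cooperation removed: (3, 3)
```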
I’ve thrown out cooperatebot and reverse quid pro quo as candidates for best strategy.
FYI: I’m using this as my reference, and the problem hinges on reflexive inconsistency. I can’t find a reflexively consistent strategy even with only two options available. (Note that defectbot equals or outperforms quid pro quo in every case.)
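The parenthetical is easy to check under the same assumed payoffs; defectbot ties quid pro quo after a defection and beats it after a cooperation, i.e. it weakly dominates:

```python
# Check, assuming the 5/3/1/0 payoffs: defectbot never scores less than
# quid pro quo, and scores strictly more when player 1 cooperates.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
QPQ = {'C': 'C', 'D': 'D'}        # quid pro quo
DEFECTBOT = {'C': 'D', 'D': 'D'}  # defectbot

for p1_move in 'CD':
    qpq_score = PAYOFF[(p1_move, QPQ[p1_move])][1]
    bot_score = PAYOFF[(p1_move, DEFECTBOT[p1_move])][1]
    print(f"Player 1 plays {p1_move}: quid pro quo scores {qpq_score}, "
          f"defectbot scores {bot_score}")
# Output: C -> 3 vs 5, D -> 1 vs 1.
```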
Again, you don’t sound like you’ve read this post here. Let’s say that, in fact, “it would be better for player 2 if player 2 did not have the option to defect if player 1 cooperated”—though I’m not at all sure of that, when player 2 is Omega—and let’s say Omega uses TDT. Then it will ask counterfactual questions about what “would” happen if Omega’s own abstract decision procedure gave various answers. Because of the nature of the counterfactuals, these will screen off any actions by player 1 that depend on said answers, even ‘known’ actions.
You’re postulating away the hard part, namely the question of whether the human player’s actions depend on Omega’s real thought processes or whether Omega can just fool us!
Which strategy is best does not depend on what any given agent decides the ideal strategy is.
I’m assuming only that both the human player and Omega are capable of considering a total of six strategies for a simple payoff matrix and determining which ones are best. In particular, I’m calling Löb’shit on the line of thought “If I can prove that it is best to cooperate, other actors will concur that it is best to cooperate” when used as part of the proof that cooperation is best.
I’m using TDT instead of CDT because I want to avoid making precommitment necessary or beneficial, and CDT has trouble explaining why to one-box when the boxes are transparent.