I don’t see why the Counterfactual Prisoner’s Dilemma persuades you to pay in the Counterfactual Mugging case. In the counterfactual prisoner’s dilemma, I pay because that action logically causes Omega to give me $10,000 in the real world (via influencing the counterfactual). This doesn’t require shifting the locus of evaluation to policies, as long as we have a good theory of which actions are correlated with which other actions (e.g. paying in heads-world and paying in tails-world).
In the counterfactual mugging, by contrast, the whole point is that paying doesn’t cause any positive effects in the real world. So it seems perfectly consistent to pay in the counterfactual prisoner’s dilemma, but not in the counterfactual mugging.
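To spell out the payoff structure I’m relying on, here’s a rough sketch (assuming the usual $100 cost and $10,000 reward, and treating Omega’s prediction as perfectly tracking my policy):

```python
# Rough sketch of the payoffs as I understand them: Omega asks for $100 in the
# branch you observe and pays $10,000 iff it predicts you would have paid in
# the other branch. A "policy" fixes your action in both branches.
COST, REWARD = 100, 10_000

def branch_payoff(pay_here: bool, would_pay_other: bool) -> int:
    return (REWARD if would_pay_other else 0) - (COST if pay_here else 0)

def expected_value(pay_heads: bool, pay_tails: bool) -> float:
    return 0.5 * branch_payoff(pay_heads, pay_tails) + 0.5 * branch_payoff(pay_tails, pay_heads)

for policy in [(True, True), (True, False), (False, True), (False, False)]:
    print(policy, expected_value(*policy))
# -> paying in both branches nets $9,900 in expectation; refusing in both nets
#    $0. Since my action now is (near-)perfectly correlated with my
#    counterfactual action, paying here effectively buys the $10,000 in the
#    world I'm actually in.
```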
You’re correct that paying in Counterfactual Prisoner’s Dilemma doesn’t necessarily commit you to paying in Counterfactual Mugging.
However, it does appear to provide a counterexample to the claim that we ought to adopt the principle of making decisions by only considering the branches of reality that are consistent with our knowledge, since that principle would result in us refusing to pay in Counterfactual Prisoner’s Dilemma regardless of the coin flip result.
(Interestingly enough, blackmail problems also seem to demonstrate that this principle is flawed.)
This seems to suggest that we need to consider policies rather than completely separate decisions for each possible branch of reality. And while, as I already noted, this doesn’t get us all the way, it does make the argument for paying much more compelling by defeating the strongest objection.
by only considering the branches of reality that are consistent with our knowledge
I know that, in the branch of reality which actually happened, Omega predicted my counterfactual behaviour. I know that my current behaviour is heavily correlated with my counterfactual behaviour. So I know that I can logically cause Omega to give me $10,000. This seems exactly equivalent to Newcomb’s problem, where I can also logically cause Omega to give me a lot of money.
So if by “considering [other branches of reality]” you mean “taking predicted counterfactuals into account when reasoning about logical causation”, then Counterfactual Prisoner’s Dilemma doesn’t give us anything new.
If by “considering [other branches of reality]” you instead mean “acting to benefit my counterfactual self”, then I deny that this is what is happening in CPD. You’re acting to benefit your current self, via logical causation, just like in the Twin Prisoner’s Dilemma. You don’t need to care about your counterfactual self at all. So it’s disanalogous to Counterfactual Mugging, where the only reason to pay is to help your counterfactual self.
Hmm… that’s a fascinating argument. I’ve been having trouble figuring out how to respond to you, so I’m thinking that I need to make my argument more precise and then perhaps that’ll help us understand the situation.
Let’s start from the objection I’ve heard against Counterfactual Mugging. Someone might say: well, I understand that if I don’t pay, then I would have lost out if the coin had come up heads, but since I know it didn’t come up heads, I don’t care. Making this more precise: when constructing counterfactuals for a decision, if we know fact F about the world before we’ve made our decision, then F must be true in every counterfactual we construct (call this Principle F).
Now let’s consider Counterfactual Prisoner’s Dilemma. If the coin comes up HEADS, then Principle F tells us that the counterfactuals need to have the coin coming up HEADS as well. However, it doesn’t tell us how to handle the impact of the agent’s policy if it had seen TAILS. I think we should construct counterfactuals where the agent’s TAILS policy is independent of its HEADS policy, whilst you think we should construct counterfactuals where they are linked.
You justify your construction by noting that the agent can figure out that it will make the same decision in both the HEADS and TAILS cases. In contrast, my tendency is to exclude information about our decision-making procedures. So, if you knew you were a utility maximiser, this would typically exclude all but one counterfactual and prevent us from saying that choice A is better than choice B. Similarly, my tendency here is to suggest that we should erase the agent’s self-knowledge of how it decides, so that we can imagine the possibility of the agent choosing PAY/NOT PAY or NOT PAY/PAY.
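To make the two constructions concrete, here’s a rough sketch of how each one evaluates the HEADS branch (the amounts and function names are just illustrative):

```python
# Rough sketch of the two constructions after the coin has come up HEADS
# (illustrative only; assumes the usual $100 cost and $10,000 reward).
COST, REWARD = 100, 10_000

def heads_branch_payoff(pay_heads: bool, pay_tails: bool) -> int:
    # You pay (or not) in the HEADS branch, and Omega's $10,000 depends on its
    # prediction of what you would have done on TAILS.
    return (REWARD if pay_tails else 0) - (COST if pay_heads else 0)

# My construction: erase the agent's self-knowledge, so its TAILS policy is
# held fixed (at either value) while we vary the HEADS action.
for pay_tails in (True, False):
    pay, dont = heads_branch_payoff(True, pay_tails), heads_branch_payoff(False, pay_tails)
    print(f"TAILS policy fixed at {pay_tails}: paying is better? {pay > dont}")
# -> False both times: within this branch, paying only loses the $100.

# Your construction: the policies are linked, so the only live comparison is
# PAY/PAY versus NOT PAY/NOT PAY.
print(heads_branch_payoff(True, True), "vs", heads_branch_payoff(False, False))
# -> 9900 vs 0, so paying wins.
```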
But I still feel somewhat confused about this situation.
Someone might say: well, I understand that if I don’t pay, then I would have lost out if the coin had come up heads, but since I know it didn’t come up heads, I don’t care. Making this more precise: when constructing counterfactuals for a decision, if we know fact F about the world before we’ve made our decision, then F must be true in every counterfactual we construct (call this Principle F).
The problem is that principle F elides the difference between facts which are logically caused by your decision and facts which aren’t. For example, in Parfit’s hitchhiker, my decision not to pay after being picked up logically causes me not to be picked up. The result of that decision would be a counterpossible world: a world in which the same decision algorithm outputs one thing at one point, and a different thing at another point. But in counterfactual mugging, if you choose not to pay, then this doesn’t result in a counterpossible world.
I think we should construct counterfactuals where the agent’s TAILS policy is independent of its HEADS policy, whilst you think we should construct counterfactuals where they are linked.
The whole point of functional decision theory is that it’s very unlikely for these two policies to differ. For example, consider the Twin Prisoner’s Dilemma, but where the walls of one room are green, and the walls of the other are blue. This shouldn’t make any difference to the outcome: we should still expect both agents to cooperate, or both agents to defect. But the same is true for heads vs tails in Counterfactual Prisoner’s Dilemma—they’re specific details which distinguish you from your counterfactual self, but don’t actually influence any decisions.
So I’ve thought about this argument a bit more and concluded that you are correct, but also that there’s a potential fix to get around this objection.
I think that it’s quite plausible that an agent will have an understanding of its decision mechanism that a) lets it know it will take the same action in both counterfactuals, but b) won’t tell it what action it will take in this counterfactual before it makes the decision.
And in that case, I think it makes sense to conclude that Omega’s prediction depends on your action, such that paying gives you the $10,000 reward.
The potential fix is to construct a non-symmetrical version of this problem where Omega asks you for $200 instead of $100 in the tails case. Then the fact that you would pay in the heads case, combined with making decisions consistently, doesn’t automatically imply that you would pay in the tails case. So I suspect that with this fix you actually would have to consider strategies instead of just making a decision purely based on this branch.
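Here’s a rough sketch of that variant (keeping the $10,000 reward and assuming Omega still pays out based on its prediction of your behaviour in the other branch):

```python
# Sketch of the asymmetric variant: Omega asks $100 on HEADS but $200 on TAILS,
# and in either branch pays $10,000 iff it predicts you would pay in the other.
COSTS = {"heads": 100, "tails": 200}
REWARD = 10_000

def expected_value(pay_heads: bool, pay_tails: bool) -> float:
    heads = (REWARD if pay_tails else 0) - (COSTS["heads"] if pay_heads else 0)
    tails = (REWARD if pay_heads else 0) - (COSTS["tails"] if pay_tails else 0)
    return 0.5 * heads + 0.5 * tails

for policy in [(True, True), (True, False), (False, True), (False, False)]:
    print(policy, expected_value(*policy))
# -> (True, True) still comes out best at $9,850, but you can only see that by
#    scoring the whole strategy: the HEADS and TAILS decisions are no longer
#    the same problem, so paying on HEADS doesn't by itself settle TAILS.
```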
“The problem is that principle F elides”—Yeah, I was noting that principle F doesn’t actually get us there and I’d have to assume a principle of independence as well. I’m still trying to think that through.