Richard_Ngo comments on The Counterfactual Prisoner’s Dilemma

Richard_Ngo 5 Apr 2021 11:50 UTC
LW: 2 AF: 1
“by only considering the branches of reality that are consistent with our knowledge”
I know that, in the branch of reality which actually happened, Omega predicted my counterfactual behaviour. I know that my current behaviour is heavily correlated with my counterfactual behaviour. So I know that I can logically cause Omega to give me $10,000. This seems exactly equivalent to Newcomb’s problem, where I can also logically cause Omega to give me a lot of money.
So if by “considering [other branches of reality]” you mean “taking predicted counterfactuals into account when reasoning about logical causation”, then Counterfactual Prisoner’s Dilemma doesn’t give us anything new.
If by “considering [other branches of reality]” you instead mean “acting to benefit my counterfactual self”, then I deny that this is what is happening in CPD. You’re acting to benefit your current self, via logical causation, just like in the Twin Prisoner’s Dilemma. You don’t need to care about your counterfactual self at all. So it’s disanalogous to Counterfactual Mugging, where the only reason to pay is to help your counterfactual self.
- Chris_Leong 5 Apr 2021 13:53 UTC
  LW: 2 AF: 1
  Hmm… that’s a fascinating argument. I’ve been having trouble figuring out how to respond to you, so I’m thinking that I need to make my argument more precise and then perhaps that’ll help us understand the situation.
  Let’s start from the objection I’ve heard against Counterfactual Mugging. Someone might say, well, I understand that if I don’t pay, then it means I would have lost out if it had come up heads, but since I know it didn’t come up heads, I don’t care. Making this more precise, when constructing counterfactuals for a decision, if we know fact F about the world before we’ve made our decision, F must be true in every counterfactual we construct (call this Principle F).
  Now let’s consider Counterfactual Prisoner’s Dilemma. If the coin comes up HEADS, then principle F tells us that the counterfactuals need to have the COIN coming up HEADS as well. However, it doesn’t tell us how to handle the impact of the agent’s policy if they had seen TAILS. I think we should construct counterfactuals where the agent’s TAILS policy is independent of its HEADS policy, whilst you think we should construct counterfactuals where they are linked.
  You justify your construction by noting that the agent can figure out that it will make the same decision in both the HEADS and TAILS cases. In contrast, my tendency is to exclude information about our decision-making procedures. After all, if you knew you were a utility maximiser, this would typically exclude all but one counterfactual and prevent us from saying that choice A is better than choice B. Similarly, my tendency here is to suggest that we should erase the agent’s self-knowledge of how it decides, so that we can imagine the possibility of the agent choosing PAY/NOT PAY or NOT PAY/PAY.
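  The two constructions can be compared with a minimal sketch. It assumes the usual setup, which this comment does not spell out: in the branch you see, Omega asks for $100, and pays $10,000 if it predicts you would pay in the other branch. The names `heads_payoff`, `independent` and `linked` are hypothetical, chosen only for illustration.

```python
from itertools import product

REWARD = 10_000  # assumed: paid when Omega predicts you would pay in the other branch
COST = 100       # assumed: amount Omega asks for in the branch you actually see


def heads_payoff(pay_heads: bool, pay_tails: bool) -> int:
    """Payoff received in the HEADS branch for a given (HEADS, TAILS) policy."""
    return (REWARD if pay_tails else 0) - (COST if pay_heads else 0)


# Independent construction: the TAILS policy can vary separately from the HEADS policy.
independent = {
    (ph, pt): heads_payoff(ph, pt)
    for ph, pt in product([True, False], repeat=2)
}

# Linked construction: the same decision is made in both branches.
linked = {pay: heads_payoff(pay, pay) for pay in (True, False)}

for pt in (True, False):
    gain = independent[(True, pt)] - independent[(False, pt)]
    print(f"independent, TAILS policy pay={pt}: paying in HEADS changes payoff by {gain}")
print(f"linked: paying changes payoff by {linked[True] - linked[False]}")
```

  Under these assumptions, holding the TAILS policy fixed makes paying in HEADS look like a pure loss of the $100, whereas linking the two policies makes paying look like the action that secures the reward. That gap is the disagreement described above.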
  But I still feel somewhat confused about this situation.
  - Richard_Ngo 6 Apr 2021 9:49 UTC
    LW: 2 AF: 1
    “Someone might say, well, I understand that if I don’t pay, then it means I would have lost out if it had come up heads, but since I know it didn’t come up heads, I don’t care. Making this more precise, when constructing counterfactuals for a decision, if we know fact F about the world before we’ve made our decision, F must be true in every counterfactual we construct (call this Principle F).”
    The problem is that Principle F elides the difference between facts which are logically caused by your decision and facts which aren’t. For example, in Parfit’s hitchhiker, my decision not to pay after being picked up logically causes me not to be picked up. The result of that decision would be a counterpossible world: a world in which the same decision algorithm outputs one thing at one point, and a different thing at another point. But in counterfactual mugging, if you choose not to pay, then this doesn’t result in a counterpossible world.
    “I think we should construct counterfactuals where the agent’s TAILS policy is independent of its HEADS policy, whilst you think we should construct counterfactuals where they are linked.”
    The whole point of functional decision theory is that it’s very unlikely for these two policies to differ. For example, consider the Twin Prisoner’s Dilemma, but where the walls of one room are green, and the walls of the other are blue. This shouldn’t make any difference to the outcome: we should still expect both agents to cooperate, or both agents to defect. But the same is true for heads vs tails in Counterfactual Prisoner’s Dilemma—they’re specific details which distinguish you from your counterfactual self, but don’t actually influence any decisions.
    - Chris_Leong 28 Apr 2023 17:52 UTC
      LW: 2 AF: 1
      So I’ve thought about this argument a bit more and concluded that you are correct, but also that there’s a potential fix to get around this objection.
      I think that it’s quite plausible that an agent will have an understanding of its decision mechanism that a) lets it know it will take the same action in both counterfactuals, and b) won’t tell it what action it will take in this counterfactual before it makes the decision.
      And in that case, I think it makes sense to conclude that Omega’s prediction depends on your action, such that paying gives you the $10,000 reward.
      However, there’s a potential fix in that we can construct a non-symmetrical version of this problem where Omega asks you for $200 instead of $100 in the tails case. Then the fact that you would pay in the heads case, combined with making decisions consistently, doesn’t automatically imply that you would pay in the tails case. So I suspect that with this fix you actually would have to consider strategies instead of just making a decision purely based on this branch.
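      A minimal sketch of the non-symmetrical version, to make the strategy-level comparison concrete. It assumes a fair coin and that the $10,000 is paid whenever Omega predicts you would pay in the other branch; neither assumption is stated in this comment, and the names `branch_payoffs`, `expected_value` and `table` are hypothetical.

```python
from itertools import product

REWARD = 10_000   # paid when Omega predicts you would pay in the other branch
COST_HEADS = 100  # amount asked for in the heads case
COST_TAILS = 200  # amount asked for in the tails case (the asymmetric fix)
P_HEADS = 0.5     # assumption: fair coin


def branch_payoffs(pay_heads: bool, pay_tails: bool) -> tuple[int, int]:
    """Return (payoff if heads, payoff if tails) for a (heads, tails) strategy."""
    heads = (REWARD if pay_tails else 0) - (COST_HEADS if pay_heads else 0)
    tails = (REWARD if pay_heads else 0) - (COST_TAILS if pay_tails else 0)
    return heads, tails


def expected_value(pay_heads: bool, pay_tails: bool) -> float:
    """Expected payoff of a strategy before the coin is flipped."""
    heads, tails = branch_payoffs(pay_heads, pay_tails)
    return P_HEADS * heads + (1 - P_HEADS) * tails


# Evaluate all four strategies, since the two branches are no longer mirror images.
table = {
    policy: (branch_payoffs(*policy), expected_value(*policy))
    for policy in product([True, False], repeat=2)
}
best = max(table, key=lambda policy: table[policy][1])

for policy, (payoffs, ev) in table.items():
    print(f"pay (heads, tails)={policy}: branch payoffs={payoffs}, expected={ev}")
print("best strategy:", best)
```

      Under these assumptions, paying in both branches comes out ahead in expectation, but seeing that requires comparing whole strategies rather than inferring the tails decision from the heads decision.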
    - Chris_Leong 7 Apr 2021 3:14 UTC
      LW: 2 AF: 1
      “The problem is that principle F elides”—Yeah, I was noting that principle F doesn’t actually get us there and I’d have to assume a principle of independence as well. I’m still trying to think that through.