Someone might say: well, I understand that if I don’t pay, then it means I would have lost out if it had come up heads, but since I know it didn’t come up heads, I don’t care. Making this more precise: when constructing counterfactuals for a decision, if we know fact F about the world before we’ve made our decision, F must be true in every counterfactual we construct (call this Principle F).
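To make the two calculations concrete, here’s a minimal sketch, assuming the usual stakes for counterfactual mugging (a fair coin, a $100 payment demanded on tails, and a $10,000 reward on heads iff Omega predicts you’d pay on tails):

```python
# Counterfactual mugging with assumed stakes: fair coin, $100 demanded on
# tails, $10,000 paid on heads iff Omega predicts you'd pay on tails.

P_HEADS = 0.5
REWARD = 10_000   # paid on heads, iff Omega predicts you'd pay on tails
PAYMENT = 100     # demanded on tails

def ex_ante_value(pays_on_tails: bool) -> float:
    """Expected value of a policy, evaluated before the coin is flipped."""
    heads_branch = REWARD if pays_on_tails else 0
    tails_branch = -PAYMENT if pays_on_tails else 0
    return P_HEADS * heads_branch + (1 - P_HEADS) * tails_branch

def principle_f_value(pays_on_tails: bool) -> float:
    """Value after conditioning on the known fact that the coin came up
    tails (Principle F): the heads branch drops out of every counterfactual."""
    return -PAYMENT if pays_on_tails else 0

print(ex_ante_value(True), ex_ante_value(False))          # 4950.0 0.0 -> pay
print(principle_f_value(True), principle_f_value(False))  # -100 0    -> don't pay
```

Evaluated before the flip, paying is worth 4,950 in expectation; evaluated only within the known tails branch, as Principle F requires, it looks like a pure $100 loss.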
The problem is that Principle F elides the difference between facts which are logically caused by your decision and facts which aren’t. For example, in Parfit’s hitchhiker, my decision not to pay after being picked up logically causes me not to be picked up. The result of that decision would be a counterpossible world: a world in which the same decision algorithm outputs one thing at one point, and a different thing at another point. But in counterfactual mugging, if you choose not to pay, this doesn’t result in a counterpossible world.
I think we should construct counterfactuals where the agent’s TAILS policy is independent of its HEADS policy, whilst you think we should construct counterfactuals where they are linked.
The whole point of functional decision theory is that it’s very unlikely for these two policies to differ. For example, consider the Twin Prisoner’s Dilemma, but where the walls of one room are green, and the walls of the other are blue. This shouldn’t make any difference to the outcome: we should still expect both agents to cooperate, or both agents to defect. But the same is true for heads vs tails in Counterfactual Prisoner’s Dilemma—they’re specific details which distinguish you from your counterfactual self, but don’t actually influence any decisions.
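To spell out why the link matters, here’s a small enumeration of the four possible policies, assuming the standard stakes for the Counterfactual Prisoner’s Dilemma (fair coin, $100 asked for in whichever branch you find yourself in, $10,000 paid in that branch iff Omega predicts you would pay in the other branch):

```python
from itertools import product

# Counterfactual Prisoner's Dilemma with assumed stakes: in each branch you
# are asked to pay $100, and you receive $10,000 in that branch iff Omega
# predicts you would have paid in the *other* branch.

REWARD, PAYMENT = 10_000, 100

def expected_value(pay_heads: bool, pay_tails: bool) -> float:
    heads = (REWARD if pay_tails else 0) - (PAYMENT if pay_heads else 0)
    tails = (REWARD if pay_heads else 0) - (PAYMENT if pay_tails else 0)
    return 0.5 * heads + 0.5 * tails

for pay_heads, pay_tails in product([True, False], repeat=2):
    print(pay_heads, pay_tails, expected_value(pay_heads, pay_tails))
# (True, True) -> 9900.0, mixed policies -> 4950.0, (False, False) -> 0.0
```

If heads vs tails can’t influence the decision, only the two symmetric policies are available, and always-pay (9,900) beats never-pay (0).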
So I’ve thought about this argument a bit more and concluded that you are correct, but also that there’s a potential fix to get around this objection.
I think that it’s quite plausible that an agent will have an understanding of its decision mechanism that (a) lets it know it will take the same action in both counterfactuals, but (b) won’t tell it what action it will take in this counterfactual before it makes the decision.
And in that case, I think it makes sense to conclude that Omega’s prediction depends on your action, such that paying gives you the $10,000 reward.
However, there’s a potential fix: we can construct a non-symmetrical version of this problem where Omega asks you for $200 instead of $100 in the tails case. Then the fact that you would pay in the heads case, combined with making decisions consistently, doesn’t automatically imply that you would pay in the tails case. So I suspect that with this fix you actually would have to consider strategies rather than just making a decision purely based on this branch.
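As a rough sketch of how that variant might play out (assuming the same structure as above, with only the tails payment changed to $200):

```python
# Assumed setup for the asymmetric variant: identical to the symmetric
# version above, except Omega asks for $100 on heads and $200 on tails.

REWARD = 10_000
PAYMENT = {"heads": 100, "tails": 200}

def expected_value(pay_heads: bool, pay_tails: bool) -> float:
    heads = (REWARD if pay_tails else 0) - (PAYMENT["heads"] if pay_heads else 0)
    tails = (REWARD if pay_heads else 0) - (PAYMENT["tails"] if pay_tails else 0)
    return 0.5 * heads + 0.5 * tails

print(expected_value(True, True))    # 9850.0
print(expected_value(True, False))   # 4950.0
print(expected_value(False, True))   # 4900.0
print(expected_value(False, False))  # 0.0
```

With these particular numbers paying in both branches still comes out ahead, but the two branches now pose different decision problems, so knowing you’d pay $100 on heads no longer settles whether you’d pay $200 on tails: you have to compare whole strategies.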
“The problem is that Principle F elides”—Yeah, I was noting that Principle F doesn’t actually get us there, and that I’d have to assume a principle of independence as well. I’m still trying to think that through.