Say there’s some logical random variable O that you’re going to learn, which is either 0 or 1, with a prior probability of 50% of being 1. After learning its value, you take action 0 or 1. Some predictor doesn’t know the value of O, but does know your source code. This predictor predicts P(you take action 1 | O = 0) and P(you take action 1 | O = 1). Your utility depends only on these predictions; specifically, it is P(you take action 1 | O = 0) − 100 (P(you take action 1 | O = 0) − P(you take action 1 | O = 1))^2.
This is a continuous coordination problem, and CDT-like graph intervention isn’t guaranteed to solve it, while policy selection is.
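To make the payoff structure concrete, here is a minimal sketch that evaluates the utility above over a grid of joint policies. The names `utility`, `best_joint_policy`, `p0`, and `p1` are my own shorthand (p0 = P(you take action 1 | O = 0), p1 = P(you take action 1 | O = 1)), not anything from the original comment, and the grid search only illustrates what choosing both probabilities together looks like. It does not attempt to model what a CDT-like intervention would do.

```python
from itertools import product


def utility(p0: float, p1: float) -> float:
    """Utility from the comment: p0 - 100 * (p0 - p1)^2.

    p0 and p1 are shorthand (an assumption of this sketch) for the
    predicted probabilities of taking action 1 given O = 0 and O = 1.
    """
    return p0 - 100 * (p0 - p1) ** 2


def best_joint_policy(steps: int = 100) -> tuple[float, float]:
    """Grid-search both probabilities at once, as a policy-level choice."""
    grid = [i / steps for i in range(steps + 1)]
    return max(product(grid, grid), key=lambda pair: utility(*pair))


if __name__ == "__main__":
    p0, p1 = best_joint_policy()
    print("best joint policy:", (p0, p1), "utility:", utility(p0, p1))
    # A badly miscoordinated pair, to show how steep the penalty term is.
    print("utility at p0 = 1, p1 = 0:", utility(1.0, 0.0))
```

Since the penalty term is never positive, the utility is at most p0, so the joint optimum has the two probabilities equal and both at 1. The quadratic term is what makes this coordination-shaped: any gap between the two conditional probabilities is punished heavily, regardless of which observation you actually see.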
Cool. I hadn’t thought to frame those problems in predictor terms, and I agree now that “only matters in multi-agent dilemmas” is incorrect.
That said, it still seems to me that policy selection only matters in situations where, conceptually, winning requires something like multiple agents running the same decision algorithm to meet and do a bit of logically-prior coordination. Something like this separates transparent Newcomb’s problem (where policy selection is not necessary) from the more coordination-shaped cases. The way I classify these problems in my head still involves asking myself: “Well, do I need to get together and coordinate with all of the instances of me that appear in the problem logically beforehand, or can we each individually wing it once we see our observations?”
If anyone has examples that break this classification, I remain curious to hear them. Or, a similar question: is there any disagreement with the weakened claim, “policy selection only matters in situations that can be transformed into multi-agent problems, where a problem is said to be ‘multi-agent’ if the winning strategy requires the agents to coordinate logically-before making their observations”?
Nate:
[EDIT: retracted]