Problem 1
Take my wet lawn: it could be wet either because it's raining or because I'm watering it, and suppose for simplicity that both causes have a base rate of P=1/3. The causal diagram is Rain → Wetness ← Watering.
Then our non-causal information looks like this: rain and noRain are mutually exclusive and exhaustive (MEE) and have been observed in a 1:2 ratio in the past; watering and notWatering are MEE and have been observed in a 1:2 ratio in the past; wet and notWet are MEE; wet, noRain, and notWatering are jointly inconsistent; notWet and rain are inconsistent; notWet and watering are inconsistent.
If we abbreviate rain / noRain as R / r, watering / notWatering as Wa / wa, and wet / notWet as We / we, the possible MEE events are: RWaWe, rWaWe, RwaWe, rwawe.
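To make the bookkeeping explicit, here's a minimal sketch in Python of how those four events fall out of the consistency constraints (the encoding and helper names are just my own, not anything from the post):

```python
from itertools import product

# Each world is an assignment to (rain, watering, wet); keep only the ones
# consistent with the non-causal information listed above.
def consistent(rain, watering, wet):
    if wet and not rain and not watering:  # wet + noRain + notWatering is inconsistent
        return False
    if not wet and rain:                   # notWet + rain is inconsistent
        return False
    if not wet and watering:               # notWet + watering is inconsistent
        return False
    return True

def label(rain, watering, wet):
    return ("R" if rain else "r") + ("Wa" if watering else "wa") + ("We" if wet else "we")

events = [label(*world) for world in product([True, False], repeat=3)
          if consistent(*world)]
print(events)  # ['RWaWe', 'RwaWe', 'rWaWe', 'rwawe']
```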
Following the recipe in the post, our causal information then fixes P(Wa)=P(R)=1/3, and nothing else.
Given these constraints, we have P(RWaWe)=1/6, P(RwaWe)=1/6, P(rWaWe)=1/6, P(rwawe)=1/2.
Uh oh, we have a problem! Our causal information should also be telling us that rain and watering are independent, i.e. P(RWa) = P(R)P(Wa), but here P(RWa) = P(RWaWe) = 1/6 while P(R)P(Wa) = 1/9. What have I done wrong, and how can I do it right rather than just patching a hole?
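Here's a quick numerical check of that mismatch, a minimal sketch in Python using the joint distribution from above (the event-string encoding is just my own bookkeeping):

```python
# Joint distribution the recipe gives over the four possible MEE events.
joint = {"RWaWe": 1/6, "RwaWe": 1/6, "rWaWe": 1/6, "rwawe": 1/2}

# Marginals: sum over the events in which each proposition holds.
p_rain = sum(p for event, p in joint.items() if event.startswith("R"))
p_watering = sum(p for event, p in joint.items() if "Wa" in event)
p_both = joint["RWaWe"]  # the only event containing both R and Wa

print(p_rain, p_watering)           # 1/3 and 1/3, as the recipe fixed them
print(p_both, p_rain * p_watering)  # 1/6 vs 1/9, so R and Wa are not independent
```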
The obvious patch is just to say that everything that is d-separated is independent. And if we have the correct prior probability distribution, then conditionalization works properly at handling changing d-separation.
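To see what the patch gives in this example, here's a sketch (my own worked numbers, under the assumption that we simply impose P(RWa) = P(R)P(Wa) on top of the marginal constraints):

```python
# Impose independence of the two parent nodes on top of P(R) = P(Wa) = 1/3.
p_r, p_wa = 1/3, 1/3
p_r_and_wa = p_r * p_wa  # = 1/9, by the d-separation patch

# RWa forces We (a wet lawn), so P(RWaWe) is just P(RWa); the remaining cells
# follow from the marginals and normalization.
joint = {
    "RWaWe": p_r_and_wa,         # 1/9
    "RwaWe": p_r - p_r_and_wa,   # 2/9
    "rWaWe": p_wa - p_r_and_wa,  # 2/9
}
joint["rwawe"] = 1 - sum(joint.values())  # 4/9

print(joint)  # rain and watering now come out independent, as the diagram requires
```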
But it’s not clear that there are no more problems—and I’m not even sure of a good foothold to attack this more abstractly.
I'm pretty sure the solution here is just to take the usual iterative procedure as the standard of correctness: anything that can be proven equivalent to it works. Even if the d-separation independence thing is just a patch, the correct solution probably won't need many patches, because the iterative procedure is simple.
Problem 2
If I find a causal prior and then make observations, my updates can change my probabilities for various parent nodes. E.g. in the marble game, if I condition on a white marble, my probability of Heads changes. But shouldn’t conditioning be equivalent to just adding the information you’re conditioning on to your pool of information, and rederiving from scratch? And yet if we follow the procedure above, the parent node’s probability is totally fixed. What gives?
This actually works if you condition every probability, including the probability of the parent nodes, on the observed information. For example, compare two options: in option one, you start with all options possible in the marble game and then observe that the result was not Heads-and-White; in option two, the marble color is determined causally, in a way that never even allows White when the coin lands Heads. These two options result in different probabilities, as sketched below.
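Here's a minimal sketch of the two options, with made-up numbers since I'm not committing to the exact marble-game parameters here (a fair coin, and in option one an initially uniform prior over coin × color):

```python
from itertools import product

# Option one: start with every (coin, color) combination possible and, for
# illustration, equally likely; then OBSERVE "not (Heads and White)".
prior = {pair: 1/4 for pair in product(["Heads", "Tails"], ["White", "Black"])}
observed = {pair: p for pair, p in prior.items() if pair != ("Heads", "White")}
total = sum(observed.values())
p_heads_after_observation = sum(p for (coin, _), p in observed.items()
                                if coin == "Heads") / total  # 1/3

# Option two: the color is causally generated so that Heads never yields White.
# The coin itself is still fair, so the parent node keeps P(Heads) = 1/2.
p_heads_causal = 1/2

print(p_heads_after_observation, p_heads_causal)  # 0.333... vs 0.5
```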
This really reinforces that information about how a node's value is causally generated is different from observed information about that node.