I think W does not have to be a variable which we can observe, i.e. it is not necessarily the case that we can deterministically infer the value of W from the values of X and Y. For example, let’s say the two binary variables we observe are X=[whether smoke is coming out of the kitchen window of a given house] and Y=[whether screams are emanating from the house]. We’d intuitively want to consider a causal model where W=[whether the house is on fire] is causing both, but in a way that makes all triples of variable values have nonzero probability (which is true for these variables in practice). This is impossible if we require W to be deterministic once (X,Y) is known.
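To make the smoke/screams example concrete, here is a minimal sketch. All of the probabilities are made-up numbers chosen purely for illustration (none come from the discussion), and the names `P_W`, `P_X_GIVEN_W`, `P_Y_GIVEN_W` and `p_fire_given` are hypothetical. It builds the joint distribution for the common-cause model X ← W → Y and checks that every triple has nonzero probability, so W cannot be read off from (X, Y).

```python
from itertools import product

# Hypothetical numbers, for illustration only.
P_W = {1: 0.01, 0: 0.99}           # P(W): house on fire or not
P_X_GIVEN_W = {1: 0.90, 0: 0.05}   # P(X=1 | W): smoke from the kitchen window
P_Y_GIVEN_W = {1: 0.70, 0: 0.02}   # P(Y=1 | W): screams from the house


def bern(p_one, value):
    """Probability that a Bernoulli(p_one) variable takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


# Joint pmf over (x, y, w) for the common-cause graph X <- W -> Y.
joint = {
    (x, y, w): P_W[w] * bern(P_X_GIVEN_W[w], x) * bern(P_Y_GIVEN_W[w], y)
    for x, y, w in product((0, 1), repeat=3)
}


def p_fire_given(x, y):
    """P(W=1 | X=x, Y=y), computed from the joint pmf."""
    total = joint[(x, y, 0)] + joint[(x, y, 1)]
    return joint[(x, y, 1)] / total


for x, y in product((0, 1), repeat=2):
    print(f"P(W=1 | X={x}, Y={y}) = {p_fire_given(x, y):.4f}")
```

With these assumed numbers, every posterior lies strictly between 0 and 1, which is the sense in which W is not deterministic once (X, Y) is known.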
I see! You are right, then my argument wasn’t correct! I edited the post partially based on your argument above. New version:
Can we also infer that X causes Y?
Let’s concretize the above graphs by adding the conditional probabilities. Graph 1 then looks like this:
Graph 2 is somewhat trickier, because W is not uniquely determined. But one possibility is like this:
Note that W is just the negation of Z here (W=¬Z). Thus, W and Z are information equivalent, and that means graph 2 is actually just graph 1.
Can we find a different variable W such that graph 2 does not reduce to graph 1? I.e. can we find a variable W such that Z is not deterministic given W?
No, we can’t. To see that, consider the distribution P(Z|X,Y,W). By definition of Z, we know that
$$P(Z \mid X, Y, W) = \begin{cases} 1 & \text{if } Z = X \text{ XOR } Y, \\ 0 & \text{otherwise.} \end{cases}$$
In other words, P(Z|X,Y,W) is deterministic.
We also know that W d-separates Z from X,Y in graph 2:
This d-separation implies that Z is independent of X and Y given W:
$$P(Z \mid X, Y, W) = P(Z \mid W).$$
As P(Z|X,Y,W) is deterministic, P(Z|W) also has to be deterministic.
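The argument can be checked numerically. The sketch below assumes that X and Y are independent fair coins (an assumption made here for illustration) and that Z = X XOR Y, as in the definition above. The helper names (`joint`, `z_given_w`, `z_indep_xy_given_w`, `is_deterministic`) are hypothetical. It compares two candidate choices of W: W = ¬Z, and a noisy copy of Z that is flipped with probability 0.1.

```python
from collections import defaultdict
from itertools import product


def joint(w_given_xy):
    """Joint pmf over (x, y, w, z), assuming fair independent X, Y and Z = X XOR Y.

    w_given_xy(x, y) returns a dict {w: P(W=w | X=x, Y=y)}.
    """
    pmf = defaultdict(float)
    for x, y in product((0, 1), repeat=2):
        for w, p in w_given_xy(x, y).items():
            if p > 0:
                pmf[(x, y, w, x ^ y)] += 0.25 * p
    return dict(pmf)


def z_given_w(pmf):
    """Return {w: P(Z=1 | W=w)}."""
    num, den = defaultdict(float), defaultdict(float)
    for (x, y, w, z), p in pmf.items():
        den[w] += p
        num[w] += p * z
    return {w: num[w] / den[w] for w in den}


def z_indep_xy_given_w(pmf, tol=1e-12):
    """Check P(Z=1 | x, y, w) == P(Z=1 | w) wherever P(x, y, w) > 0."""
    pzw = z_given_w(pmf)
    num, den = defaultdict(float), defaultdict(float)
    for (x, y, w, z), p in pmf.items():
        den[(x, y, w)] += p
        num[(x, y, w)] += p * z
    return all(abs(num[k] / den[k] - pzw[k[2]]) <= tol for k in den)


def is_deterministic(cond, tol=1e-12):
    """True if every conditional probability is 0 or 1 (up to tol)."""
    return all(min(p, 1.0 - p) <= tol for p in cond.values())


# Candidate 1: W = NOT Z.
neg_z = joint(lambda x, y: {1 - (x ^ y): 1.0})
# Candidate 2: W is a noisy copy of Z, flipped with probability 0.1.
noisy = joint(lambda x, y: {x ^ y: 0.9, 1 - (x ^ y): 0.1})

for name, pmf in (("W = not Z", neg_z), ("W = noisy Z", noisy)):
    print(name,
          "| Z indep of (X,Y) given W:", z_indep_xy_given_w(pmf),
          "| P(Z|W) deterministic:", is_deterministic(z_given_w(pmf)))
```

In this sketch, W = ¬Z satisfies the conditional independence and makes P(Z|W) deterministic. The noisy W leaves P(Z|W) non-deterministic, but it also breaks the independence of Z from (X, Y) given W, so it is not compatible with the d-separation that graph 2 requires.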
So, graph 2 always reduces to graph 1, no matter how we choose W. Analogously, graph 3 and graph 4 also reduce to graph 1, and we know that our causal structure is graph 1:
I basically agree with this: ruling out unobserved variables is an unusual way to use causal graphical models.
Also, taking the set of variables that are allowed to be in the graph to be the set of variables defined on a given sample space makes the notion of “intervention” more difficult to parse (what happens to F:=(X,Y) after you intervene on X?), though it might be possible with cyclic causal relationships.
So basically, “causal variables” in acyclic graphical models are neither a subset nor a superset of observed random variables.