I don’t understand why 1 is true – in general, couldn’t the variable $W$ be defined on a more refined sample space? Also, I think all $4$ conditions are technically satisfied if you set $W=X$ (or well, maybe it’s better to think of it as a copy of $X$).
I think the following argument works though. Note that the distribution of $X$ given $(Z,Y,W)$ is just the deterministic distribution $X=Y \oplus Z$ (this follows from the definition of Z). By the structure of the causal graph, the distribution of $X$ given $(Z,Y,W)$ must be the same as the distribution of $X$ given just $W$. Therefore, the distribution of $X$ given $W$ is deterministic. I strongly suspect that a deterministic connection is directly ruled out by one of Pearl’s inference rules. The same argument also rules out graphs 2 and 4.
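As a quick sanity check (an illustrative script, not from the thread), brute-forcing the four binary cases confirms that defining Z = X XOR Y makes X a deterministic function of (Y, Z):

```python
from itertools import product

# Z is defined as X XOR Y; check that X = Y XOR Z is then an identity,
# i.e. X is fully determined once (Y, Z) is known.
for x, y in product([0, 1], repeat=2):
    z = x ^ y          # definition of Z
    assert y ^ z == x  # X recovered exactly, so P(X | Z, Y, ...) is 0 or 1
print("X = Y XOR Z holds for all four (X, Y) combinations")
```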
I agree that 1. is unjustified (and would cause lots of problems for graphical causal models if it were).
Further, I’m pretty sure the result is not “X has to cause Y” but “this distribution has Lebesgue measure 0 in models where X does not cause Y” (and deterministic relationships satisfy this)
Finally, you can enable markdown comments in your account settings (I believe)
Further, I’m pretty sure the result is not “X has to cause Y” but “this distribution has Lebesgue measure 0 in models where X does not cause Y”
Yes, that’s true. Going from “the distributions in which X does not cause Y have measure zero” to “X causes Y” is, I think, common and seems intuitively valid to me. For example, the soundness and completeness of d-separation also holds only outside a measure-zero set of distributions.
I think this could be right, but I also think this attitude is a bit too careless. Conditional independence in the first place has Lebesgue measure 0. I have some sympathy for considering something along the lines of “when your posterior concentrates on conditional independence, the causal relationships are the ones that don’t concentrate on a priori measure-0 sets” as a definition of causal direction—maybe this is implied by the finite factored set definition if you supply an additional rule for determining priors, I’m not sure.
Also, this is totally not the Pearlian definition! I made it up.
I agree with you regarding the Lebesgue measure-zero point. My impression is that the Pearl paradigm has some [statistics → causal graph] inference rules which basically do the job of ruling out causal graphs for which the properties seen in the data have Lebesgue measure 0. (The inference from two variables being independent to them having no common ancestors in the underlying causal graph, stated earlier in the post, is also of this kind.) So I think it’s correct to say “X has to cause Y”, where this is understood as a valid inference inside the Pearl (or Garrabrant) paradigm. (But also, updating pretty close to “X has to cause Y” is correct for a Bayesian with reasonable priors about the underlying causal graphs.)
(epistemic position: I haven’t read most of the relevant material in much detail)
I agree that 1. is unjustified (and would cause lots of problems for graphical causal models if it were).
Interesting, why is that? For any of the outcomes (i.e. 00, 01, 10, and 11), P(W|X,Y) is either 0 or 1 for any variable W that we can observe. So W is deterministic given X and Y for our purposes, right?
If not, do you have an example for a variable W where that’s not the case?
I think W does not have to be a variable which we can observe, i.e. it is not necessarily the case that we can deterministically infer the value of W from the values of X and Y. For example, let’s say the two binary variables we observe are X=[whether smoke is coming out of the kitchen window of a given house] and Y=[whether screams are emanating from the house]. We’d intuitively want to consider a causal model where W=[whether the house is on fire] is causing both, but in a way that makes all triples of variable values have nonzero probability (which is true for these variables in practice). This is impossible if we require W to be deterministic once (X,Y) is known.
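To make this concrete, here is a small simulation of the fire example (all numerical probabilities are made up for illustration): every (W, X, Y) triple occurs with positive probability, yet W is genuinely uncertain given (X, Y):

```python
import random
from collections import Counter

random.seed(0)

# Illustrative numbers for the example: W = house on fire,
# X = smoke from the kitchen window, Y = screams from the house.
# Noise on every edge makes all eight (W, X, Y) triples possible.
def sample():
    w = 1 if random.random() < 0.3 else 0   # P(fire) = 0.3 (made up)
    px = 0.9 if w else 0.1                  # smoke is likely but not certain
    py = 0.8 if w else 0.05                 # screams likewise
    x = 1 if random.random() < px else 0
    y = 1 if random.random() < py else 0
    return w, x, y

counts = Counter(sample() for _ in range(100_000))

# Every triple occurs, so W is not deterministic given (X, Y):
assert len(counts) == 8
for x in (0, 1):
    for y in (0, 1):
        n1 = counts[(1, x, y)]
        n0 = counts[(0, x, y)]
        print(f"P(W=1 | X={x}, Y={y}) ≈ {n1 / (n1 + n0):.3f}")
```

Each estimated conditional probability lies strictly between 0 and 1, which is exactly what requiring W to be deterministic given (X, Y) would forbid.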
I see! You are right, so my argument wasn’t correct! I edited the post partially based on your argument above. New version:
Can we also infer that X causes Y?
Let’s concretize the above graphs by adding the conditional probabilities. Graph 1 then looks like this:
Graph 2 is somewhat trickier, because W is not uniquely determined. But one possibility is like this:
Note that W is just the negation of Z here (W=¬Z). Thus, W and Z are information equivalent, and that means graph 2 is actually just graph 1.
Can we find a different variable W such that graph 2 does not reduce to graph 1? I.e. can we find a variable W such that Z is not deterministic given W?
No, we can’t. To see that, consider the distribution P(Z|X,Y,W). By definition of Z, we know that
P(Z|X,Y,W) = 1 if Z = X XOR Y, and 0 otherwise.
In other words, P(Z|X,Y,W) is deterministic.
We also know that W d-separates Z from X,Y in graph 2:
This d-separation implies that Z is independent of X and Y given W:
P(Z|W)=P(Z|W,X,Y)
As P(Z|X,Y,W) is deterministic, P(Z|W) also has to be deterministic.
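This can be checked numerically for the concrete choice W = ¬Z used above (a minimal sketch; the assumption that X and Y are independent fair coins is mine, since the excerpt does not show the actual probability tables):

```python
from itertools import product
from collections import defaultdict

# Assumed setup: X, Y independent fair coins; Z = X XOR Y;
# W = NOT Z, the concrete W from graph 2 above.
joint = defaultdict(float)
for x, y in product([0, 1], repeat=2):
    z = x ^ y
    w = 1 - z
    joint[(w, z)] += 0.25  # each (x, y) pair has probability 1/4

# P(Z = 1 | W = w) for both values of w:
for w in (0, 1):
    p_w = sum(p for (w_, _), p in joint.items() if w_ == w)
    p_z1 = sum(p for (w_, z), p in joint.items()
               if w_ == w and z == 1) / p_w
    print(f"P(Z=1 | W={w}) = {p_z1}")  # 1.0 for w=0, 0.0 for w=1
```

Both conditionals come out as 0 or 1, i.e. Z is deterministic given W, matching the argument above.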
So, graph 2 always reduces to graph 1, no matter how we choose W. Analogously, graph 3 and graph 4 also reduce to graph 1, and we know that our causal structure is graph 1:
I basically agree with this: ruling out unobserved variables is an unusual way to use causal graphical models.
Also, taking the set of variables that are allowed to be in the graph to be the set of variables defined on a given sample space makes the notion of “intervention” more difficult to parse (what happens to F:=(X,Y) after you intervene on X?), though it might be possible with cyclic causal relationships.
So basically, “causal variables” in acyclic graphical models are neither a subset nor a superset of observed random variables.