The theorems in this post all say something like “if the distribution (approximately) factors according to <some DAGs>, then it also (approximately) factors according to <some other DAGs>”.
So one motivating research question might be phrased as “Each probability distribution has an equivalence class of Bayes nets / causal diagrams which are all compatible with it. But what is the structure within a given equivalence class? In particular, if we have a representative Bayes net of an equivalence class, how might we algorithmically generate other Bayes nets in that equivalence class?”
Also, it appears that the two diagrams in the Frankenstein Rule section differ in their d-separation of (x_1 \indep x_4 | x_5) (which doesn’t hold in the left), so these are not actually equivalent (we can’t have an underlying distribution satisfy both of these diagrams).
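(As an aside: d-separation claims like this can be checked mechanically via the standard moralized-ancestral-graph criterion. The sketch below does exactly that; the two edge lists are made-up stand-ins chosen to exhibit the same pattern, not the post’s actual diagrams.)

```python
from collections import defaultdict

def d_separated(parents, xs, ys, zs):
    """Are node sets xs and ys d-separated given zs, in the DAG given
    by per-node parent lists? Uses the moralized-ancestral-graph test."""
    # 1. Restrict to the ancestral closure of xs | ys | zs.
    relevant, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(parents[v])
    # 2. Moralize: undirect all edges and marry co-parents.
    adj = defaultdict(set)
    for v in relevant:
        ps = [p for p in parents[v] if p in relevant]
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for a in ps:
            for b in ps:
                if a != b:
                    adj[a].add(b)
    # 3. Delete the conditioning set, then test reachability.
    seen, stack = set(), list(xs - zs)
    while stack:
        v = stack.pop()
        if v in seen or v in zs:
            continue
        seen.add(v)
        stack.extend(adj[v])
    return not (ys & seen)

# Made-up stand-ins: a collider at x5 vs. a chain through x5.
left  = {1: [], 2: [], 3: [], 4: [], 5: [1, 4]}
right = {1: [], 2: [1], 3: [2], 4: [5], 5: [3]}

print(d_separated(left,  {1}, {4}, {5}))    # False: conditioning opens the collider
print(d_separated(right, {1}, {4}, {5}))    # True: conditioning blocks the chain
print(d_separated(left,  {1}, {4}, set()))  # True: unconditionally, the collider blocks
```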
Oh, we can totally have an underlying distribution satisfy both of these diagrams. The key is that, while the right diagram asserts (x_1 \indep x_4 | x_5), the left does not say that x_1 can’t be independent of x_4 given x_5. Remember the interpretation: an underlying distribution satisfies a DAG if-and-only-if the distribution factors over that DAG. We neither assume nor prove minimality; the DAG does not need to be minimal.
So, for instance, a distribution in which all five variables are unconditionally independent would satisfy every diagram over those variables.
You are right that the two diagrams are not equivalent (i.e. there exist distributions which satisfy either one but not the other), and we’re not claiming they’re equivalent. We’re just saying “assume that some distribution satisfies both of these two diagrams; what other diagrams must the distribution then satisfy?”.
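The all-independent example can also be checked directly by brute force. A minimal sketch over binary variables: the conditionals p(x_i | parents) are computed from the joint by marginalization, and the factorization holds for any parent sets (the two parent-set dictionaries below are hypothetical stand-ins, not the post’s actual diagrams).

```python
import itertools
import numpy as np

# Five unconditionally independent binary variables.
marginals = [np.array([0.3, 0.7]), np.array([0.6, 0.4]),
             np.array([0.5, 0.5]), np.array([0.2, 0.8]),
             np.array([0.9, 0.1])]
n = len(marginals)

def joint(x):
    # Independence means the joint is just the product of marginals.
    return float(np.prod([marginals[i][xi] for i, xi in enumerate(x)]))

def conditional(i, x, parents):
    # p(x_i | x_parents), computed from the joint by marginalization.
    match = lambda y, idxs: all(y[j] == x[j] for j in idxs)
    num = sum(joint(y) for y in itertools.product(range(2), repeat=n)
              if match(y, list(parents) + [i]))
    den = sum(joint(y) for y in itertools.product(range(2), repeat=n)
              if match(y, parents))
    return num / den

# Two different hypothetical DAGs over the same five variables.
dag_a = {0: [], 1: [0], 2: [1], 3: [0], 4: [3]}
dag_b = {0: [1], 1: [], 2: [], 3: [4], 4: []}

# The factorization p(x) = prod_i p(x_i | x_parents(i)) holds for both.
for dag in (dag_a, dag_b):
    for x in itertools.product(range(2), repeat=n):
        prod = np.prod([conditional(i, x, dag[i]) for i in range(n)])
        assert np.isclose(joint(x), prod)
print("the all-independent distribution satisfies both DAGs")
```

The same check passes for any DAG over these variables, since each conditional collapses to the corresponding marginal.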
Could you clarify how this relates to e.g. the PC (Peter-Clark) or FCI (Fast Causal Inference) algorithms for causal structure learning?
Like, are you making different assumptions (than e.g. minimality, faithfulness, etc)?
First, there’s a different type signature. PC- or FCI-style causal structure learning typically takes in a bunch of conditional independence tests (or a function for performing conditional independence tests, or a pile of data on which to perform conditional independence tests, etc.) and spits out a DAG. For this post, it’s DAGs in, DAGs out.
Second, a different use-case. The theorems in this post all say something like “if the distribution (approximately) factors according to <some DAGs>, then it also (approximately) factors according to <some other DAGs>”. This use-case doesn’t directly assume or prove anything about minimality; none of the theorems say that the distribution can’t also satisfy some DAG with fewer edges. Likewise, we neither assume nor prove anything about faithfulness. (Though one could perhaps use these theorems to prove minimality and/or faithfulness in particular cases, in which case the preconditions for the proof would be that the distribution does/doesn’t satisfy some DAGs.)
If we wanted to use these theorems for causal structure learning, at a high level the method would look something like:
First, find some DAGs the distribution satisfies via some other method(s).
Then, use the theorems here to derive a stronger DAG which must be satisfied.
For instance, we could imagine that two groups of scientists do a bunch of experiments on the same system. One group finds that the system is well-modelled by the left DAG at the top of the Frankenstein Rule section of this post, the other group finds that the system is well-modelled by the right DAG. Then, we could try to Frankenstein those two DAGs together in a way which keeps as few edges as possible.
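That combination step can be sketched concretely. Assuming the two DAGs share a topological order (which the Frankenstein rule requires), any per-node mix of their parent sets is again a valid DAG, so choosing the smaller parent set at each node keeps as few edges as possible. The two example DAGs below are made up, not the ones in the post:

```python
# Hypothetical inputs from the "two groups of scientists"; both are
# assumed to share the topological order x1, ..., x5, and the
# distribution is assumed to factor over both.
dag_left  = {1: [], 2: [1], 3: [1], 4: [2, 3], 5: [4]}
dag_right = {1: [], 2: [1], 3: [2], 4: [3], 5: [3, 4]}

def frankenstein(dag_a, dag_b):
    # Per node, keep whichever input's parent set is smaller.
    return {v: min((dag_a[v], dag_b[v]), key=len) for v in dag_a}

combined = frankenstein(dag_left, dag_right)
print(combined)  # {1: [], 2: [1], 3: [1], 4: [3], 5: [4]}
```

Here the combined DAG has 4 edges, versus 5 in each input.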
Ah, that’s right. Thanks, that example is quite clarifying!