We’ll refer to these as “Bookkeeping Rules”, since they feel pretty minor if you’re already comfortable working with Bayes nets. Some examples:
We can always add an arrow to a diagram (assuming it doesn’t introduce a loop), and the approximation will get no worse.
Here’s something that’s been bothering me on and off for the last few months: This graphical rule immediately breaks Markov equivalence. Specifically, two DAGs are Markov-equivalent only if they share an (undirected) skeleton. (Lemma 6.1 at the link.)
If the major/only thing we care about here regarding latential Bayes nets is that our Grand Joint Distribution $P[X_G]$ factorize over (that is, satisfy) our DAG $G$ (and all of the DAGs we can get from it by applying the rules here), then by Thm 6.2 in the link above, $P$ is also globally/locally Markov wrt $G$. This holds even when $P[X_G] > 0$ is not guaranteed for all of the possible joint states in $X_G$, unlike what Hammersley-Clifford would require.
That in turn means (Def 6.5) that there can be distributions $P$ such that $P[X_G]$ factors over $G$ but not over $G' = G$ plus one non-loopy extra arrow (where $G'$ trivially has the same vertices as $G$); specifically: because $G$ and $G'$ don’t (quite) share a skeleton, they can’t be Markov-equivalent; because they aren’t Markov-equivalent, $P$ no longer needs to be (locally/globally) Markov wrt $G'$ (and in fact there must exist some $P$ which explicitly breaks this); and because of that, such $P$ need not factor over $G'$. Which I claim we should not want here, because (as always) we care primarily about preserving which joint probability distributions factorize over/satisfy which DAGs, and of course we probably don’t get to pick whether our $P$ is one of the ones where that break in the chain of logic matters.
A way I’d phrase John’s sibling comment, at least for the exact case: adding arrows to a DAG increases the set of probability distributions it can represent. This is because the fundamental rule of a Bayes net is that d-separation has to imply conditional independence, but a distribution can have conditional independences that aren’t represented by the network. When you add arrows, you can remove instances of d-separation, but you can’t add any: nodes are d-separated when all paths between them satisfy some property, and (a) adding arrows can only increase the number of paths you have to worry about, and (b) if you look at the definition of d-separation, the relevant properties for paths get harder to satisfy when you have more arrows. Therefore, the more arrows a graph $G$ has, the fewer constraints a distribution $P$ has to satisfy in order to be represented by $G$.
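To make that concrete, here’s a minimal sketch using networkx (assuming a version where `nx.d_separated` exists; newer releases rename it `nx.is_d_separator`). The three-node graph and the particular query are made up for illustration:

```python
import networkx as nx

# G: X -> Y, with Z an isolated node.
G = nx.DiGraph([("X", "Y")])
G.add_node("Z")

# In G there is no path between Z and Y at all, so they are d-separated
# given the empty set: any P represented by G must have Z independent of Y.
print(nx.d_separated(G, {"Z"}, {"Y"}, set()))   # True

# G': add the arrow Z -> Y. The new edge is a new path, and it destroys
# that d-separation, so G' imposes strictly fewer independence constraints.
G2 = G.copy()
G2.add_edge("Z", "Y")
print(nx.d_separated(G2, {"Z"}, {"Y"}, set()))  # False
```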
Proof that the quoted bookkeeping rule works, for the exact case:
The original DAG $G$ asserts $P[X] = \prod_i P[X_i \mid X_{\mathrm{pa}_G(i)}]$.
If $G'$ just adds an edge from $j$ to $k$, then $G'$ says $P[X] = P[X_k \mid X_{\mathrm{pa}_G(k)}, X_j] \prod_{i \neq k} P[X_i \mid X_{\mathrm{pa}_G(i)}]$.
The original DAG’s assertion $P[X] = \prod_i P[X_i \mid X_{\mathrm{pa}_G(i)}]$ also implies $P[X_k \mid X_{\mathrm{pa}_G(k)}, X_j] = P[X_k \mid X_{\mathrm{pa}_G(k)}]$ (since $G'$ is acyclic, $j$ is a non-descendant of $k$ in $G$, so this is just the local Markov property), and therefore implies $G'$’s assertion $P[X] = P[X_k \mid X_{\mathrm{pa}_G(k)}, X_j] \prod_{i \neq k} P[X_i \mid X_{\mathrm{pa}_G(i)}]$.
The approximate case then follows by the new-and-improved Bookkeeping Theorem.
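If it helps, the exact case is easy to sanity-check numerically. A minimal sketch, with an arbitrary toy graph and made-up probability tables (G: X → Y with Z isolated, and G′ adding Z → Y, i.e. j = Z, k = Y):

```python
import numpy as np

# Toy instance of the proof above; the tables are arbitrary.
Px = np.array([0.3, 0.7])                    # P[X]
Pz = np.array([0.6, 0.4])                    # P[Z]
Py_x = np.array([[0.2, 0.8], [0.9, 0.1]])    # P[Y | X], rows indexed by x

# Build a joint that factors over G by construction:
# P[x, z, y] = P[x] P[z] P[y | x].
P = np.einsum("x,z,xy->xzy", Px, Pz, Py_x)

# G' instead asserts P[x, z, y] = P[x] P[z] P[y | x, z]. Read P[y | x, z]
# off the joint itself and rebuild:
Py_xz = P / P.sum(axis=2, keepdims=True)
Q = np.einsum("x,z,xzy->xzy", Px, Pz, Py_xz)

print(np.allclose(P, Q))  # True: anything factoring over G factors over G'
```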
Not sure where the disconnect/confusion is.
Let me see if I’ve understood point 3 correctly here. (I am not convinced I have actually found a flaw; I’m just trying to reconcile two things in my head that look to conflict, so I can write down a clean definition elsewhere of something that matters to me.)
$P$ factors over $G$, and in $G$, $X_j$ and $X_k$ are conditionally independent given $X_{\mathrm{pa}_G(k)}$; from those two facts we can very straightforwardly show that $P$ factors over $G'$, too. This is the stuff you said above, right?
But if we go the other direction, assuming that some arbitrary $P'$ factors over $G'$, I don’t think we can then derive that $P'$ factors over $G$ in full generality, which was what worried me. But that break of symmetry (and thus lack of equivalence) is… genuinely probably fine, actually: there’s no rule for arbitrarily deleting arrows, after all.
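To convince myself of that asymmetry, a minimal sketch in the same toy setup as your check above (G: X → Y with Z isolated, G′ adding Z → Y; the numbers are made up): a $P'$ built to factor over $G'$, with $Y$ genuinely depending on $Z$, fails $G$’s factorization.

```python
import numpy as np

# Same toy graphs: G is X -> Y with Z isolated; G' adds Z -> Y.
Px = np.array([0.3, 0.7])                     # P'[X]
Pz = np.array([0.6, 0.4])                     # P'[Z]
# P'[Y | X, Z], indexed [x, z, y], with a genuine dependence on Z:
Py_xz = np.array([[[0.2, 0.8], [0.7, 0.3]],
                  [[0.9, 0.1], [0.4, 0.6]]])

# P' factors over G' by construction.
Pp = np.einsum("x,z,xzy->xzy", Px, Pz, Py_xz)

# G would additionally require P'[y | x, z] = P'[y | x]; rebuild under G:
Pxy = Pp.sum(axis=1)                          # P'[X, Y]
Py_x = Pxy / Pxy.sum(axis=1, keepdims=True)   # P'[Y | X]
Q = np.einsum("x,z,xy->xzy", Px, Pz, Py_x)

print(np.allclose(Pp, Q))  # False: this P' does not factor over G
```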
That’s cleared up my confusion/worries, thanks!