This post assumes knowledge of category theory, finite factored sets, and Bayes nets.
The Setup
I’ve already talked about DAGs and factor overlap Venn diagrams in a previous post, where I studied them within a category-theoretic framework. Here I’ll also perform an explicit construction of them using set theory.
DAGs
I have already discussed the set of directed acyclic graphs over n elements. We will denote the set of all DAGs of n elements as DAG(n). Each Bayes net can be thought of as a set of pairs of elements {(i∈{1,...,n},j≠i∈{1,...,n}),...}.
This set can be converted into the category of Bayes nets over n elements by the addition of morphisms corresponding to “bookkeeping”-type relationships, which we will denoteDAGn. From this category, we can form a category whose elements are sets of the elements of DAGn, subject to the following condition on a set S:
(I’ll slightly go against standard notation by using calligraphic acronyms for my category names i.e. DAGn. I don’t feel like any of my categories are definitely natural or useful enough to earn a “proper” bold name like DAGn)
∀s∈S,b∈DAG(n),s→b⟹b∈S
This means that, if we can reach a given Bayes net b from any element s of our set S, that Bayes net b must also be in S. I refer to these as compatible sets and denote the category CSBn (standing for compatible sets of Bayes nets) as the category whose elements are compatible sets of n elements and for any two elements. To write it out fully:
This should be read as “There is a unique morphism from A to B if and only if A is a weak superset of B, otherwise there is no morphism”. Orderings of this form always follow the rules required to create a category.
(Aside: sometimes we think about equivalence classes of Bayes nets. If we choose, we can first convert our Bayes nets to equivalence classes, then convert them to compatible sets, but this is not needed here)
This category has an initial object Disn which is a set consisting of the discrete Bayes net (with no arrows at all) and therefore all Bayes nets. Less trivially it has a terminal object Indn which contains exactly the Bayes nets which have an arrow between every pair of objects.
Venn diagrams
Presence/absence Venn diagrams of n elements can be thought of as elements of the power-set of the power-set of the set {1,...,n}. We can write this as P(P({1,...,n})), but I will abbreviate this by writing PP(n) in future. As before, we can form a category by ordering the sets, but in this case we will reverse the order of the morphisms, to create the category FVn (standing for factor Venn diagrams).
Ob(FVn)=PP(n)
FVn(A,B)={{→}A⊆B{}A⊃B
This category has an initial object {} and a terminal object P(1,...,n).
Our categories look (something) like this:
The Payoff
Functors Fn and Gn
There exist functors FVnFn⇄GnCSBn which are naturally-defined with respect to the properties of joint probability distributions. There exist subcategories FV′n and CSB′n on which the functors Fn and Gn are totally inverse. The resulting category structure—I hope—will be useful for causal inference.
We will define compatibility Cmp(a,b) between a set of elements a⊆{1,...,n} and a Bayes net b∈DAG(n) as follows: there must exist some node ai∈(a↪b) (the hook arrow ↪ denotes the “inclusion” of the elements of a into b, so it refers to the nodes in bwhich are elements of a) such that all other nodes a¯i∈(a↪b) are reachable via paths including only elements of a↪b.
Fn:FVn→CSBn maps an object V∈Ob(FVn) is mapped to an object S∈Ob(CSBn)according to the following rule: a Bayes net s∈DAG(n) is present in S if and only if Cmp(v,s) holds for every set v∈V.
We will quickly verify that this respects morphisms. Any W such that V→W must contain at least all of the sets present in V (since V→W⟺V⊆W) and possibly more. Therefore, any r∈R=Fn(W) must also follow all of the constraints imposed by Fn(V) and possibly more, so S⊇R⟺S→R.
Next, we wish to define Gn. This requires a little finesse but isn’t too difficult. We shall say that Gn:CSBn→FVn maps an object S∈Ob(CSBn) to an object V∈Ob(FVn) according to the following rule: a set v is present in V if and only if Cmp(v,s) holds for every Bayes net net s∈S.
You may notice that this is almost the reverse of Fn. This is by design. It’s worth pausing for a moment to consider how they differ: Fn maps a factor overlap Venn diagram, constructed as a set of sets of items from {1,...,n}, to a set of DAGs. We can think of it as starting with the complete set of DAGs, then going through our venn diagram V, and for each v∈V throwing out the DAGs which don’t conform.
Conversely Gn maps a set of DAGs to a set of factor overlaps. We can think of it as starting with the completely full factor overlap Venn diagram, and for each DAG s∈S, it throws out the factors which don’t apply.
Quick Examples
Our functors are not quite inverses: for an example consider the Venn diagram corresponding to the set {{},{1},{2},{1,2},{1,3},{2,3}} in FV3. This is mapped by F3 to exactly the set of totally connected DAGs, which is the terminal object in CSB3. The terminal object in CSB3 is mapped by G3 to the terminal object {{},{1},{2},{3},{1,2},{1,3},{2,3},{1,2,3}} in FV3. This corresponds to what I called “agnosticism” in our set of Venn diagrams in the last post.
Note that all objects V∈Ob(FVn) such that V=Gn(S∈Ob(CSBn)) will contain the elements {} and {1},...,{n}. We will often write this as a ..., at the start of our set to save time.
For an example in the opposite direction, consider the set in CSB3 which consists of the DAGs {(1,2),(3,2)}, {(2,3),(1,3)} and the minimal set of other DAGs needed for this to be a valid element of CSB3. This is mapped by G3 to the set {...,{2,3}}, which is correspondingly mapped by F3 to the set of DAGs “downstream” of {(2,3)} and {(3,2)}. I left sets like this out of our category in the previous post.
As an example where our functors are inverse to one another, consider the initial object of FV3, which is the empty set {}. This is mapped by F3 to the initial object of CSB3, which is the set containing all elements of DAG(3). This is compatible with no shared factors (as any shared factors impose constraints on Bayes nets) so is mapped by G3 right back to the empty set. This is true for any n. It also applies to the terminal objects of any FVn and CSBn i.e. the total power set P(n) and the set containing only fully-connected Bayes nets.
Composing F and G
I am going to be lazy here and omit the subscript ns, just imagine I’ve put them in while I’m talking about general properties of all Fn and Gn. Consider the following sequence starting with an element V∈Ob(FV):
VF→SG→WF→R
We have that ∀v∈V,s∈S,Cmp(v,s). But W is defined as the set of all elements w (in the relevant set P({1,...,n}) for our given n) such that ∀s∈S,Cmp(w,s). This means that all of our elements of V are also elements of W, in other words V⊆W, and by the definition of our functor F, V⊆W⟹S⊇R.
But consider that we can reverse the logic: ∀s∈S,w∈W,Cmp(w,s), but R is defined by R={r∣∣∀w∈W,Cmp(w,r)}. This means that S⊆R, which alongside S⊇R gives us S=R.
This means that for all V∈Ob(FV), FGF(V)=F(V). We can skip writing V and write this as FGF=F, which is also the same as saying that FG=I (the identity morphism) on the image of FV under F. Conversely GF=I on the image of CSB under G.
This also means that (GF)n=GF∀n∈N+ so GF is a “projection” operator. It “projects” the elements of FV onto a subcategory of FV, but then doesn’t do anything further to elements that are already there. Likewise for FG.
On our earlier diagram this all looks like this:
F maps elements of FV to a elements of CSB. We can label all of the elements that get “hit” as the image of F, written Im(F). Likewise G maps elements of CSB to elements of FV, giving its image Im(G). More importantly, Im(G)≡Im(F), with F=G−1 on these subcategories.
Properties of the Resulting Subcategories
In category theory, we tend not to care about what the elements of a category are, only about the structure present in the morphisms. Therefore, rather than consider the rather clunky concept of two equivalent subcategories which are formed as the images of functors, we really just want to look at the resulting structure. Since I have no idea what to call it, I will call these categories Dn for now. Since it makes no sense to perform inference on zero variables, we’ll impose the limit n>0.
D1≡1, which is the trivial category, with one element and one morphism.D2 is a two element category with a morphism from one of its elements to the other.D3 is a fifteen-element category with the structure discussed in the previous post.
D4 has 1218 elements. This is the highest I’ve been able to enumerate using inefficient python code and a mediocre PC. I could perhaps try to get something to work on D5, but no amount of efficiency can fight O(22n) for long. OEIS has nothing for the sequence 1,2,15,1218 so I’m stumped for better representations.
Combining Nodes in Bayes Nets
One bookkeeping rule we have not used yet is the node-combining rule. This lets us map an object of DAGn to an object of DAGn−1. We do this by combining two numbers i≠j∈{1,...,n}, replacing every j in an edge with i, and then relabelling our numbers to be {1,...,n−1}. We can also do the same for FVn, mapping its elements to FVn−1. This also works by combining two numbers as above.
For a given set X of possible “worlds”, we might want to split it up into various partitions Xi (unlike in FFS, we do not impose any restrictions on these partitions). So if we have X∈{a,b,c,d} then the partition X1∈{a,b∨c∨d} is a totally valid variable, as is X2{a∨b,c∨d}. These “naturally” give a new variable (X1,X2)={a,b,c∨d} when combined.
So, given a copy of DAGn, we can “join” two variables together to map each element to an element of a copy of DAGn−1. We can do this in nC2 different ways. Up to now, we haven’t really cared about what each node of a Bayes net actually was, but if we want to combine nodes we’ll need to think about this.
For a given set, X and a set of set of partitions Xi, we can define DAG{Xi} as the category of Bayes nets over those partitions. We can then “glue together” all categories like this, to create DAGX for a set X. We can likewise associate factor overlap diagrams with these, and create FV{Xi} for each set of disjoint partitions. By gluing together the relevant functors F and G for each pair of categories, we can make an enormous category DX corresponding to all sets of Bayes nets that a probability distribution over DX could obey.
Three Further Thoughts
Thought 1
What are the properties of the sets in GF(FV), i.e. the ones post-projection? Given a set of m factor overlaps (which includes {},{1},...,{n}), can we determine in some reasonable time (polynomial in m and n perhaps?) whether it is changed by GF? We have no good closed-form representation of the elements of D yet.
Thought 2
Bayes nets let us combine nodes (i.e. partitions) to get a new node and shrink the Bayes net, but is there some way to think about “blending” nodes continuously to move between Bayes nets with the same number of nodes?
So if we have our world X∈{a,b,c,d}, then we can split this into two nodes in a two-element Bayes net: X1∈{a∨b,c∨d},X2∈{a∨c,b∨d}. Or we could split it into two nodes like this: X1∈{a∨b,c∨d},X3∈{a∨d,b∨c}. Is there a way to embed these into a continuous space? Can we rotate between them? Can we describe probability distributions as elements of some space like P(Cn) or something, then Bayes nets as regions in this space?
Thought 3
DX is unfathomably large even for small sets: we can have up to Bell(n) different partitions (and Bell(n) grows exponentially fast), so we may have up to 2Bell(n) different Bayes nets, with the largest being
Finite Factored Sets to Bayes Nets Part 2
This post assumes knowledge of category theory, finite factored sets, and Bayes nets.
The Setup
I’ve already talked about DAGs and factor overlap Venn diagrams in a previous post, where I studied them within a category-theoretic framework. Here I’ll also perform an explicit construction of them using set theory.
DAGs
I have already discussed the set of directed acyclic graphs over n elements. We will denote the set of all DAGs of n elements as DAG(n). Each Bayes net can be thought of as a set of pairs of elements {(i∈{1,...,n},j≠i∈{1,...,n}),...}.
This set can be converted into the category of Bayes nets over n elements by the addition of morphisms corresponding to “bookkeeping”-type relationships, which we will denoteDAGn. From this category, we can form a category whose elements are sets of the elements of DAGn, subject to the following condition on a set S:
(I’ll slightly go against standard notation by using calligraphic acronyms for my category names i.e. DAGn. I don’t feel like any of my categories are definitely natural or useful enough to earn a “proper” bold name like DAGn)
∀s∈S,b∈DAG(n), s→b⟹b∈S
This means that, if we can reach a given Bayes net b from any element s of our set S, that Bayes net b must also be in S. I refer to these as compatible sets and denote the category CSBn (standing for compatible sets of Bayes nets) as the category whose elements are compatible sets of n elements and for any two elements. To write it out fully:
Ob(CSBn)={S∈P(DAG(n))∣∣∀s∈S,d∈DAG(n),DAGn(s,d)={→}⟹d∈S}
CSBn(A,B)={{→} A⊇B {} A⊂B
This should be read as “There is a unique morphism from A to B if and only if A is a weak superset of B, otherwise there is no morphism”. Orderings of this form always follow the rules required to create a category.
(Aside: sometimes we think about equivalence classes of Bayes nets. If we choose, we can first convert our Bayes nets to equivalence classes, then convert them to compatible sets, but this is not needed here)
This category has an initial object Disn which is a set consisting of the discrete Bayes net (with no arrows at all) and therefore all Bayes nets. Less trivially it has a terminal object Indn which contains exactly the Bayes nets which have an arrow between every pair of objects.
Venn diagrams
Presence/absence Venn diagrams of n elements can be thought of as elements of the power-set of the power-set of the set {1,...,n}. We can write this as P(P({1,...,n})), but I will abbreviate this by writing PP(n) in future. As before, we can form a category by ordering the sets, but in this case we will reverse the order of the morphisms, to create the category FVn (standing for factor Venn diagrams).
Ob(FVn)=PP(n)
FVn(A,B)={{→} A⊆B {} A⊃B
This category has an initial object {} and a terminal object P(1,...,n).
Our categories look (something) like this:
The Payoff
Functors Fn and Gn
There exist functors FVnFn⇄GnCSBn which are naturally-defined with respect to the properties of joint probability distributions. There exist subcategories FV′n and CSB′n on which the functors Fn and Gn are totally inverse. The resulting category structure—I hope—will be useful for causal inference.
We will define compatibility Cmp(a,b) between a set of elements a⊆{1,...,n} and a Bayes net b∈DAG(n) as follows: there must exist some node ai∈(a↪b) (the hook arrow ↪ denotes the “inclusion” of the elements of a into b, so it refers to the nodes in bwhich are elements of a) such that all other nodes a¯i∈(a↪b) are reachable via paths including only elements of a↪b.
Fn:FVn→CSBn maps an object V∈Ob(FVn) is mapped to an object S∈Ob(CSBn)according to the following rule: a Bayes net s∈DAG(n) is present in S if and only if Cmp(v,s) holds for every set v∈V.
We will quickly verify that this respects morphisms. Any W such that V→W must contain at least all of the sets present in V (since V→W⟺V⊆W) and possibly more. Therefore, any r∈R=Fn(W) must also follow all of the constraints imposed by Fn(V) and possibly more, so S⊇R⟺S→R.
Next, we wish to define Gn. This requires a little finesse but isn’t too difficult. We shall say that Gn:CSBn→FVn maps an object S∈Ob(CSBn) to an object V∈Ob(FVn) according to the following rule: a set v is present in V if and only if Cmp(v,s) holds for every Bayes net net s∈S.
You may notice that this is almost the reverse of Fn. This is by design. It’s worth pausing for a moment to consider how they differ: Fn maps a factor overlap Venn diagram, constructed as a set of sets of items from {1,...,n}, to a set of DAGs. We can think of it as starting with the complete set of DAGs, then going through our venn diagram V, and for each v∈V throwing out the DAGs which don’t conform.
Conversely Gn maps a set of DAGs to a set of factor overlaps. We can think of it as starting with the completely full factor overlap Venn diagram, and for each DAG s∈S, it throws out the factors which don’t apply.
Quick Examples
Our functors are not quite inverses: for an example consider the Venn diagram corresponding to the set {{},{1},{2},{1,2},{1,3},{2,3}} in FV3. This is mapped by F3 to exactly the set of totally connected DAGs, which is the terminal object in CSB3. The terminal object in CSB3 is mapped by G3 to the terminal object {{},{1},{2},{3},{1,2},{1,3},{2,3},{1,2,3}} in FV3. This corresponds to what I called “agnosticism” in our set of Venn diagrams in the last post.
Note that all objects V∈Ob(FVn) such that V=Gn(S∈Ob(CSBn)) will contain the elements {} and {1},...,{n}. We will often write this as a ..., at the start of our set to save time.
For an example in the opposite direction, consider the set in CSB3 which consists of the DAGs {(1,2),(3,2)}, {(2,3),(1,3)} and the minimal set of other DAGs needed for this to be a valid element of CSB3. This is mapped by G3 to the set {...,{2,3}}, which is correspondingly mapped by F3 to the set of DAGs “downstream” of {(2,3)} and {(3,2)}. I left sets like this out of our category in the previous post.
As an example where our functors are inverse to one another, consider the initial object of FV3, which is the empty set {}. This is mapped by F3 to the initial object of CSB3, which is the set containing all elements of DAG(3). This is compatible with no shared factors (as any shared factors impose constraints on Bayes nets) so is mapped by G3 right back to the empty set. This is true for any n. It also applies to the terminal objects of any FVn and CSBn i.e. the total power set P(n) and the set containing only fully-connected Bayes nets.
Composing F and G
I am going to be lazy here and omit the subscript ns, just imagine I’ve put them in while I’m talking about general properties of all Fn and Gn. Consider the following sequence starting with an element V∈Ob(FV):
VF→SG→WF→R
We have that ∀v∈V,s∈S,Cmp(v,s). But W is defined as the set of all elements w (in the relevant set P({1,...,n}) for our given n) such that ∀s∈S,Cmp(w,s). This means that all of our elements of V are also elements of W, in other words V⊆W, and by the definition of our functor F, V⊆W⟹S⊇R.
But consider that we can reverse the logic: ∀s∈S,w∈W,Cmp(w,s), but R is defined by R={r∣∣∀w∈W,Cmp(w,r)}. This means that S⊆R, which alongside S⊇R gives us S=R.
This means that for all V∈Ob(FV), FGF(V)=F(V). We can skip writing V and write this as FGF=F, which is also the same as saying that FG=I (the identity morphism) on the image of FV under F. Conversely GF=I on the image of CSB under G.
This also means that (GF)n=GF ∀n∈N+ so GF is a “projection” operator. It “projects” the elements of FV onto a subcategory of FV, but then doesn’t do anything further to elements that are already there. Likewise for FG.
On our earlier diagram this all looks like this:
F maps elements of FV to a elements of CSB. We can label all of the elements that get “hit” as the image of F, written Im(F). Likewise G maps elements of CSB to elements of FV, giving its image Im(G). More importantly, Im(G)≡Im(F), with F=G−1 on these subcategories.
Properties of the Resulting Subcategories
In category theory, we tend not to care about what the elements of a category are, only about the structure present in the morphisms. Therefore, rather than consider the rather clunky concept of two equivalent subcategories which are formed as the images of functors, we really just want to look at the resulting structure. Since I have no idea what to call it, I will call these categories Dn for now. Since it makes no sense to perform inference on zero variables, we’ll impose the limit n>0.
D1≡1, which is the trivial category, with one element and one morphism.D2 is a two element category with a morphism from one of its elements to the other.D3 is a fifteen-element category with the structure discussed in the previous post.
D4 has 1218 elements. This is the highest I’ve been able to enumerate using inefficient python code and a mediocre PC. I could perhaps try to get something to work on D5, but no amount of efficiency can fight O(22n) for long. OEIS has nothing for the sequence 1,2,15,1218 so I’m stumped for better representations.
Combining Nodes in Bayes Nets
One bookkeeping rule we have not used yet is the node-combining rule. This lets us map an object of DAGn to an object of DAGn−1. We do this by combining two numbers i≠j∈{1,...,n}, replacing every j in an edge with i, and then relabelling our numbers to be {1,...,n−1}. We can also do the same for FVn, mapping its elements to FVn−1. This also works by combining two numbers as above.
For a given set X of possible “worlds”, we might want to split it up into various partitions Xi (unlike in FFS, we do not impose any restrictions on these partitions). So if we have X∈{a,b,c,d} then the partition X1∈{a,b∨c∨d} is a totally valid variable, as is X2{a∨b,c∨d}. These “naturally” give a new variable (X1,X2)={a,b,c∨d} when combined.
So, given a copy of DAGn, we can “join” two variables together to map each element to an element of a copy of DAGn−1. We can do this in nC2 different ways. Up to now, we haven’t really cared about what each node of a Bayes net actually was, but if we want to combine nodes we’ll need to think about this.
For a given set, X and a set of set of partitions Xi, we can define DAG{Xi} as the category of Bayes nets over those partitions. We can then “glue together” all categories like this, to create DAGX for a set X. We can likewise associate factor overlap diagrams with these, and create FV{Xi} for each set of disjoint partitions. By gluing together the relevant functors F and G for each pair of categories, we can make an enormous category DX corresponding to all sets of Bayes nets that a probability distribution over DX could obey.
Three Further Thoughts
Thought 1
What are the properties of the sets in GF(FV), i.e. the ones post-projection? Given a set of m factor overlaps (which includes {},{1},...,{n}), can we determine in some reasonable time (polynomial in m and n perhaps?) whether it is changed by GF? We have no good closed-form representation of the elements of D yet.
Thought 2
Bayes nets let us combine nodes (i.e. partitions) to get a new node and shrink the Bayes net, but is there some way to think about “blending” nodes continuously to move between Bayes nets with the same number of nodes?
So if we have our world X∈{a,b,c,d}, then we can split this into two nodes in a two-element Bayes net: X1∈{a∨b,c∨d},X2∈{a∨c,b∨d}. Or we could split it into two nodes like this: X1∈{a∨b,c∨d},X3∈{a∨d,b∨c}. Is there a way to embed these into a continuous space? Can we rotate between them? Can we describe probability distributions as elements of some space like P(Cn) or something, then Bayes nets as regions in this space?
Thought 3
DX is unfathomably large even for small sets: we can have up to Bell(n) different partitions (and Bell(n) grows exponentially fast), so we may have up to 2Bell(n) different Bayes nets, with the largest being