Finite Factored Sets to Bayes Nets Part 2

This post assumes knowledge of category theory, finite factored sets, and Bayes nets.

The Setup

I’ve already talked about DAGs and factor overlap Venn diagrams in a previous post, where I studied them within a category-theoretic framework. Here I’ll also perform an explicit construction of them using set theory.

DAGs

I have already discussed the set of directed acyclic graphs over $n$ elements. We will denote the set of all DAGs over $n$ elements as $\mathrm{DAG}(n)$. Each Bayes net can be thought of as a set of pairs of elements $\{(i, j), \dots\}$ with $i, j \in \{1, \dots, n\}$ and $i \neq j$, where $(i, j)$ denotes an arrow from $i$ to $j$.
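To make $\mathrm{DAG}(n)$ concrete, here is a minimal brute-force sketch in Python. It is not the author's code. It stores each Bayes net as a frozenset of arrows $(i, j)$ meaning $i \to j$, and keeps every edge set that passes Kahn's acyclicity test. The function names are illustrative.

```python
def is_acyclic(n, edges):
    """Kahn's algorithm: repeatedly remove nodes that have no incoming arrows."""
    indegree = {v: 0 for v in range(1, n + 1)}
    for _, j in edges:
        indegree[j] += 1
    frontier = [v for v in indegree if indegree[v] == 0]
    removed = 0
    while frontier:
        v = frontier.pop()
        removed += 1
        for i, j in edges:
            if i == v:
                indegree[j] -= 1
                if indegree[j] == 0:
                    frontier.append(j)
    return removed == n  # every node removed <=> no directed cycle


def all_dags(n):
    """Every DAG on nodes 1..n, as a frozenset of arrows (i, j) meaning i -> j."""
    pairs = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i != j]
    dags = []
    for mask in range(1 << len(pairs)):
        edges = frozenset(p for k, p in enumerate(pairs) if mask >> k & 1)
        if is_acyclic(n, edges):
            dags.append(edges)
    return dags


print([len(all_dags(n)) for n in range(1, 5)])  # [1, 3, 25, 543]
```

The loop runs over all $2^{n(n-1)}$ candidate edge sets, so this quickly becomes impractical as $n$ grows.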

This set can be converted into the category of Bayes nets over $n$ elements by adding morphisms corresponding to “bookkeeping”-type relationships; we will denote this category $\mathcal{DAG}_n$. From this category, we can form a category whose elements are sets of the elements of $\mathcal{DAG}_n$, subject to the following condition on a set $S$:

(I’ll slightly go against standard notation by using calligraphic acronyms for my category names, e.g. $\mathcal{DAG}_n$. I don’t feel that any of my categories are definitely natural or useful enough to earn a “proper” bold name like $\mathbf{DAG}_n$.)

$\forall s \in S,\ b \in \mathrm{DAG}(n):\quad s \to b \implies b \in S$

This means that if we can reach a given Bayes net $b$ from any element $s$ of our set $S$, then $b$ must also be in $S$. I refer to these as compatible sets. I denote by $\mathcal{CSB}_n$ (standing for compatible sets of Bayes nets) the category whose elements are the compatible sets of Bayes nets over $n$ elements, with at most one morphism between any two of them. To write it out fully:

$\mathrm{Ob}(\mathcal{CSB}_n) = \{ S \in \mathcal{P}(\mathrm{DAG}(n)) \mid \forall s \in S,\ d \in \mathrm{DAG}(n):\ \mathcal{DAG}_n(s, d) = \{\to\} \implies d \in S \}$

$\mathcal{CSB}_n(A, B) = \begin{cases} \{\to\} & A \supseteq B \\ \varnothing & A \not\supseteq B \end{cases}$

This should be read as “There is a unique morphism from $A$ to $B$ if and only if $A$ is a weak superset of $B$; otherwise there is no morphism”. Any ordering of this form gives a category: reflexivity of $\supseteq$ supplies the identity morphisms, and transitivity supplies composition.
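Below is a Python sketch of the compatibility condition and the hom-sets of $\mathcal{CSB}_n$. The bookkeeping rules are not restated in this post, so it assumes the only move $s \to b$ is adding a single arrow while keeping the graph acyclic. If the earlier post's rules include other moves, `one_arrow_extensions` would need to generate them too. Checking single steps is enough, because any longer chain of moves is a sequence of single steps. `one_arrow_extensions`, `is_compatible_set` and `csb_morphism` are hypothetical names.

```python
def is_acyclic(n, edges):
    """Kahn's algorithm: repeatedly remove nodes that have no incoming arrows."""
    indegree = {v: 0 for v in range(1, n + 1)}
    for _, j in edges:
        indegree[j] += 1
    frontier = [v for v in indegree if indegree[v] == 0]
    removed = 0
    while frontier:
        v = frontier.pop()
        removed += 1
        for i, j in edges:
            if i == v:
                indegree[j] -= 1
                if indegree[j] == 0:
                    frontier.append(j)
    return removed == n


def all_dags(n):
    """Every DAG on nodes 1..n, as a frozenset of arrows (i, j) meaning i -> j."""
    pairs = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i != j]
    masks = range(1 << len(pairs))
    candidates = (frozenset(p for k, p in enumerate(pairs) if m >> k & 1) for m in masks)
    return [e for e in candidates if is_acyclic(n, e)]


def one_arrow_extensions(n, dag):
    """DAGs reachable from `dag` by adding one arrow (assumed to be the only bookkeeping move)."""
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if i != j and (i, j) not in dag:
                bigger = dag | {(i, j)}
                if is_acyclic(n, bigger):
                    yield bigger


def is_compatible_set(n, S):
    """Upward closure: everything one step downstream of a member is a member."""
    return all(b in S for s in S for b in one_arrow_extensions(n, s))


def csb_morphism(A, B):
    """There is a (unique) morphism A -> B in CSB_n exactly when A is a superset of B."""
    return A >= B


everything = frozenset(all_dags(3))                               # Dis_3, the initial object
fully_connected = frozenset(d for d in everything if len(d) == 3)  # Ind_3
print(is_compatible_set(3, everything), is_compatible_set(3, fully_connected))  # True True
print(is_compatible_set(3, {frozenset()}))        # False
print(csb_morphism(everything, fully_connected))  # True
```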

(Aside: sometimes we think about equivalence classes of Bayes nets. If we choose, we can first convert our Bayes nets to equivalence classes, then convert them to compatible sets, but this is not needed here)

This category has an initial object $\mathrm{Dis}_n$. It is the set containing the discrete Bayes net (with no arrows at all) and therefore, since every Bayes net can be reached from it, all Bayes nets. Less trivially, it has a terminal object $\mathrm{Ind}_n$, which contains exactly the Bayes nets with an arrow between every pair of nodes.

Venn diagrams

Presence/absence Venn diagrams of $n$ elements can be thought of as elements of the power set of the power set of $\{1, \dots, n\}$. We can write this as $\mathcal{P}(\mathcal{P}(\{1, \dots, n\}))$, but I will abbreviate it as $\mathcal{PP}(n)$ from now on. As before, we can form a category by ordering the sets. In this case, however, we will reverse the direction of the morphisms, to create the category $\mathcal{FV}_n$ (standing for factor Venn diagrams).

$\mathrm{Ob}(\mathcal{FV}_n) = \mathcal{PP}(n)$

$\mathcal{FV}_n(A, B) = \begin{cases} \{\to\} & A \subseteq B \\ \varnothing & A \not\subseteq B \end{cases}$

This category has an initial object $\varnothing$ and a terminal object $\mathcal{P}(\{1, \dots, n\})$.

Our categories look (something) like this:

The Payoff

Functors $F_{n}$ and $G_{n}$

There exist functors $F_n : \mathcal{FV}_n \to \mathcal{CSB}_n$ and $G_n : \mathcal{CSB}_n \to \mathcal{FV}_n$ which are naturally defined with respect to the properties of joint probability distributions. There also exist subcategories $\mathcal{FV}_n' \subseteq \mathcal{FV}_n$ and $\mathcal{CSB}_n' \subseteq \mathcal{CSB}_n$ on which $F_n$ and $G_n$ are mutually inverse. The resulting category structure will, I hope, be useful for causal inference.

We will define compatibility $\mathrm{Cmp}(a, b)$ between a set of elements $a \subseteq \{1, \dots, n\}$ and a Bayes net $b \in \mathrm{DAG}(n)$ as follows. There must exist some node $a_i \in (a \hookrightarrow b)$ such that every other node $a_j \in (a \hookrightarrow b)$, $j \neq i$, is reachable from $a_i$ via a directed path passing only through nodes of $a \hookrightarrow b$. (The hook arrow $\hookrightarrow$ denotes the “inclusion” of the elements of $a$ into $b$, so $a \hookrightarrow b$ refers to the nodes of $b$ which are elements of $a$.)
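Here is a direct Python transcription of $\mathrm{Cmp}(a, b)$ as a sketch. It keeps only the arrows with both ends in $a$, then looks for a root from which a depth-first search reaches all of $a$. Two choices are my assumptions rather than statements from the text. First, paths follow arrow directions. Second, the empty set is treated as vacuously compatible, which matches the later remark that every image of $G_n$ contains $\varnothing$. The name `compatible` is illustrative.

```python
def compatible(a, dag):
    """Cmp(a, b): some node of `a` reaches every other node of `a` along
    directed paths whose nodes all lie in `a`."""
    a = set(a)
    if not a:
        return True  # assumption: the empty overlap is vacuously compatible
    inside = [(i, j) for i, j in dag if i in a and j in a]  # arrows with both ends in a
    for root in a:
        seen, stack = {root}, [root]
        while stack:  # depth-first search restricted to `inside`
            v = stack.pop()
            for i, j in inside:
                if i == v and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen == a:
            return True
    return False


print(compatible({1, 2}, {(1, 2)}))               # True
print(compatible({1, 2, 3}, {(1, 2), (3, 2)}))    # False: no single root
print(compatible({1, 3}, {(1, 2), (2, 3)}))       # False: the path leaves {1, 3}
print(compatible({1, 2, 3}, {(1, 2), (2, 3)}))    # True
```

The third call shows why the restriction to $a \hookrightarrow b$ matters. The chain $1 \to 2 \to 3$ connects 1 to 3, but only through node 2, which lies outside $\{1, 3\}$.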

$F_n : \mathcal{FV}_n \to \mathcal{CSB}_n$ maps an object $V \in \mathrm{Ob}(\mathcal{FV}_n)$ to an object $S \in \mathrm{Ob}(\mathcal{CSB}_n)$ according to the following rule: a Bayes net $s \in \mathrm{DAG}(n)$ is present in $S$ if and only if $\mathrm{Cmp}(v, s)$ holds for every set $v \in V$.

We will quickly verify that this respects morphisms. Let $S = F_n(V)$. Any $W$ such that $V \to W$ must contain at least all of the sets present in $V$ (since $V \to W \iff V \subseteq W$), and possibly more. Therefore any $r \in R = F_n(W)$ must satisfy all of the constraints imposed by $V$, and possibly more, so $r \in S$. Hence $S \supseteq R$, i.e. $S \to R$.

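Next, we wish to define $G_n$. This requires a little finesse but isn’t too difficult. We shall say that $G_n : \mathcal{CSB}_n \to \mathcal{FV}_n$ maps an object $S \in \mathrm{Ob}(\mathcal{CSB}_n)$ to an object $V \in \mathrm{Ob}(\mathcal{FV}_n)$ according to the following rule: a set $v \subseteq \{1, \dots, n\}$ is present in $V$ if and only if $\mathrm{Cmp}(v, s)$ holds for every Bayes net $s \in S$. This also respects morphisms. If $S \to S'$, i.e. $S \supseteq S'$, then any $v$ compatible with every net in $S$ is compatible with every net in $S'$, so $G_n(S) \subseteq G_n(S')$, i.e. $G_n(S) \to G_n(S')$.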
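With $\mathrm{Cmp}$ in hand, both functors become one-line filters, which makes their near-symmetry easy to see. This is a brute-force Python sketch under the same assumptions as before: paths follow arrow directions, and $\mathrm{Cmp}(\varnothing, b)$ is true. `F`, `G`, `compatible` and `powerset` are illustrative names. Objects are frozensets, so the morphism conditions are just `>=` and `<=`.

```python
from functools import lru_cache
from itertools import combinations


def is_acyclic(n, edges):
    """Kahn's algorithm: repeatedly remove nodes that have no incoming arrows."""
    indegree = {v: 0 for v in range(1, n + 1)}
    for _, j in edges:
        indegree[j] += 1
    frontier = [v for v in indegree if indegree[v] == 0]
    removed = 0
    while frontier:
        v = frontier.pop()
        removed += 1
        for i, j in edges:
            if i == v:
                indegree[j] -= 1
                if indegree[j] == 0:
                    frontier.append(j)
    return removed == n


@lru_cache(maxsize=None)
def all_dags(n):
    """Every DAG on nodes 1..n, as a frozenset of arrows (i, j) meaning i -> j."""
    pairs = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i != j]
    masks = range(1 << len(pairs))
    candidates = (frozenset(p for k, p in enumerate(pairs) if m >> k & 1) for m in masks)
    return tuple(e for e in candidates if is_acyclic(n, e))


def compatible(a, dag):
    """Cmp(a, b), reading paths as directed and Cmp(empty, b) as true."""
    a = set(a)
    if not a:
        return True
    inside = [(i, j) for i, j in dag if i in a and j in a]
    for root in a:
        seen, stack = {root}, [root]
        while stack:
            v = stack.pop()
            for i, j in inside:
                if i == v and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen == a:
            return True
    return False


def powerset(n):
    """All subsets of {1, ..., n}: the possible factor overlaps."""
    return [frozenset(c) for k in range(n + 1) for c in combinations(range(1, n + 1), k)]


def F(n, V):
    """F_n: start from every DAG and discard those violating some overlap in V."""
    return frozenset(s for s in all_dags(n) if all(compatible(v, s) for v in V))


def G(n, S):
    """G_n: start from every overlap and discard those violated by some DAG in S."""
    return frozenset(v for v in powerset(n) if all(compatible(v, s) for s in S))


everything = F(3, frozenset())
print(len(everything))                        # 25
print(sorted(map(sorted, G(3, everything))))  # [[], [1], [2], [3]]
```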

You may notice that this is almost the reverse of $F_n$. This is by design. It’s worth pausing for a moment to consider how they differ. $F_n$ maps a factor overlap Venn diagram, constructed as a set of sets of items from $\{1, \dots, n\}$, to a set of DAGs. We can think of it as starting with the complete set of DAGs, then going through our Venn diagram $V$ and, for each $v \in V$, throwing out the DAGs which don’t conform.

Conversely $G_{n}$ maps a set of DAGs to a set of factor overlaps. We can think of it as starting with the completely full factor overlap Venn diagram, and for each DAG $s \in S$ , it throws out the factors which don’t apply.

Quick Examples

Our functors are not quite inverses. For example, consider the Venn diagram corresponding to the set $\{\varnothing, \{1\}, \{2\}, \{1, 2\}, \{1, 3\}, \{2, 3\}\}$ in $\mathcal{FV}_3$. This is mapped by $F_3$ to exactly the set of totally connected DAGs, which is the terminal object in $\mathcal{CSB}_3$. That terminal object is then mapped by $G_3$ to the terminal object $\{\varnothing, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\}$ in $\mathcal{FV}_3$, which is not where we started. This corresponds to what I called “agnosticism” in our set of Venn diagrams in the last post.

Note that every object in the image of $G_n$ (i.e. every $V = G_n(S)$ for some $S \in \mathrm{Ob}(\mathcal{CSB}_n)$) contains the elements $\varnothing$ and $\{1\}, \dots, \{n\}$, since these are compatible with every Bayes net. We will often abbreviate them as “$\dots,$” at the start of a set to save time.

For an example in the opposite direction, consider the set in $\mathcal{CSB}_3$ which consists of the DAGs $\{(1, 2), (3, 2)\}$ and $\{(2, 3), (1, 3)\}$, together with the minimal set of other DAGs needed for this to be a valid element of $\mathcal{CSB}_3$. This is mapped by $G_3$ to the set $\{\dots, \{2, 3\}\}$. That set is in turn mapped by $F_3$ to the set of DAGs “downstream” of $\{(2, 3)\}$ and $\{(3, 2)\}$, which is strictly larger than the set we started with. I left sets like this out of our category in the previous post.

As an example where our functors are inverse to one another, consider the initial object of $\mathcal{FV}_3$, which is the empty set $\varnothing$. This is mapped by $F_3$ to the initial object of $\mathcal{CSB}_3$, which is the set containing all elements of $\mathrm{DAG}(3)$. That set is compatible with no shared factors (any shared factor imposes constraints on Bayes nets). So $G_3$ maps it back to $\{\dots\}$, the diagram containing only $\varnothing$ and the singletons, which is the empty diagram under the abbreviation above. This is true for any $n$. It also applies to the terminal objects of each $\mathcal{FV}_n$ and $\mathcal{CSB}_n$, i.e. the full power set $\mathcal{P}(\{1, \dots, n\})$ and the set containing only fully-connected Bayes nets.
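The three examples above can be checked mechanically. This sketch reuses brute-force versions of $F_n$ and $G_n$ under the same assumptions as before: directed paths, and vacuous compatibility for $\varnothing$. It also assumes, as in the earlier sketch of $\mathcal{CSB}_n$, that the “minimal set of other DAGs” in the second example is everything obtained by adding arrows to the two named DAGs. `fs` is a hypothetical shorthand for writing Venn diagrams.

```python
from functools import lru_cache
from itertools import combinations


def is_acyclic(n, edges):
    """Kahn's algorithm: repeatedly remove nodes that have no incoming arrows."""
    indegree = {v: 0 for v in range(1, n + 1)}
    for _, j in edges:
        indegree[j] += 1
    frontier = [v for v in indegree if indegree[v] == 0]
    removed = 0
    while frontier:
        v = frontier.pop()
        removed += 1
        for i, j in edges:
            if i == v:
                indegree[j] -= 1
                if indegree[j] == 0:
                    frontier.append(j)
    return removed == n


@lru_cache(maxsize=None)
def all_dags(n):
    pairs = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i != j]
    masks = range(1 << len(pairs))
    candidates = (frozenset(p for k, p in enumerate(pairs) if m >> k & 1) for m in masks)
    return tuple(e for e in candidates if is_acyclic(n, e))


def compatible(a, dag):
    a = set(a)
    if not a:
        return True  # assumption: vacuous compatibility for the empty overlap
    inside = [(i, j) for i, j in dag if i in a and j in a]
    for root in a:
        seen, stack = {root}, [root]
        while stack:
            v = stack.pop()
            for i, j in inside:
                if i == v and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen == a:
            return True
    return False


def powerset(n):
    return [frozenset(c) for k in range(n + 1) for c in combinations(range(1, n + 1), k)]


def F(n, V):
    return frozenset(s for s in all_dags(n) if all(compatible(v, s) for v in V))


def G(n, S):
    return frozenset(v for v in powerset(n) if all(compatible(v, s) for s in S))


def fs(*sets):
    """Shorthand for a Venn diagram: a frozenset of frozensets."""
    return frozenset(frozenset(s) for s in sets)


# Example 1: pairwise overlaps only, sent to the fully connected DAGs.
S1 = F(3, fs((), (1,), (2,), (1, 2), (1, 3), (2, 3)))
# Example 2: everything downstream of two specific DAGs.
A, B = frozenset({(1, 2), (3, 2)}), frozenset({(2, 3), (1, 3)})
S2 = frozenset(d for d in all_dags(3) if d >= A or d >= B)
W2 = G(3, S2)
R2 = F(3, W2)
# Example 3: the empty diagram.
S3 = F(3, frozenset())

print(len(S1))                          # 6
print(sorted(map(sorted, W2)))          # [[], [1], [2], [2, 3], [3]]
print(S2 < R2)                          # True: F(G(S2)) is strictly bigger
print(sorted(map(sorted, G(3, S3))))    # [[], [1], [2], [3]]
```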

Composing $F$ and $G$

I am going to be lazy here and omit the subscript $n$s; just imagine I’ve put them in while I’m talking about general properties of all $F_n$ and $G_n$. Consider the following sequence starting with an element $V \in \mathrm{Ob}(\mathcal{FV})$:

$V \xrightarrow{F} S \xrightarrow{G} W \xrightarrow{F} R$

We have that $\mathrm{Cmp}(v, s)$ holds for all $v \in V$, $s \in S$. But $W$ is defined as the set of all elements $w$ (of the relevant set $\mathcal{P}(\{1, \dots, n\})$ for our given $n$) such that $\mathrm{Cmp}(w, s)$ holds for all $s \in S$. This means that all of our elements of $V$ are also elements of $W$, in other words $V \subseteq W$. By the definition of our functor $F$, $V \subseteq W \implies S \supseteq R$.

We can also reverse the logic. $\mathrm{Cmp}(w, s)$ holds for all $s \in S$, $w \in W$, and $R$ is defined by $R = \{r \mid \forall w \in W,\ \mathrm{Cmp}(w, r)\}$. This means that every $s \in S$ lies in $R$, i.e. $S \subseteq R$, which alongside $S \supseteq R$ gives us $S = R$.

This means that for all $V \in \mathrm{Ob}(\mathcal{FV})$, $FGF(V) = F(V)$. We can skip writing $V$ and write this as $FGF = F$. This is the same as saying that $FG = I$ (the identity functor) on the image of $\mathcal{FV}$ under $F$. The symmetric argument, starting from some $S \in \mathrm{Ob}(\mathcal{CSB})$, gives $GFG = G$. So, conversely, $GF = I$ on the image of $\mathcal{CSB}$ under $G$.
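The argument above can also be spot-checked by brute force for small $n$. This Python sketch uses hypothetical helper names and the same assumptions as the earlier sketches (directed paths, vacuous compatibility for $\varnothing$). It tests $FGF = F$ on every $V \in \mathcal{PP}(n)$. It tests $GFG = G$ on the upward closures of every pair of DAGs. The argument never uses compatibility of $S$, so those closures are just a convenient sample. For $n = 3$ it takes a second or two in pure Python.

```python
from functools import lru_cache
from itertools import combinations


def is_acyclic(n, edges):
    """Kahn's algorithm: repeatedly remove nodes that have no incoming arrows."""
    indegree = {v: 0 for v in range(1, n + 1)}
    for _, j in edges:
        indegree[j] += 1
    frontier = [v for v in indegree if indegree[v] == 0]
    removed = 0
    while frontier:
        v = frontier.pop()
        removed += 1
        for i, j in edges:
            if i == v:
                indegree[j] -= 1
                if indegree[j] == 0:
                    frontier.append(j)
    return removed == n


@lru_cache(maxsize=None)
def all_dags(n):
    pairs = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i != j]
    masks = range(1 << len(pairs))
    candidates = (frozenset(p for k, p in enumerate(pairs) if m >> k & 1) for m in masks)
    return tuple(e for e in candidates if is_acyclic(n, e))


def compatible(a, dag):
    a = set(a)
    if not a:
        return True
    inside = [(i, j) for i, j in dag if i in a and j in a]
    for root in a:
        seen, stack = {root}, [root]
        while stack:
            v = stack.pop()
            for i, j in inside:
                if i == v and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen == a:
            return True
    return False


def powerset(n):
    return [frozenset(c) for k in range(n + 1) for c in combinations(range(1, n + 1), k)]


def F(n, V):
    return frozenset(s for s in all_dags(n) if all(compatible(v, s) for v in V))


def G(n, S):
    return frozenset(v for v in powerset(n) if all(compatible(v, s) for s in S))


def all_venn_diagrams(n):
    """Every element of PP(n)."""
    subsets = powerset(n)
    for mask in range(1 << len(subsets)):
        yield frozenset(v for k, v in enumerate(subsets) if mask >> k & 1)


def check_fgf(n):
    return all(F(n, G(n, F(n, V))) == F(n, V) for V in all_venn_diagrams(n))


def check_gfg(n):
    dags = all_dags(n)
    for a, b in combinations(dags, 2):
        S = frozenset(d for d in dags if d >= a or d >= b)  # upward closure of {a, b}
        if G(n, F(n, G(n, S))) != G(n, S):
            return False
    return True


print(check_fgf(3), check_gfg(3))  # True True
```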

This also means that $(GF)^k = GF$ for all $k \in \mathbb{N}_+$, so $GF$ is an idempotent “projection” operator. It “projects” the elements of $\mathcal{FV}$ onto a subcategory of $\mathcal{FV}$, but then doesn’t do anything further to elements that are already there. Likewise for $FG$.
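Here composition reads right to left, so $GF$ means “apply $F$, then $G$”. With that convention, idempotence is a one-line consequence of $FGF = F$ and $GFG = G$:

$$
(GF)(GF) = G\,(FGF) = GF, \qquad (FG)(FG) = (FGF)\,G = FG,
$$

and by induction, $(GF)^{k+1} = (GF)^k\,(GF) = (GF)(GF) = GF$ for every $k \geq 1$.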

On our earlier diagram this all looks like this:

$F$ maps elements of $\mathcal{FV}$ to elements of $\mathcal{CSB}$. We can label all of the elements that get “hit” as the image of $F$, written $\mathrm{Im}(F)$. Likewise $G$ maps elements of $\mathcal{CSB}$ to elements of $\mathcal{FV}$, giving its image $\mathrm{Im}(G)$. More importantly, $\mathrm{Im}(G) \cong \mathrm{Im}(F)$, with $F = G^{-1}$ on these subcategories.

Properties of the Resulting Subcategories

In category theory, we tend not to care about what the elements of a category are, only about the structure present in the morphisms. So rather than work with the clunky concept of two isomorphic subcategories formed as the images of functors, we really just want to look at the resulting structure. Since I have no idea what to call it, I will call this category $D_n$ for now. Since it makes no sense to perform inference on zero variables, we’ll impose the restriction $n > 0$.

$D_1 \cong \mathbf{1}$, the trivial category with one element and one morphism. $D_2$ is a two-element category with a morphism from one of its elements to the other. $D_3$ is a fifteen-element category with the structure discussed in the previous post.

$D_{4}$ has 1218 elements. This is the highest I’ve been able to enumerate using inefficient python code and a mediocre PC. I could perhaps try to get something to work on $D_{5}$ , but no amount of efficiency can fight $O (2^{2^{n}})$ for long. OEIS has nothing for the sequence 1,2,15,1218 so I’m stumped for better representations.
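Here is a sketch of how $|D_n|$ can be counted by brute force. It is my own reconstruction, not the author's code, and it uses the same assumptions as the earlier sketches (directed paths, vacuous compatibility for $\varnothing$). Two shortcuts keep it small. First, $\varnothing$ and the singletons are compatible with every DAG, so only overlaps of size at least 2 affect $F_n$. Second, since $G = GFG$, the image of $G$ equals $G$ applied to the image of $F$. Under these assumptions it should give 1, 2, 15 for $n = 1, 2, 3$, with equal image sizes on both sides as the isomorphism requires. `count_d` is a hypothetical name.

```python
from functools import lru_cache
from itertools import combinations


def is_acyclic(n, edges):
    """Kahn's algorithm: repeatedly remove nodes that have no incoming arrows."""
    indegree = {v: 0 for v in range(1, n + 1)}
    for _, j in edges:
        indegree[j] += 1
    frontier = [v for v in indegree if indegree[v] == 0]
    removed = 0
    while frontier:
        v = frontier.pop()
        removed += 1
        for i, j in edges:
            if i == v:
                indegree[j] -= 1
                if indegree[j] == 0:
                    frontier.append(j)
    return removed == n


@lru_cache(maxsize=None)
def all_dags(n):
    pairs = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i != j]
    masks = range(1 << len(pairs))
    candidates = (frozenset(p for k, p in enumerate(pairs) if m >> k & 1) for m in masks)
    return tuple(e for e in candidates if is_acyclic(n, e))


def compatible(a, dag):
    a = set(a)
    if not a:
        return True
    inside = [(i, j) for i, j in dag if i in a and j in a]
    for root in a:
        seen, stack = {root}, [root]
        while stack:
            v = stack.pop()
            for i, j in inside:
                if i == v and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen == a:
            return True
    return False


def powerset(n):
    return [frozenset(c) for k in range(n + 1) for c in combinations(range(1, n + 1), k)]


def F(n, V):
    return frozenset(s for s in all_dags(n) if all(compatible(v, s) for v in V))


def G(n, S):
    return frozenset(v for v in powerset(n) if all(compatible(v, s) for s in S))


def count_d(n):
    """Return (|Im F|, |Im G|) by brute force."""
    nontrivial = [v for v in powerset(n) if len(v) >= 2]  # only these constrain DAGs
    image_f = set()
    for mask in range(1 << len(nontrivial)):
        V = frozenset(v for k, v in enumerate(nontrivial) if mask >> k & 1)
        image_f.add(F(n, V))
    image_g = {G(n, S) for S in image_f}  # Im(G) = G(Im(F)) because G = GFG
    return len(image_f), len(image_g)


print([count_d(n) for n in (1, 2, 3)])  # [(1, 1), (2, 2), (15, 15)]
```

Calling `count_d(4)` loops over $2^{11}$ diagrams and 543 DAGs, which is slow in pure Python. Comparing its result with the 1218 reported above would be a useful check that this reading of $\mathrm{Cmp}$ agrees with the author's.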

Combining Nodes in Bayes Nets

One bookkeeping rule we have not used yet is the node-combining rule. This lets us map an object of $\mathcal{DAG}_n$ to an object of $\mathcal{DAG}_{n-1}$. We do this by combining two numbers $i \neq j \in \{1, \dots, n\}$, replacing every $j$ in an edge with $i$, and then relabelling our numbers to be $\{1, \dots, n - 1\}$. We can also do the same for $\mathcal{FV}_n$, mapping its elements to $\mathcal{FV}_{n-1}$. This also works by combining two numbers as above.
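Node combining can be sketched in Python as follows. The text does not say what happens to an arrow between $i$ and $j$ (it would become a self-loop), or to a merge that creates a directed cycle. Here the former is dropped and the latter returns `None`; both choices are assumptions, not rules from the post. `relabel`, `merge_dag` and `merge_fv` are hypothetical names.

```python
def is_acyclic(n, edges):
    """Kahn's algorithm: repeatedly remove nodes that have no incoming arrows."""
    indegree = {v: 0 for v in range(1, n + 1)}
    for _, j in edges:
        indegree[j] += 1
    frontier = [v for v in indegree if indegree[v] == 0]
    removed = 0
    while frontier:
        v = frontier.pop()
        removed += 1
        for i, j in edges:
            if i == v:
                indegree[j] -= 1
                if indegree[j] == 0:
                    frontier.append(j)
    return removed == n


def relabel(k, i, j):
    """Label of node k after merging j into i and closing the gap left by j."""
    k = i if k == j else k
    return k - 1 if k > j else k


def merge_dag(n, dag, i, j):
    """Merge node j into node i of a DAG on 1..n, giving a DAG on 1..n-1 (or None)."""
    merged = frozenset(
        (relabel(a, i, j), relabel(b, i, j))
        for a, b in dag
        if {a, b} != {i, j}  # assumption: an i-j arrow would become a self-loop, so drop it
    )
    # assumption: a merge that creates a directed cycle has no valid result
    return merged if is_acyclic(n - 1, merged) else None


def merge_fv(V, i, j):
    """Apply the same merge to every overlap in a factor Venn diagram V."""
    return frozenset(frozenset(relabel(k, i, j) for k in v) for v in V)


chain = frozenset({(1, 2), (2, 3)})
print(merge_dag(3, chain, 1, 2))  # frozenset({(1, 2)})
print(merge_dag(3, chain, 1, 3))  # None: 1 -> 2 -> 3 becomes a 2-cycle
```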

For a given set $X$ of possible “worlds”, we might want to split it up into various partitions $X_i$ (unlike in FFS, we do not impose any restrictions on these partitions). So if we have $X \in \{a, b, c, d\}$, then the partition $X_1 \in \{a, b \lor c \lor d\}$ is a totally valid variable, as is $X_2 \in \{a \lor b, c \lor d\}$. These “naturally” give a new variable $(X_1, X_2) \in \{a, b, c \lor d\}$ when combined.
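Combining two variables corresponds to taking the common refinement of their partitions: the new blocks are the non-empty intersections of one block from each. Below is a short Python sketch, with partitions stored as frozensets of frozensets. `partition`, `refine` and `combine` are hypothetical helper names.

```python
from functools import reduce


def partition(*blocks):
    """Build a partition from its blocks; each block is any iterable, e.g. "bcd"."""
    return frozenset(frozenset(b) for b in blocks)


def refine(p, q):
    """Common refinement of two partitions: all non-empty pairwise intersections."""
    return frozenset(a & b for a in p for b in q if a & b)


def combine(*partitions):
    """Joint variable (X_1, X_2, ...): refine the partitions one after another."""
    return reduce(refine, partitions)


X1 = partition("a", "bcd")  # {a, b or c or d}
X2 = partition("ab", "cd")  # {a or b, c or d}
print(sorted(sorted(block) for block in combine(X1, X2)))  # [['a'], ['b'], ['c', 'd']]
```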

So, given a copy of $\mathcal{DAG}_n$, we can “join” two variables together to map each element to an element of a copy of $\mathcal{DAG}_{n-1}$. We can do this in $\binom{n}{2}$ different ways. Up to now, we haven’t really cared about what each node of a Bayes net actually was, but if we want to combine nodes we’ll need to think about this.

For a given set $X$ and a set of partitions $\{X_i\}$, we can define $\mathcal{DAG}_{\{X_i\}}$ as the category of Bayes nets over those partitions. We can then “glue together” all categories like this to create $\mathcal{DAG}_X$ for a set $X$. We can likewise associate factor overlap diagrams with these, and create $\mathcal{FV}_{\{X_i\}}$ for each set of disjoint partitions. By gluing together the relevant functors $F$ and $G$ for each pair of categories, we can make an enormous category $D_X$ corresponding to all sets of Bayes nets that a probability distribution over $X$ could obey.

Three Further Thoughts

Thought 1

What are the properties of the sets in $GF(\mathcal{FV})$, i.e. the ones post-projection? Given a set of $m$ factor overlaps (which includes $\varnothing, \{1\}, \dots, \{n\}$), can we determine in some reasonable time (polynomial in $m$ and $n$, perhaps?) whether it is changed by $GF$? We have no good closed-form representation of the elements of $D$ yet.

Thought 2

Bayes nets let us combine nodes (i.e. partitions) to get a new node and shrink the Bayes net, but is there some way to think about “blending” nodes continuously to move between Bayes nets with the same number of nodes?

So if we have our world $X \in \{a, b, c, d\}$, then we can split this into two nodes in a two-element Bayes net: $X_1 \in \{a \lor b, c \lor d\}$, $X_2 \in \{a \lor c, b \lor d\}$. Or we could split it into two nodes like this: $X_1 \in \{a \lor b, c \lor d\}$, $X_3 \in \{a \lor d, b \lor c\}$. Is there a way to embed these into a continuous space? Can we rotate between them? Can we describe probability distributions as elements of some space like $P(\mathbb{C}^n)$ or something, and then Bayes nets as regions in this space?

Thought 3

$D_X$ is unfathomably large even for small sets: we can have up to $\mathrm{Bell}(n)$ different partitions (and $\mathrm{Bell}(n)$ grows faster than exponentially), so we may have up to $2^{\mathrm{Bell}(n)}$ different Bayes nets, with the largest being