Bayes Net inference algorithms achieve their efficiency by applying dynamic programming at multiple levels.
Level 0: Naive Marginalization
No dynamic programming whatsoever. Just multiply together all the conditional probability distribution (CPD) tables to form the full joint, then sum out the variables you are not interested in.
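As a toy illustration, here is a minimal NumPy sketch of Level 0 on a small chain A⟶B⟶C⟶D (the same chain used in the Level 1 example below); the CPD numbers and variable names (p_a, p_b_a, ...) are made up for the example:

```python
import numpy as np

# Hypothetical CPD tables for the chain A -> B -> C -> D (all variables binary).
p_a = np.array([0.6, 0.4])                    # P(A)
p_b_a = np.array([[0.7, 0.3], [0.2, 0.8]])    # P(B|A), rows indexed by A
p_c_b = np.array([[0.9, 0.1], [0.4, 0.6]])    # P(C|B)
p_d_c = np.array([[0.5, 0.5], [0.1, 0.9]])    # P(D|C)

# Level 0: materialize the full joint P(A,B,C,D).
# The table is exponential in the number of variables, which is the problem.
joint = (p_a[:, None, None, None]
         * p_b_a[:, :, None, None]
         * p_c_b[None, :, :, None]
         * p_d_c[None, None, :, :])

# Sum out the variables we are not interested in (A, B, C) to get P(D).
p_d = joint.sum(axis=(0, 1, 2))
print(p_d)
```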
Level 1: Variable Elimination
Cache the repeated computations within a query.
For example, given a chain-structured Bayes Net A⟶B⟶C⟶D, instead of doing P(D) = ∑_A ∑_B ∑_C P(A,B,C,D), we can do P(D) = ∑_C P(D|C) ∑_B P(C|B) ∑_A P(A)P(B|A). Check my post for more.
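With the same hypothetical CPDs as the Level 0 sketch, variable elimination is just that factored expression evaluated from the inside out, so every intermediate table stays small:

```python
import numpy as np

# Same hypothetical CPD tables as in the Level 0 sketch.
p_a = np.array([0.6, 0.4])
p_b_a = np.array([[0.7, 0.3], [0.2, 0.8]])
p_c_b = np.array([[0.9, 0.1], [0.4, 0.6]])
p_d_c = np.array([[0.5, 0.5], [0.1, 0.9]])

# Eliminate one variable at a time; each intermediate factor is over a single variable.
tau_b = p_a @ p_b_a      # sum_A P(A) P(B|A)  ->  P(B)
tau_c = tau_b @ p_c_b    # sum_B P(B) P(C|B)  ->  P(C)
p_d = tau_c @ p_d_c      # sum_C P(C) P(D|C)  ->  P(D)
print(p_d)               # same answer as the naive version
```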
Level 2: Clique-tree based algorithms — e.g., Sum-product (SP) / Belief-update (BU) calibration algorithms
Cache the repeated computations across queries.
Suppose you have a fixed Bayes Net and you want to compute not only the marginal P(D) but also P(A). Clearly, running two separate instances of Variable Elimination as above would repeat a lot of the same computation.
A clique tree is a data structure in which, given the initial factors (here, the CPD tables), you “calibrate” a tree whose nodes each correspond to a subset of the variables (a clique). The calibration cost is amortized over the many queries you run against the same Bayes Net.
Calibration takes just two passes over the tree (one in each direction), after which every node of the clique tree holds the joint marginal over its variables.
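A minimal sketch of sum-product calibration on the chain's clique tree {A,B}-{B,C}-{C,D}, again using the hypothetical CPDs from the earlier sketches; the message and belief names (msg_1_to_2, beta_ab, ...) are my own, not from any particular library:

```python
import numpy as np

# Same hypothetical CPD tables as before.
p_a = np.array([0.6, 0.4])
p_b_a = np.array([[0.7, 0.3], [0.2, 0.8]])
p_c_b = np.array([[0.9, 0.1], [0.4, 0.6]])
p_d_c = np.array([[0.5, 0.5], [0.1, 0.9]])

# Initial clique potentials: each CPD is assigned to one clique.
psi_ab = p_a[:, None] * p_b_a   # clique {A,B}: P(A) P(B|A)
psi_bc = p_c_b                  # clique {B,C}: P(C|B)
psi_cd = p_d_c                  # clique {C,D}: P(D|C)

# Pass 1 (left to right): messages over the sepsets {B} and {C}.
msg_1_to_2 = psi_ab.sum(axis=0)                            # function of B
msg_2_to_3 = (msg_1_to_2[:, None] * psi_bc).sum(axis=0)    # function of C

# Pass 2 (right to left).
msg_3_to_2 = psi_cd.sum(axis=1)                            # function of C
msg_2_to_1 = (psi_bc * msg_3_to_2[None, :]).sum(axis=1)    # function of B

# Calibrated beliefs: each equals the joint marginal over its clique.
beta_ab = psi_ab * msg_2_to_1[None, :]                         # P(A,B)
beta_bc = psi_bc * msg_1_to_2[:, None] * msg_3_to_2[None, :]   # P(B,C)
beta_cd = psi_cd * msg_2_to_3[:, None]                         # P(C,D)

# Both P(A) and P(D) are now local sums over calibrated beliefs.
print(beta_ab.sum(axis=1))   # P(A)
print(beta_cd.sum(axis=0))   # P(D)
```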
Incorporating evidence is equally simple: in a clique containing the observed variable, zero out the entries inconsistent with the observed value, then “propagate” that information to the rest of the tree via a single additional pass.
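A sketch of the evidence case, conditioning on D = 1 (an arbitrary choice for the example): zero out the entries inconsistent with the evidence in the clique containing D, propagate toward the rest of the tree, and normalize at the end:

```python
import numpy as np

# Same hypothetical CPD tables and clique potentials as in the calibration sketch.
p_a = np.array([0.6, 0.4])
p_b_a = np.array([[0.7, 0.3], [0.2, 0.8]])
p_c_b = np.array([[0.9, 0.1], [0.4, 0.6]])
p_d_c = np.array([[0.5, 0.5], [0.1, 0.9]])

psi_ab = p_a[:, None] * p_b_a   # clique {A,B}
psi_bc = p_c_b                  # clique {B,C}
psi_cd = p_d_c.copy()           # clique {C,D}

# Evidence D = 1: zero out the entries inconsistent with the observation.
psi_cd[:, 0] = 0.0

# One pass from the evidence clique toward the rest of the tree.
msg_3_to_2 = psi_cd.sum(axis=1)                            # function of C
msg_2_to_1 = (psi_bc * msg_3_to_2[None, :]).sum(axis=1)    # function of B

# The resulting belief is proportional to P(A, B, D=1).
beta_ab = psi_ab * msg_2_to_1[None, :]
p_a_given_d1 = beta_ab.sum(axis=1)
p_a_given_d1 /= p_a_given_d1.sum()   # normalize to get P(A | D = 1)
print(p_a_given_d1)
```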
Level 3: Specialized query-set answering algorithms over a calibrated clique tree
Cache the repeated computations across an entire class of queries.
e.g., computing P(X,Y) for every pair of variables can be done with yet another layer of dynamic programming: maintain a table of P(C_i, C_j) for each pair of clique-tree nodes, filling it in order of the distance between the cliques, so that each entry is built from an already-computed pair one hop closer.
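On the chain this recursion might look like the sketch below, where the calibrated beliefs are rebuilt directly from the hypothetical CPDs so the snippet stands alone; the key step is that dividing a clique belief by its sepset marginal yields the conditional needed to extend an existing pair table by one more hop:

```python
import numpy as np

# Same hypothetical CPDs as before; rebuild the calibrated clique beliefs directly.
p_a = np.array([0.6, 0.4])
p_b_a = np.array([[0.7, 0.3], [0.2, 0.8]])
p_c_b = np.array([[0.9, 0.1], [0.4, 0.6]])
p_d_c = np.array([[0.5, 0.5], [0.1, 0.9]])

beta_ab = p_a[:, None] * p_b_a                     # P(A,B)
beta_bc = (p_a @ p_b_a)[:, None] * p_c_b           # P(B,C)
beta_cd = (p_a @ p_b_a @ p_c_b)[:, None] * p_d_c   # P(C,D)
mu_b = beta_ab.sum(axis=0)                         # sepset marginal P(B)
mu_c = beta_bc.sum(axis=0)                         # sepset marginal P(C)

# Dynamic programming outward from the clique containing A, one hop at a time:
# each new pair table reuses the previous one instead of starting from scratch.
p_ac = beta_ab @ (beta_bc / mu_b[:, None])   # sum_B P(A,B) P(C|B) = P(A,C)
p_ad = p_ac @ (beta_cd / mu_c[:, None])      # sum_C P(A,C) P(D|C) = P(A,D)
print(p_ad)
```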