I qualify with “most” because we cannot simultaneously represent both dependences and independences in a single graph, so we have to choose, and people have chosen to represent independences. That is, if in a DAG G some arrow is missing, then in any distribution (causal structure) consistent with G there is a corresponding independence (missing effect).
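To make the “missing arrow implies independence” reading concrete, here is a small numeric sketch (with made-up conditional probability tables) for the chain DAG X → Y → Z. The missing arrow X → Z means that in every distribution that factorizes according to this DAG, X is independent of Z given Y, which we can verify cell by cell:

```python
import itertools

# Hypothetical CPDs for the chain DAG X -> Y -> Z (all variables binary).
# The missing arrow X -> Z implies X independent of Z given Y in *every*
# distribution that factorizes according to this DAG.
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

# Joint distribution from the DAG factorization p(x) p(y|x) p(z|y).
joint = {}
for x, y, z in itertools.product([0, 1], repeat=3):
    joint[(x, y, z)] = p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]

def marginal(assign):
    """Sum the joint over all variables not fixed in `assign` (index -> value)."""
    return sum(p for k, p in joint.items()
               if all(k[i] == v for i, v in assign.items()))

# Check p(x, z | y) == p(x | y) * p(z | y) for every cell.
for x, y, z in itertools.product([0, 1], repeat=3):
    p_y = marginal({1: y})
    lhs = marginal({0: x, 1: y, 2: z}) / p_y
    rhs = (marginal({0: x, 1: y}) / p_y) * (marginal({1: y, 2: z}) / p_y)
    assert abs(lhs - rhs) < 1e-12
```

Changing the numbers in the CPDs changes the distribution but never breaks the assertion; that is exactly the “for all distributions consistent with G” reading.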
You probably did not intend to imply that this was an arbitrary choice, but it would still be interesting to hear your thoughts on it. It seems to me that the choice to represent independences by missing arrows was necessary. If they had instead chosen to represent dependences by present arrows, I don’t see how the graphs would be useful for causal inference.
If missing arrows represent independences and the backdoor criterion holds, this is interpreted as “for all distributions that are consistent with the model, there is no confounding”. This is clearly very useful. If arrows represented dependences, it would instead be interpreted as “for at least one distribution that is consistent with the model, there is no confounding”. This is not useful to the investigator.
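A toy numeric illustration of why the backdoor reading is useful (all numbers made up): take the DAG C → T, C → Y, T → Y, where C blocks the only backdoor path from T to Y. The backdoor adjustment formula then identifies the causal effect, and we can contrast it with the naive, confounded conditional contrast:

```python
# Backdoor adjustment sketch for the DAG C -> T, C -> Y, T -> Y.
# C satisfies the backdoor criterion for (T, Y), so
#   P(y | do(t)) = sum_c P(y | t, c) P(c).
p_c = {0: 0.5, 1: 0.5}
p_t_given_c = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # P(t | c)
p_y1_given_tc = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.9}  # P(Y=1 | t, c)

def p_y1_do_t(t):
    # Backdoor adjustment: average P(Y=1 | t, c) over the *marginal* of C.
    return sum(p_y1_given_tc[(t, c)] * p_c[c] for c in (0, 1))

def p_y1_given_t(t):
    # Naive conditioning averages over P(c | t) instead, which is confounded.
    p_t = sum(p_t_given_c[c][t] * p_c[c] for c in (0, 1))
    return sum(p_y1_given_tc[(t, c)] * p_t_given_c[c][t] * p_c[c]
               for c in (0, 1)) / p_t

causal = p_y1_do_t(1) - p_y1_do_t(0)        # adjusted (causal) contrast
naive = p_y1_given_t(1) - p_y1_given_t(0)   # confounded contrast
```

Here the adjusted contrast is 0.45 while the naive contrast is about 0.63; the gap is the confounding that the backdoor adjustment removes.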
Since unconfoundedness is an independence-relation, it is not clear to me how graphs that encode dependence-relations would be useful. Can you think of a graphical criterion for unconfoundedness in dependence graphs? Or would dependence graphs be useful for a different purpose?
Hi, thanks for this. I agree that this choice was not arbitrary at all!
There are a few related reasons why it was made.
(a) Pearl wisely noted that it is independences we exploit for things like propagating beliefs around a sparse graph in polynomial time. When he was arguing for the use of probability in AI, people in the field were not yet fully on board, because they thought that probabilistic reasoning about n binary variables requires a 2^n table for the joint, which is a non-starter. (Of course, statisticians had been on board with probability for hundreds of years even though they didn’t have computers; their solution was to use clever parametric models. In some sense Bayesian networks are just another kind of clever parametric model that finally penetrated the AI culture in the late 80s.)
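The 2^n point can be seen in a few lines. For a chain X1 → X2 → … → Xn of binary variables (a toy Markov chain with made-up numbers), the independences encoded by the missing arrows let us marginalize by passing a small message along the chain in linear time, instead of summing a full 2^n joint table:

```python
import itertools

# A chain X1 -> X2 -> ... -> Xn of binary variables, with made-up parameters.
n = 12
prior = [0.6, 0.4]                   # P(X1)
trans = [[0.9, 0.1], [0.2, 0.8]]     # P(X_{k+1} | X_k), same at every step

# Linear-time marginalization: propagate the marginal P(X_k) forward.
msg = prior[:]
for _ in range(n - 1):
    msg = [sum(msg[a] * trans[a][b] for a in (0, 1)) for b in (0, 1)]
p_last_fast = msg[1]                 # P(X_n = 1) via message passing

# Brute force over the full 2^n joint table for comparison.
p_last_slow = 0.0
for assign in itertools.product((0, 1), repeat=n):
    p = prior[assign[0]]
    for a, b in zip(assign, assign[1:]):
        p *= trans[a][b]
    if assign[-1] == 1:
        p_last_slow += p

assert abs(p_last_fast - p_last_slow) < 1e-12
```

The message-passing loop touches 2 numbers per step; the brute-force loop touches all 2^12 = 4096 joint entries, and would be hopeless for n in the hundreds.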
(b) We can define statistical (causal) models by either independences or dependences, but there is a lack of symmetry here that the symmetry of “presence or absence of edges in a graph” masks. An independence pins down a small part of the parameter space: a model defined by an independence will generally correspond to a lower-dimensional manifold sitting inside the space of the saturated model (the model with no constraints). A model defined by dependences is just that same space with a measure-zero “small part” removed. Lowering the dimension of a model is really nice in stats for a number of reasons.
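A toy dimension count makes the asymmetry vivid. For two binary variables the joint (p00, p01, p10, p11) lives in the 3-dimensional probability simplex; the independence model is the surface p00·p11 = p01·p10, which is parameterized by just two numbers (the two marginals), so it is 2-dimensional. The dependence model is the simplex minus that surface, and stays 3-dimensional:

```python
import random

# Two binary variables X, Y: joint (p00, p01, p10, p11) in the 3-dim simplex.
# The independence model is the surface p00*p11 == p01*p10, parameterized by
# just P(X=1) and P(Y=1): a 2-dimensional manifold in a 3-dimensional space.
def joint_from_margins(px1, py1):
    return [(1 - px1) * (1 - py1), (1 - px1) * py1,
            px1 * (1 - py1), px1 * py1]

p = joint_from_margins(0.3, 0.7)
assert abs(p[0] * p[3] - p[1] * p[2]) < 1e-12  # independence constraint holds

# The "dependence model" is the simplex minus that surface: a uniform draw
# from the simplex violates the constraint with probability 1, so defining a
# model by a dependence removes only a measure-zero slice and keeps dimension 3.
random.seed(0)
w = sorted(random.random() for _ in range(3))
q = [w[0], w[1] - w[0], w[2] - w[1], 1 - w[2]]  # uniform draw from the simplex
assert abs(q[0] * q[3] - q[1] * q[2]) > 1e-9    # almost surely dependent
```

So “independence model” means fewer effective parameters, while “dependence model” means the saturated parameter count minus nothing you can exploit statistically.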
(c) While conceivably we might be interested in the presence of a causal effect more than its absence, you are absolutely right that, generally, the assumptions that allow us to equate a causal effect with some functional of observed data take the form of equality constraints (e.g. “independences in something”). So it is much more useful to represent those, even if what we care about at the end of the day is the presence of an effect: we can just see how far from null the final effect number is, and we don’t need a graphical representation for that. However, a graphical representation of the assumptions we exploit to get the effect as a functional of observed data is very handy; this is what eventually led Jin Tian to his awesome identification algorithm on graphs.
(d) There is an interesting logical structure to conditional independence, e.g. Phil Dawid’s graphoid axioms. There is something like that for dependences (Armstrong’s axioms for functional dependence in db theory?) but the structure isn’t as rich.
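For reference, the graphoid axioms being alluded to are usually stated as follows, with X ⊥ Y | Z denoting conditional independence (the last axiom, intersection, only holds under positivity assumptions):

```latex
\begin{align*}
&\text{Symmetry:} && X \perp Y \mid Z \;\Rightarrow\; Y \perp X \mid Z \\
&\text{Decomposition:} && X \perp (Y, W) \mid Z \;\Rightarrow\; X \perp Y \mid Z \\
&\text{Weak union:} && X \perp (Y, W) \mid Z \;\Rightarrow\; X \perp Y \mid (Z, W) \\
&\text{Contraction:} && X \perp Y \mid Z \,\wedge\, X \perp W \mid (Z, Y) \;\Rightarrow\; X \perp (Y, W) \mid Z \\
&\text{Intersection:} && X \perp Y \mid (Z, W) \,\wedge\, X \perp W \mid (Z, Y) \;\Rightarrow\; X \perp (Y, W) \mid Z
\end{align*}
```

The first four are the semi-graphoid axioms; adding intersection gives a graphoid.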
edit: there are actually only two semi-graphoid axioms: one for symmetry and one for the chain rule.
edit^2: graphoids are not complete (because conditional independence is actually kind of a nasty relation). But at least it’s a ternary relation. There are far worse dragons in the cave of “equality constraints.”