IlyaShpitser comments on Taking “correlation does not imply causation” back from the internet

IlyaShpitser 3 Oct 2012 16:20 UTC
12 points
If you are familiar with d-separation (http://en.wikipedia.org/wiki/D-separation#d-separation), we have:

if A is dependent on B, and there’s some unobserved C involved, then:

(1) A ← C → B, or

(2) A → C → B, or

(3) A ← C ← B

(this is Reichenbach’s common cause principle: http://plato.stanford.edu/entries/physics-Rpcc/)

or

(4) A → C ← B

if C or its effect attains a particular (not necessarily recorded) value. Statisticians know this as Berkson’s bias, which is a form of selection bias. In AI, this is known as “explaining away.” Manfred’s excellent example falls into category (4), with C observed to equal “hired as actor.”
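
Case (4) is easy to check numerically. The sketch below is only an illustration under assumptions that are not from the comment: A and B are independent standard normal traits, and the collider C is "selected" whenever A + B exceeds a made-up threshold of 1.0. The function names `pearson` and `simulate_collider` are hypothetical helpers.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_collider(n=20000, threshold=1.0, seed=0):
    """Return (correlation in everyone, correlation among the selected).

    Assumed model: A and B are independent; C = 1 iff A + B > threshold,
    so the graph is A -> C <- B.
    """
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    # Keep only the units whose collider C took the value "selected".
    selected = [(x, y) for x, y in zip(a, b) if x + y > threshold]
    r_all = pearson(a, b)
    r_selected = pearson([x for x, _ in selected], [y for _, y in selected])
    return r_all, r_selected


if __name__ == "__main__":
    r_all, r_selected = simulate_collider()
    print("correlation in the full population:", round(r_all, 3))
    print("correlation among the selected:   ", round(r_selected, 3))
```

In the full population the correlation should be near zero, while among the selected units it should be clearly negative: conditioning on the common effect induces a dependence that no arrow between A and B accounts for.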

Beware: d-separation applies both to causal graphical models and to Bayesian networks (which are statistical models, not causal ones). The meaning of the arrows differs between these two kinds of models, and the difference is fairly subtle.
- shokwave 3 Oct 2012 19:58 UTC
  0 points
  Odd: I always felt like d-separation was the same thing on causal diagrams and on Bayes networks. Then again, I also understood a Bayes network to be a model of the causal directions in a situation, so perhaps that’s why.
  
  Manfred’s excellent example needs equally excellent counterparts for other possibilities.
  - IlyaShpitser 3 Oct 2012 20:22 UTC
    3 points
    Sorry for not being clear. The d-separation criterion is the same in both Bayesian networks and causal diagrams, but its meaning is not the same. This is because an arrow A → B in a causal diagram means (loosely) that A is a direct cause of B at the level of granularity of the model, while an arrow A → B in a Bayesian network has a meaning that is harder to explain, having to do with the Markov factorization and conditional independence. D-separation talks about arrows in both cases, but asserts different things because the arrows mean different things.
    
    A Bayesian network model is just a statistical model (a set of joint distributions) associated with a directed acyclic graph. Specifically, it is the set of all distributions p(x_1, …, x_k) that factorize as a product of terms of the form p(x_i | parents(x_i)), one term per variable. Nothing more, nothing less. Nothing about causality in that definition.
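
    To make the factorization concrete, here is how it reads for the four three-variable graphs listed at the top of the thread. This is a worked illustration rather than part of the definition; lowercase a, b, c denote values of A, B, C.

    ```latex
    \begin{align*}
    \text{(1) } A \leftarrow C \rightarrow B &: \quad p(a, b, c) = p(c)\, p(a \mid c)\, p(b \mid c) \\
    \text{(2) } A \rightarrow C \rightarrow B &: \quad p(a, b, c) = p(a)\, p(c \mid a)\, p(b \mid c) \\
    \text{(3) } A \leftarrow C \leftarrow B &: \quad p(a, b, c) = p(b)\, p(c \mid b)\, p(a \mid c) \\
    \text{(4) } A \rightarrow C \leftarrow B &: \quad p(a, b, c) = p(a)\, p(b)\, p(c \mid a, b)
    \end{align*}
    ```

    By Bayes' rule the first three products can be rewritten into one another, so as Bayesian networks (1), (2), and (3) describe exactly the same set of distributions (those where A is independent of B given C), even though as causal diagrams they make different claims. Only (4) is a different statistical model: A and B are marginally independent and can become dependent given C.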
    
    I think examples for (1),(2),(3) are simpler than Manfred’s Berkson’s bias example.
    
    (1) A ← C → B
    
    Most clearly non-causal associations go here: “shoe size correlates with IQ” and its kin.
    
    (2) A → C → B, and (3) A ← C ← B
    
    Classic scientific triumphs go here: “smoking causes cancer.” Note that if we can find an observable, unconfounded C that intercepts all or most of the causal pathway from A to B, this is extremely valuable for estimating effects. If you can design an experiment with such a C, you don’t even have to randomize A.
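
The closing point about an unconfounded intermediate C can be checked on simulated data. The estimator below is what the causal inference literature calls front-door adjustment; the comment does not name it, so treat that label, and everything else here, as an assumption made for illustration. The data-generating model is invented: a hidden binary U drives both A and B, A affects B only through C, and all probabilities are made up. The function names are hypothetical.

```python
import random
from collections import Counter


def simulate(n=200000, seed=1):
    """Draw (a, c, b) rows from an assumed model; U is never recorded."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.random() < 0.5                       # hidden confounder
        a = rng.random() < (0.8 if u else 0.2)       # U -> A
        c = rng.random() < (0.9 if a else 0.1)       # A -> C, no U involved
        b = rng.random() < 0.1 + 0.5 * c + 0.3 * u   # C -> B and U -> B
        rows.append((int(a), int(c), int(b)))
    return rows


def naive_effect(rows):
    """P(B=1 | A=1) - P(B=1 | A=0), which is confounded by U."""
    n_a = Counter(a for a, _, _ in rows)
    n_ab1 = Counter(a for a, _, b in rows if b == 1)
    return n_ab1[1] / n_a[1] - n_ab1[0] / n_a[0]


def front_door_effect(rows):
    """Estimate P(B=1 | do(A=1)) - P(B=1 | do(A=0)) from observed A, C, B.

    Uses sum_c P(c | a) * sum_a2 P(B=1 | c, a2) * P(a2).
    """
    n = len(rows)
    n_a = Counter(a for a, _, _ in rows)
    n_ac = Counter((a, c) for a, c, _ in rows)
    n_acb1 = Counter((a, c) for a, c, b in rows if b == 1)

    def p_b_do(a):
        total = 0.0
        for c in (0, 1):
            p_c_given_a = n_ac[(a, c)] / n_a[a]
            inner = sum(
                (n_acb1[(a2, c)] / n_ac[(a2, c)]) * (n_a[a2] / n)
                for a2 in (0, 1)
            )
            total += p_c_given_a * inner
        return total

    return p_b_do(1) - p_b_do(0)


if __name__ == "__main__":
    rows = simulate()
    # True effect by construction: 0.5 * (0.9 - 0.1) = 0.4
    print("naive difference:     ", round(naive_effect(rows), 3))
    print("front-door estimate:  ", round(front_door_effect(rows), 3))
```

In this assumed model the naive difference overstates the effect because U pushes A and B in the same direction, while the front-door estimate should land close to the true value of 0.4 without A ever being randomized.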