If variables A and B are correlated, then we can be pretty damn sure that either: a) A causes B, b) B causes A, or c) there's a third variable affecting both A and B.
There is in fact a d): A and not-B can both cause some condition C that defines our sample.
Example: Sexy people are more likely to be hired as actors. Good actors are also more likely to be hired as actors. So if we look at “people who are actors,” then we’ll get people who are sexy but can’t really act, people who are sexy and can act, and people who can act and aren’t really sexy. If sexiness and acting ability are independent, these three groups will be about equally full.
Thus if we look at actors in general in our simple model, 2⁄3 of them will be sexy and 2⁄3 of them will be good actors. But of the ones who are sexy, only 1⁄2 will be good actors. So being sexy is correlated with being a bad actor! Not because sexiness rots your brain (a), or because acting well makes you ugly (b), and not because acting classes cause both good acting and ugliness, or diet pills cause both beauty and bad acting (c). Instead, it's just because of how we picked actors: it made sexiness and acting ability “compete for the same niche.” (The quick simulation below checks these numbers.)
Similar examples would be sports and academics in college, different sorts of skills in people promoted in the workplace, UI design versus functionality in popular programs, and so on and so on.
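For the skeptical, here's a quick sanity check of those numbers (a minimal sketch, assuming the toy model above: sexiness and acting ability are independent fair coin flips, and anyone with at least one of the two traits gets hired):

```python
import random

random.seed(0)

# Toy model (an assumption, not real casting data): sexiness and acting
# ability are independent fair coins, and anyone with at least one of the
# two traits gets hired as an actor.
population = [(random.random() < 0.5, random.random() < 0.5)
              for _ in range(100_000)]
actors = [(sexy, good) for sexy, good in population if sexy or good]

p_good = sum(good for _, good in actors) / len(actors)
sexy_actors = [good for sexy, good in actors if sexy]
p_good_given_sexy = sum(sexy_actors) / len(sexy_actors)

print(f"P(good | actor)       ~ {p_good:.2f}")             # ~ 2/3
print(f"P(good | actor, sexy) ~ {p_good_given_sexy:.2f}")  # ~ 1/2
```

Among people who are actors, being sexy lowers the chance of being good from 2⁄3 to 1⁄2, even though the two traits are independent in the full population.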
I feel like this example should go on the doesnotimply website.
If you are familiar with d-separation (http://en.wikipedia.org/wiki/D-separation#d-separation), we have:
if A is dependent on B, and there’s some unobserved C involved, then:
(1) A ← C → B, or
(2) A → C → B, or
(3) A ← C ← B
(this is Reichenbach’s common cause principle: http://plato.stanford.edu/entries/physics-Rpcc/)
or
(4) A → C ← B
if C or its effect attains a particular (not necessarily recorded) value. Statisticians know this as Berkson’s bias, which is a form of selection bias. In AI, this is known as “explaining away.” Manfred’s excellent example falls into category (4), with C observed to equal “hired as actor.”
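Here is a minimal numeric sketch of “explaining away” in case (4), with made-up numbers (A and B are independent fair coins and C = A or B), computed by exact enumeration:

```python
from itertools import product

# Collider A -> C <- B: A and B are independent fair coins, C = A or B.
joint = {(a, b, int(a or b)): 0.25 for a, b in product((0, 1), repeat=2)}

def p(event, given=lambda a, b, c: True):
    num = sum(pr for (a, b, c), pr in joint.items()
              if event(a, b, c) and given(a, b, c))
    den = sum(pr for (a, b, c), pr in joint.items() if given(a, b, c))
    return num / den

print(p(lambda a, b, c: a == 1))                            # 0.5: the prior
print(p(lambda a, b, c: a == 1,
        lambda a, b, c: c == 1))                            # 2/3: C=1 raises belief in A
print(p(lambda a, b, c: a == 1,
        lambda a, b, c: c == 1 and b == 1))                 # 0.5: B=1 "explains away" C
```

Observing C = 1 raises the probability of A, but additionally observing B = 1 explains the evidence away and drops A back to its prior: conditioning on the collider made A and B dependent.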
Beware: d-separation applies both to causal graphical models and to Bayesian networks (which are statistical, not causal, models). The meaning of arrows is different in these two kinds of models. This is actually a fairly subtle issue.
Odd: I always felt like d-separation was the same thing on causal diagrams and on Bayes networks. Although I also understood a Bayes network as being a model of the causal directions in a situation, so perhaps that's why.
Manfred’s excellent example needs equally excellent counterparts for other possibilities.
Sorry for not being clear. The d-separation criterion is the same in both Bayesian networks and causal diagrams, but its meaning is not the same. This is because an arrow A → B in a causal diagram means (loosely) that A is a direct cause of B at the level of granularity of the model, while an arrow A → B in a Bayesian network has a meaning that is more complicated to explain, having to do with the Markov factorization and conditional independence. D-separation talks about arrows in both cases, but asserts different things due to a difference in the meaning of those arrows.
A Bayesian network model is just a statistical model (a set of joint distributions) associated with a directed acyclic graph. Specifically it’s all distributions p(x1, …, xk) that factorize as a product of terms of the form p(x_i | parents(x_i)). Nothing more, nothing less. Nothing about causality in that definition.
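For concreteness, here's that definition spelled out for a three-node chain (a minimal sketch with made-up numbers; note that nothing in it says the arrows are causal):

```python
# Bayesian network over the DAG X1 -> X2 -> X3: the set of all joints of the
# form p(x1, x2, x3) = p(x1) * p(x2 | x1) * p(x3 | x2). Numbers are made up.
p_x1 = {0: 0.6, 1: 0.4}
p_x2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}  # p_x2[x1][x2]
p_x3 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}}  # p_x3[x2][x3]

def joint(x1, x2, x3):
    return p_x1[x1] * p_x2[x1][x2] * p_x3[x2][x3]

# The factors multiply out to a valid joint distribution (sums to 1):
print(sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)))
```

Whether X1 actually causes X2 is extra interpretation layered on top; the factorization itself is purely statistical.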
I think examples for (1), (2), and (3) are simpler than Manfred's Berkson's bias example.
(1) A ← C → B
Most clearly non-causal associations go here: “shoe size correlates with IQ” and its kin.
(2) A → C → B, and (3) A ← C ← B
Classic scientific triumphs go here: “smoking causes cancer.” Of note here is that if we can find an observable unconfounded C that intercepts all/most of the causal pathway, this is extremely valuable for estimating effects. If you can design an experiment with such a C, you don’t even have to randomize A.
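A minimal simulation of case (2) with made-up numbers, in the spirit of smoking → tar → cancer, shows why such a C is valuable: once you condition on the mediator, A carries no further information about B.

```python
import random

random.seed(0)

# Chain A -> C -> B with made-up probabilities; B depends only on C.
def sample():
    a = random.random() < 0.5                  # A: e.g. smoking
    c = random.random() < (0.8 if a else 0.1)  # C: e.g. tar deposits
    b = random.random() < (0.7 if c else 0.2)  # B: e.g. cancer
    return a, c, b

draws = [sample() for _ in range(200_000)]

def p_b(a_val, c_val):
    hits = [b for a, c, b in draws if a == a_val and c == c_val]
    return sum(hits) / len(hits)

# Conditioned on C, the dependence of B on A vanishes (d-separation):
print(p_b(True, True), p_b(False, True))    # both ~ 0.7
print(p_b(True, False), p_b(False, False))  # both ~ 0.2
```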
That’s known as Berkson’s paradox.
I first heard of this idea a few months ago in a blog post at The Atlantic.
Aha, yes, and I think I was in turn linked to it by Ben Goldacre. But the reason I was quickly able to enumerate this as a separate kind of correlation is that the causal graph is different, which would be Judea Pearl.
Yup. I’m reading the link from this post and just got to the discussion of Berkson’s paradox, which seems to be the same effect.
What do you mean by “equally full”?
I mean “I’m about to pretend that ‘sexy’ and ‘good actor’ are binary variables centered to make the math super easy.” If you would like less pretending, read the Atlantic article linked by a thoughtful replier, since the author draws the nice graph to prove the general case.
I wouldn’t like less pretending, and ‘sexy’/‘good actor’ being binary variables is fine with me (and I understand your comment overall), but I still don’t know what it means that the groups are equally full. (Equal size? That doesn’t follow from independence.)
Right, so I make the math-light but false assumption that casting directors will take above-average applicants, and also that you aren’t more likely to eventually become an actor if you’re sexy and can act well.
If you mean “above median”, I see.