Taking “correlation does not imply causation” back from the internet
(An idea I had while responding to this quotes thread)
“Correlation does not imply causation” is bandied around inexpertly and inappropriately all over the internet. Lots of us hate this.
But get this: the phrase, and the most obvious follow-up phrases like “what does imply causation?” are not high-competition search terms. Up until about an hour ago, the domain name correlationdoesnotimplycausation.com was not taken. I have just bought it.
There is a correlation-does-not-imply-causation shaped space on the internet, and it’s ours for the taking. I would like to fill this space with a small collection of relevant educational resources explaining what is meant by the term, why it’s important, why it’s often used inappropriately, and the circumstances under which one may legitimately infer causation.
At the moment the Wikipedia page is trying to do this, but it’s not really optimised for the task. It also doesn’t carry the undercurrent of “no, seriously, lots of smart people get this wrong; let’s make sure you’re not one of them”, and I think it should.
The purpose of this post is two-fold:
Firstly, it lets me say “hey dudes, I’ve just had this idea. Does anyone have any suggestions (pragmatic/technical, content-related, pointing out why it’s a terrible idea, etc.), or alternatively, would anyone like to help?”
Secondly, it raises the question of what other corners of the internet are ripe for the planting of sanity waterline-raising resources. Are there any other similar concepts that people commonly get wrong, but don’t have much of a guiding explanatory web presence to them? Could we put together a simple web platform for carrying out this task in lots of different places? The LW readership seems ideally placed to collectively do this sort of work.
A discussion I had in the reddit comments on that Slate post made me invent this fake argument:
A: People who drink water inevitably end up dead. Therefore drinking water causes death.
B: No, that is correlation, not causation.
C: No, it is not correlation. To calculate correlation you divide the covariance of the two variables by the variance of each of the variables. In this case there is no variance in either variable, so you’re dividing by zero, so correlation is not even defined.
I think it’s an improvement to go from saying “there is obviously something wrong with A’s argument” to actually being able to point out the divide-by-zero in the equation.
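To make the divide-by-zero concrete, here is a minimal Python sketch (the sample data are hypothetical, chosen to match A's argument) that computes Pearson's r by hand and reports it as undefined when either variable has zero variance:

```python
from statistics import pstdev

# Hypothetical sample: everyone drinks water (1) and everyone dies (1).
drinks_water = [1, 1, 1, 1]
dies = [1, 1, 1, 1]

def pearson_r(xs, ys):
    """Pearson correlation: cov(x, y) / (sd(x) * sd(y)).
    Returns None when either standard deviation is zero,
    i.e. when the formula would divide by zero."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return None  # correlation undefined: no variance to divide by
    return cov / (sx * sy)

print(pearson_r(drinks_water, dies))  # None: the correlation is undefined
```

With no one who avoids water and no one who avoids death, both standard deviations are zero and the formula has nothing to divide by, which is exactly C's point.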
If you don’t drink water, you still die—that sounds pretty uncorrelated to me.
Only to score points at the expense of the audience’s vocabulary would one say “there is no variance in either variable” as opposed to saying “there are no people who avoid drinking water, nor people who don’t end up dead, to compare to”.
Let’s not encourage this.
This is perfectly good common-sense reasoning but doesn’t make the point that the correlation is undefined. I would’ve thought the audience for this, i.e. people involved in a correlation-versus-causation debate, would benefit from seeing that explicitly, if they don’t already. Maybe we are judging the audience differently. If we assume that everyone knows dividing by zero is bad, but don’t use any other technical term (including variance), maybe we can get the point across.
This is perfectly good common-sense reasoning that explains why the correlation is undefined. If your audience has any notion whatsoever of what correlation means, they will understand this. If not, trying to phrase the same argument in terms of math will not help; it will just make it impossible for your audience to engage with your argument.
If the audience is mathematically sophisticated, then writing out the formula for Pearson’s correlation coefficient is just going to distract them from the real issue, which is that the saying should refer to statistical dependence, rather than correlation. In other words, C’s argument only addresses the literal meaning of B’s words, not the substance behind them.
I acknowledge that using the wrong terminology to the wrong audience will make their eyes glaze over and be counter-productive.
I disagree about that. Until I actually took a course in statistics, I wouldn’t have been sure whether the correlation was undefined or just misleading in that case. Again, I agree that not everyone needs this level of precision.
An important issue, but a completely different one. If B said “that is statistical dependence, not causation”, wouldn’t they be equally wrong in exactly the same way?
B would be wrong in the exact same way. So the true reason that B is wrong needs to apply in both cases. On the other hand, appealing to the correlation formula only defeats the correlation version of the argument.
Ah, I see what you mean. You’re right.
Disagree. Our target audience—humans—rarely if ever thinks of ‘correlation’ in terms of its mathematical definition and I suspect would be put off by an attempt to do so.
This is entirely true—as a mere human, my interest plummeted at “covariance”, and I’d still like to think I’m SOMEWHAT equipped to handle correlation/causation. Just not numerically. So, as a roughly average human, I say your suspicions are correct.
The point still applies. What do you mean by “correlation”—formally or informally—when one (or both) of the variables is constant across the population?
The specific fake argument used is flawed because of that. When people make the correlation-causation error, how often are they basing it on a variable that’s constant across the population? Do people ever really develop ‘drinking water causes X’ beliefs?
It’s a valid point and very true, but I suspect that it isn’t applicable to the issue at hand.
Correlate water consumption rate with lifespan, to get a correlation. My guess is it will be negative.
Why? (EDIT: I guess people in warmer countries tend to drink more water but to have worse health; is that what you’re thinking about?)
Because babies drink less than adults. The lifetime average water consumption of people who die as infants is tiny compared to the lifetime average of adults.
In other words, death prevents drinking water.
Oops—sign mix-up in my mind when I wrote this. I meant the opposite—that I guessed that water consumption rate is negatively correlated with mortality rate.
Pet peeve:
The saying should be: “statistical dependence does not imply causality.” Correlation is a particular measure of a linear relationship. A lack of correlation can happily coexist with statistical dependence if the variables are related in a complicated non-linear way. This “correlation” business needlessly emphasizes linear models (prevalent in statistics in the era of Pearson et al.). See also: http://en.wikipedia.org/wiki/Correlation_and_dependence
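A quick simulation (hypothetical data, Python) of exactly this gap between dependence and correlation: Y is completely determined by X, yet their Pearson correlation comes out essentially zero because the relationship is non-linear:

```python
import random

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(100_000)]
ys = [x * x for x in xs]  # Y is a deterministic function of X: total dependence

# Compute Pearson's r by hand.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
r = cov / (sx * sy)

# Linear correlation is near zero even though knowing X tells you Y exactly.
print(round(r, 3))
```

Because X is symmetric about zero and Y = X² is an even function, the linear component of the relationship cancels out, so r ≈ 0 despite perfect dependence.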
Also, this is true: “lack of statistical dependence does not imply lack of causality” (due to effect cancellation).
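A toy Python simulation (a hypothetical linear model, not anyone's real example) of effect cancellation: A causes B along two paths whose effects exactly cancel, so A and B come out statistically independent despite a genuine causal link:

```python
import random

random.seed(1)
n = 100_000
a = [random.gauss(0, 1) for _ in range(n)]
m = [-x for x in a]  # mediator M: A -> M with coefficient -1
# B = A + M + noise: the direct path (+1) and the path through M (-1) cancel.
b = [x + y + random.gauss(0, 1) for x, y in zip(a, m)]

# Compute Pearson's r between A and B by hand.
ma, mb = sum(a) / n, sum(b) / n
cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
sa = (sum((x - ma) ** 2 for x in a) / n) ** 0.5
sb = (sum((y - mb) ** 2 for y in b) / n) ** 0.5
r = cov / (sa * sb)
print(round(r, 3))  # ~0: A and B look independent, yet A causes B via two paths
```

This is the same structure as a thermostat: the regulator's intervention exactly offsets the direct effect, leaving no statistical trace.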
I had intended on tackling this, but the original still works as a Steel Man of the general case.
Agreed. It’s just a pet peeve because the concept of “correlation” does not cut at the seams here. I was certainly not faulting you (as your post was likely a response to the slate article, which used the term “correlation” as well).
I was going to mention that “correlation does not imply causation” sounds snappier, but the more I play around with it, “association does not imply causation” seems somewhat more aesthetically appealing.
doesnotimply.com is also free, shorter, less loaded (in the way you describe above), and has an obvious logo to go with it (=/=>). If I go ahead with it, I might use that instead.
My vote goes to doesnotimp.ly
Somehow, the “correlation does not imply causation (but it furtively suggests it, etc)” idea is linked in my brain with the “absence of evidence is not evidence of absence (but it is if you’re a Bayesian)” idea.
At the risk of diluting the original good idea, maybe doesnotimply.com could incorporate the latter also.
They can also be completely independent and still have causation, but that’s not something that would happen by chance. The only case I know of where something like that happens is when the cause is designed to regulate whatever it’s independent of. For example, room temperature doesn’t correlate with the power going through the heater or air conditioner, because the temperature is always constant, and it’s constant precisely because the heater and air conditioner keep it that way.
I agree, but this regulation business seems important and occurs a lot in nature.
I don’t have much time to think on this right now, but perhaps an Anti-Godwin’s law could be useful? Something along the lines of “just because your opponent made a simplistic analogy to Nazism, it does not follow that their overall argument is wrong”.
That sounds like a bite-sized refutation of the Worst Argument in the World.
More compactly this is called “Hitler Ate Sugar”.
Nice, I like that.
Warning
Obligatory link
Thanks for the link!
http://www.michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/
Content-related suggestion. Comics are great tools for people too lazy/busy to read long articles, so here’s XKCD’s take
Aren’t comics like that the source of the cached thought we’re trying to improve on here?
The mouseover text is important:
At the risk of being the village idiot today: how do people get this wrong? Point to an example or three?
I go to some lengths to avoid innumerate discussion online, but it still happens in real life with reasonable frequency. The flavours I seem to encounter most:
1) an all-purpose attempt to refute any statistical finding, even if said finding is not showing correlation, or proposing causation
2) dogged adherence to the belief that the direction of causal relationships is completely impossible to establish
3) the most perverse, that establishing an association between two variables is evidence against a causal relationship
Great answer, thanks. At the same time, I am reminded of gwern’s recent admonitions against “letting the better be the enemy of the good”:
I’m all in favor of giving the “correlation does not imply causation” meme better and more effective Internet visibility, perhaps with a friendly illustrated guide: “A simple explanation of association and causation”.
I’d be dead set against snarky content that runs even the slightest risk of making people feel dumb for knowing something that useful.
Your post is valuable enough that it’s worth it to edit it to avoid misinterpretations of your point (like IlyaShpitser’s in reply to Morendil above).
Look at almost any reporting of a scientific finding (even outlets like CNN usually get this wrong). Chances are, the reporter will claim causality, while the original paper will claim association. This sort of thing is extremely common. Our good friend from Slate was not immune to this, sadly.
I don’t get it. How does “journalist reports causation where there is only association” constitute an example of the journalist misapplying “correlation does not imply causation”, as opposed to failing to apply it in the first place?
(In the latter case, it’s good that the phrase “correlation does not imply causation” is floating around them Interwebs. The OP suggests that it’s not an unalloyed good because people “get it wrong”.)
Sorry, I didn’t catch the distinction between misapplying and failing to apply from your original phrasing (“people get this wrong.”)
To add to sixesandsevens’ list, I have one below, where people think lack of association implies lack of causation, or weak association implies weak causality (it does not in either case, due to effect cancellation, which happens reasonably frequently).
Isn’t the Faithfulness assumption the assumption that effect cancellation is rare enough to be ignored? If it happens frequently, that looks like a rather large problem for Pearl’s methods.
I currently have a paper in the submission process about systems which actively perform effect cancellation, and the problems that causes, but I assume that isn’t what you have in mind.
If you pick parameters of your causal model randomly, then almost surely the model will be faithful (formally, in Robins’ phrasing: “in finite dimensional parametric families, the subset of unfaithful distributions typically has Lebesgue measure zero on the parameter space”). People interpret this to mean that faithfulness violations are rare enough to be ignored. It is not so, sadly.
First, Nature doesn’t pick causal models randomly. In fact, cancellations are quite useful (homeostasis, and gene regulation are often “implemented” by faithfulness violations).
Second, we may have a model that is weakly faithful (that is hard to tell from unfaithful with few samples). What is worse is it is difficult to say in advance how many samples one would need to tell apart a faithful vs an unfaithful model. In statistical terms this is sometimes phrased as the existence of “pointwise consistent” but the non-existence of “uniformly consistent” tests.
I suggest the following paper for more on this:
http://www.hss.cmu.edu/philosophy/scheines/uniform-consistency.pdf
See also this (the distinction comes from analysis): http://en.wikipedia.org/wiki/Uniform_convergence
Kevin Kelly at CMU thinks a lot about “having to change your mind” due to lack of uniform consistency.
Much of what Pearl et al do (identification of causal effects, counterfactual reasoning, actual cause, etc.) does not rely on faithfulness. Faithfulness typically comes up when one wishes to learn causal structure from data. Even in this setting there exist methods which do not require faithfulness (I think the LiNGAM algorithm does not).
The very subject of my paper. I don’t think the magnitude of the obstacle has yet been fully appreciated by people who are trying to extend methods of causal discovery in that direction. And in the folklore, there are frequent statements like this one, which is simply false:
(Edward Tufte, quoted here.)
I think the way causal discovery is sold sometimes is not as a way of establishing causal structure from data, but as a way of narrowing down the set of experiments one would have to run to establish causal structure definitively, in domains which are poorly understood but in which we can experiment (comp. bio., etc).
If phrased in this way, assuming faithfulness is not “so bad.” It is true that many folks in causal inference and related areas are quite skeptical of faithfulness type assumptions and rightly so. To me, it’s the lack of uniform consistency that’s the real killer.
Part II of this talk I gave (http://videolectures.net/uai2011_shpitser_causal/) has an example of how you can do completely ridiculous, magical things if you assume a type of faithfulness. See 31:07.
Key to Memetic Value:
Make sure the landing page is simple and to the point: the entire message should come across in a few moments, without scrolling and without clutter. Perhaps include a simple, clear diagram, but that’s not necessary as long as a brief textual explanation dominates the page. Include a small number of obvious links to other pages on your site for additional detail. If you want to include links to off-site resources, collect them on a single page other than the main page, unless you never intend to have more than one such link. Make sure the page is still clear and quickly absorbed by visitors even with JavaScript and CSS turned off in the browser. Whatever you do, don’t use Flash, Java, or any kind of animation or video on the main page. None.
Awesome idea.
As far as I understand it, if variables A and B are correlated, then we can be pretty damn sure that either:
A causes B
B causes A
there’s a third variable affecting both A and B.
(Am I right about this or is this an oversimplification?)
A good way to grab attention might be to deny a commonly believed fact in a way that promises intelligent elaboration. So the website could start with a huge ‘Correlation does not imply causation’ banner and then go like ‘well, actually, it kind of does’. And then explain how going from not knowing anything at all to knowing that one of three causal hypotheses is correct is pretty damn informative even if we don’t immediately know which of the hypotheses is correct.
Then it would probably be useful to go all Bayesian and talk about priors, Ockham’s razor and how it’s a rare situation where we cannot distinguish between hypotheses at all. A good example might be to tell the story of how R. A. Fisher used the ‘correlation does not imply causation’ platitude to shoot down research connecting smoking to lung cancer and explain that it should have been clear that the hypothesis ‘smoking causes cancer’ was much more reasonable at that time than the hypothesis ‘there’s a common factor causing both smoking and cancer’. (On the other hand, this could turn political. I don’t know whether the smoking and lung cancer issue is still contested.)
There is in fact a d): A and B can both cause some condition C that defines our sample.
Example: Sexy people are more likely to be hired as actors. Good actors are also more likely to be hired as actors. So if we look at “people who are actors,” then we’ll get people who are sexy but can’t really act, people who are sexy and can act, and people who can act and aren’t really sexy. If sexiness and acting ability are independent, these three groups will be about equally full.
Thus if we look at actors in general in our simple model, 2⁄3 of them will be sexy and 2⁄3 of them will be good actors. But of the ones who are sexy, only 1⁄2 will be good actors. So being sexy is correlated with being a bad actor! Not because sexiness rots your brain (a), or because acting well makes you ugly (b), and not because acting classes cause both good acting and ugliness, or diet pills cause both beauty and bad acting (c). Instead, it’s just because how we picked actors made sexiness and acting ability “compete for the same niche.”
Similar examples would be sports and academics in college, different sorts of skills in people promoted in the workplace, UI design versus functionality in popular programs, and so on and so on.
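Manfred’s setup is easy to check by simulation. Here is a short Python sketch (all numbers hypothetical) with independent binary “sexy” and “good actor” traits, where anyone with at least one of the two gets hired:

```python
import random

random.seed(2)
# (sexy, good_actor) as independent fair coin flips across the population.
population = [(random.random() < 0.5, random.random() < 0.5)
              for _ in range(100_000)]

# Selection: you become an actor if you're sexy OR you can act.
actors = [(s, g) for s, g in population if s or g]

p_good = sum(g for _, g in actors) / len(actors)
sexy = [g for s, g in actors if s]
p_good_given_sexy = sum(sexy) / len(sexy)

print(round(p_good, 2), round(p_good_given_sexy, 2))  # ~0.67 vs ~0.50
```

Among actors, knowing someone is sexy lowers the probability that they act well (from about 2⁄3 to about 1⁄2), even though the two traits are independent in the population: the selection step alone induces the negative association.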
I feel like this example should go on the doesnotimply website.
If you are familiar with d-separation (http://en.wikipedia.org/wiki/D-separation#d-separation), we have:
if A is dependent on B, and there’s some unobserved C involved, then:
(1) A ← C → B, or
(2) A → C → B, or
(3) A ← C ← B
(this is Reichenbach’s common cause principle: http://plato.stanford.edu/entries/physics-Rpcc/)
or
(4) A → C ← B
if C or its effect attains a particular (not necessarily recorded) value. Statisticians know this as Berkson’s bias, which is a form of selection bias. In AI, this is known as “explaining away.” Manfred’s excellent example falls into category (4), with C observed to equal “hired as actor.”
Beware: d-separation applies to causal graphical models, and Bayesian networks (which are statistical and not causal models). The meaning of arrows is different in these two kinds of models. This is actually a fairly subtle issue.
Odd—I always felt like d-separation was the same thing on causal diagrams and on Bayes networks. Although, I also understood Bayes network as being a model of the causal directions in a situation, so perhaps that’s why.
Manfred’s excellent example needs equally excellent counterparts for other possibilities.
Sorry for not being clear. The d-separation criterion is the same in both Bayesian networks and causal diagrams, but its meaning is not the same. This is because an arrow A → B in a causal diagram means (loosely) that A is a direct cause of B at the level of granularity of the model, while an arrow A → B in a Bayesian network has a more complicated to explain meaning having to do with the Markov factorization and conditional independence. D-separation talks about arrows in both cases, but asserts different things due to a difference in the meaning of those arrows.
A Bayesian network model is just a statistical model (a set of joint distributions) associated with a directed acyclic graph. Specifically it’s all distributions p(x1, …, xk) that factorize as a product of terms of the form p(x_i | parents(x_i)). Nothing more, nothing less. Nothing about causality in that definition.
I think examples for (1),(2),(3) are simpler than Manfred’s Berkson’s bias example.
(1) A ← C → B
Most clearly non-causal associations go here: “shoe size correlates with IQ” and its kin.
(2) A → C → B, and (3) A ← C ← B
Classic scientific triumphs go here: “smoking causes cancer.” Of note here is that if we can find an observable unconfounded C that intercepts all/most of the causal pathway, this is extremely valuable for estimating effects. If you can design an experiment with such a C, you don’t even have to randomize A.
That’s known as Berkson’s paradox.
I first heard of this idea a few months ago in a blog post at The Atlantic.
Aha, yes—which I think I in turn was linked to by Ben Goldacre. But the reason I was quickly able to enumerate this as a separate kind of correlation is because the causal graph is different, which would be Judea Pearl.
Yup. I’m reading the link from this post and just got to the discussion of Berkson’s paradox, which seems to be the same effect.
What do you mean by “equally full”?
I mean “I’m about to pretend that ‘sexy’ and ‘good actor’ are binary variables centered to make the math super easy.” If you would like less pretending, read the Atlantic article linked by a thoughtful replier, since the author draws the nice graph to prove the general case.
I wouldn’t like less pretending and ‘sexy’/‘good actor’ being binary variables is fine with me (and I understand your comment overall), but I still don’t know what it means that the groups are equally full. (Equal size? That doesn’t follow from independence.)
Right, so I make the math-light but false assumption that casting directors will take above-average applicants, and also that you aren’t more likely to eventually become an actor if you’re sexy and can act well.
If you mean “above median”, I see.
There’s also e): A causes B within our sample, but A does not cause B generally, or in the sense that we care about.
For example, suppose a teacher gives out a gold star whenever a pupil does a good piece of work, and this causes the pupil to work harder. Suppose also that this effect is greatest on mediocre pupils and least on the best pupils—but the best pupils get most of the gold stars, naturally.
Now suppose an educational researcher observes the class, and notes the correlation between receiving a gold star, and increased effort. This is genuine causation. He then concludes that the teacher should give out more gold stars, regardless of whether the pupil does a good piece of work or not, and focus the stars on mediocre pupils. This change made, the gold stars no longer cause increased effort. The causation disappears! Changing the way the teacher hands out the gold stars changes the relationship between gold stars and effort. So although there was genuine causation in the original sample, there is no general causation, or causation in the sense we care about; we can’t treat the gold stars as an exogenous variable.
See also the Lucas Critique.
That’s because you have cause and effect reversed: The extra effort causes the gold stars, not the other way around.
No, the gold stars cause extra effort after they are given out. This is part of the hypothetical.
The pupils work harder after they are given a gold star because they see their good work is appreciated. But if the gold stars are given out willy-nilly, then the pupils no longer feel proud to get one, and so they lose their ability to make students work harder.
As Robert Lucas would put it, the relationship is not robust to changes in the policy regime.
If the gold stars are what is causing the hard work in the hypothetical, then the hypothetical policy of giving out more gold stars would work. If giving out more gold stars doesn’t improve work, then distribute the gold stars to the students who do the least work: the ones with the most room to improve.
If, on the other hand, gold stars are a proxy for recognition, then students who want recognition have an extra incentive to work hard. Giving out more gold stars dilutes the effect, and distributing them according to some other criteria than ‘who put in hard work on this assignment’ also reduces the effect. The cause of the extra hard work isn’t the gold stars, but the method by which gold stars are distributed.
This might be correlated: http://www.slate.com/articles/health_and_science/science/2012/10/correlation_does_not_imply_causation_how_the_internet_fell_in_love_with_a_stats_class_clich_.single.html
I like how this guy complains about “internet blowhards” abusing this phrase, but then gets basic reporting of correlation in scientific papers wrong (by reporting causation in articles he wrote about those papers). There was a comment on that slate article calling him out on this.
Since it was the response to a quote from this article that gave me the idea, I’m going to infer that it’s causal.
Haha. This is the best idea I’ve heard today. I’ll help if there’s anything I can add.
lots of smart people argue about dumb things for irrational reasons on 4chan and reddit...
That might be a good place to dump links to succinct and engaging explanations.