[Link] Correlation Graphs Reveal Shocking Information
Babies named Ava caused the housing bubble, and other intriguing data.
More illustrative than the usual “correlation is not causation” mantra.
I do worry sometimes that the pendulum has swung too far in the other direction, and that people are starting to use correlation-causation as an I’m-smarter-than-you sort of status signal. That is, once people pass a certain intelligence level, I worry less about them claiming Facebook causes the Greek debt crisis because they’re correlated, and more about them hearing a very well-conducted study showing an r = .98 correlation between some disease and some risk factor and, instead of agreeing we should investigate further, just saying “HA! GOTCHA! CORRELATION’S NOT THE SAME THING AS CAUSATION!”
I mean, I admit it’s an important lesson, as long as people remember it’s just a caution against being too certain of a causal relationship, and not a guarantee that a correlation provides absolutely no evidence.
This seems crucial to me; you’re really talking about a few percent of the population, right?
Also, I’ll note that when (even very smart) people are motivated to believe in the existence of a phenomenon, they’re apt to attribute causal structure to correlated data.
For example: It’s common wisdom among math teachers that precalculus is important preparation for calculus. Surely taking precalculus has some positive impact on calculus performance, but I would guess that this impact is swamped by preexisting variance in mathematical ability/preparation.
Do you have any particular reason to think that this is likely to be a problem?
Personal observation.
Fisher’s denial that smoking contributed to lung cancer.
I strongly suggest you read one of Fisher’s articles on the subject. Fisher did not deny that smoking contributes to lung cancer; he argued only that the Hill and Doll reports failed to establish a causal link. He argued that the negative correlation between cancer and inhaling, the rate of increase in lung cancer incidence for each sex not matching the rates of smoking adoption for each sex, the high correlation with lung cancer for heavy cigarette smoking but not cigar or pipe smoking, and the correlation between lung cancer incidence and urban location all discount the hypothesis that cancer results from tobacco combustion products passing through the lungs in favor of other hypotheses. He did not claim that causality cannot be established, and indeed proposed experiments to distinguish between some of the alternate explanations.
I was mostly going to say (1), but (2) certainly crossed my mind as an example of the other sort of error.
I don’t think these are very good examples. Those lines hardly look correlated, let alone causally related. I once read an article with a much better example, but I can’t find it now. It first talked about how if you looked through enough examples you could find any correlation, and then showed a very closely correlated graph of the stock market versus something about Venus, like its surface temperature or distance from the sun or something.
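To make the "look through enough examples" point concrete, here is a rough sketch in Python. It is not taken from the article mentioned above; the random-walk model, the function names, and the parameters (1000 candidates, 50 time steps) are all assumptions chosen for illustration. It generates one "target" series and then sifts through many unrelated random series, keeping whichever one happens to correlate best.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def random_walk(rng, steps):
    """A series whose value drifts by an independent Gaussian step each period."""
    level, path = 0.0, []
    for _ in range(steps):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def best_spurious_match(n_candidates=1000, steps=50, seed=0):
    """Return the strongest correlation found between one target series
    and n_candidates series generated independently of it."""
    rng = random.Random(seed)
    target = random_walk(rng, steps)  # stand-in for, say, a stock index
    best = 0.0
    for _ in range(n_candidates):
        r = pearson(target, random_walk(rng, steps))
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    print(f"strongest correlation found by sifting: {best_spurious_match():.3f}")
```

None of the candidate series has anything to do with the target, yet the best of a thousand will typically track it closely. Trending series like these also correlate strongly by chance far more often than independent noise would, which is part of why time-series graphs are such easy material for this kind of joke.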
You can easily generate correlation examples with Google Correlate, such as how AppleWorks is causing the decline of the Japanese language.
My microeconometrics professor used to show off his icecream consumption versus drownings dataset that could pass all the significance tests he would be teaching that semester. That one always stuck with me.
Is that the best example to use, though? Ideally to promote skepticism you want correlations which are the result of sifting through mountains of data for coincidences, or correlations where the only underlying causation is something grossly general like “things often change monotonically for decades as time advances”. With “ice cream consumption versus drownings”, I wouldn’t be surprised if there’s a real, specific common factor: high temperatures motivating people to eat more cold treats and go swimming more often.
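The common-factor story can be sketched with simulated data. Everything below is made up for illustration: the numbers, the linear relationships, and the function names are assumptions, not the professor's dataset. Temperature drives both ice cream sales and drownings, neither affects the other, and the correlation between them mostly disappears once temperature is regressed out of both.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after a least-squares line in xs is subtracted."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def simulate(days=2000, seed=1):
    """Return (raw correlation, correlation after controlling for temperature)."""
    rng = random.Random(seed)
    # Hypothetical daily temperatures; both outcomes depend on them and on
    # independent noise, but not on each other.
    temp = [rng.gauss(20, 8) for _ in range(days)]
    ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temp]
    drownings = [0.1 * t + rng.gauss(0, 1) for t in temp]
    raw = pearson(ice_cream, drownings)
    partial = pearson(residuals(ice_cream, temp), residuals(drownings, temp))
    return raw, partial


if __name__ == "__main__":
    raw, partial = simulate()
    print(f"raw correlation:             {raw:.3f}")
    print(f"controlling for temperature: {partial:.3f}")
```

The raw correlation here is real and would pass a significance test, which is the sense in which this example differs from a coincidence dug out of a mountain of data. It only works this cleanly because the confounder is known and measured; with an unmeasured common cause there is nothing to regress out.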
They don’t look that bad compared to the sorts of correlations one gets in messy data. The Facebook-Greek Debt one looks like something I wouldn’t be surprised to see for a genuine correlation with messy, real world data.
I don’t in any way intend this as a criticism, but Mark Twain’s complaint about bringing up the weather in conversation seems to apply here: everyone talks about it, but nobody ever does anything about it.
The problem with just saying “correlation is not causation” is that it doesn’t help you figure out what information you can get from an observational study. This is becoming an important issue in my work. We do advanced marketing research for (mostly) large companies. For many years the company’s emphasis was on choice-based conjoint studies, which give you experimental data. (You ask people to repeatedly choose among various sets of hypothetical products, and then analyze the results to figure out what they value.) Now we’re moving more into marketing mix models, which involve purely observational data. Knowing exactly what one can and cannot legitimately infer from observational data, under what assumptions, is a very important practical question for us and our clients.
That’s why I am currently spending a lot of my spare time studying Judea Pearl’s book Causality. I would highly recommend this book to anyone who, as Yvain suggests, is more interested in solving problems than looking smart.
It’s these fallacies that make reading newspapers almost impossible without becoming incredibly frustrated incredibly quickly. I imagine the first media editor who realised they could misconstrue data to support their articles almost collapsed with excitement.
“Oh my god guys! Look, now we can weaponize our hyperbole!!”
LW needs a (Funny) tag like Slashdot. I’m saving this for future use in dispelling the correlation/causation fallacy.