Causality is rare! The usual statement that “correlation does not imply causation” puts the two, I think, on deceptively equal footing. It’s really more like: correlation is almost never causation, absent something strong like an RCT or a robust study set-up.
Over the past few years I’d gradually become more skeptical of causal claims simply by updating on empirical observations, but it recently struck me that there’s a good first-principles reason for this skepticism.
For each true cause of some outcome we care to influence, there are many other “measurables” that correlate with the true cause but, by default, have no impact on our outcome of interest. Many of these measures will still (weakly) correlate with the outcome, via their correlation with the true cause. So there’s a one-to-many relationship between the true cause and its non-causal correlates. Therefore, if all you know is that something correlates with a particular outcome, you should have a strong prior against that correlation being causal.
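To make the one-to-many picture concrete, here is a minimal toy simulation (my own sketch, not something from the post): a single true cause drives the outcome, while fifty inert “measurables” merely track the cause. All the numbers (effect sizes, counts) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # observations
k = 50      # inert "measurables" that merely track the true cause

cause = rng.normal(size=n)                  # the one true cause
outcome = 0.5 * cause + rng.normal(size=n)  # outcome depends only on the cause

# Each measurable correlates with the cause but has no effect on the outcome.
measurables = 0.6 * cause[:, None] + rng.normal(size=(n, k))

r_cause = np.corrcoef(cause, outcome)[0, 1]
r_inert = [np.corrcoef(measurables[:, j], outcome)[0, 1] for j in range(k)]

print(f"true cause vs outcome: r = {r_cause:.2f}")                           # ~0.45
print(f"median inert measurable vs outcome: r = {np.median(r_inert):.2f}")   # ~0.23
```

In this toy world, a screen for “things correlated with the outcome” returns 51 hits, only one of which is causal; the other 50 are real, replicable correlations that an experiment on them would reveal to be inert.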
My thinking previously ran along the lines of p-hacking: if there are many things you can test, some of them will cross a given significance threshold by chance alone. But I’m claiming something more specific than that: any true cause is bound to be correlated with a bunch of stuff, which will therefore probably correlate with our outcome of interest (though more weakly, and not with certainty, since correlation is not necessarily transitive).
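The transitivity caveat is easy to pin down with a constructed example (again my own illustration): X correlates with C, and C correlates with Y, yet X and Y are completely uncorrelated, so a correlate of a true cause is not guaranteed to correlate with the outcome at all.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

x = rng.normal(size=n)
y = rng.normal(size=n)  # generated independently of x
c = x + y               # correlates with both x and y

print(f"corr(x, c) = {np.corrcoef(x, c)[0, 1]:.2f}")  # ~0.71
print(f"corr(c, y) = {np.corrcoef(c, y)[0, 1]:.2f}")  # ~0.71
print(f"corr(x, y) = {np.corrcoef(x, y)[0, 1]:.2f}")  # ~0.00
```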
The obvious idea of requiring a plausible hypothesis for the causation helps somewhat here, since it rules out some of the non-causal correlates. But it may still leave many of them untouched, especially the more creative our hypothesis-formation process is! Another heuristic (sensible and obvious, and one that maybe doesn’t even require agreement with the above) is to distrust small-magnitude effects, since the true cause is likely to be more strongly correlated with the outcome of interest than any particular correlate of the true cause is.
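One way to justify the small-effects heuristic, under an assumption I’m adding here (a linear/Gaussian setting in which the correlate X is connected to the outcome Y only through the true cause C), is that the correlations multiply, so a non-causal correlate can never look stronger than the cause itself:

```latex
% If X \perp Y \mid C, the partial correlation \rho(X, Y \mid C) vanishes, which forces
\rho(X, Y) = \rho(X, C)\,\rho(C, Y)
\quad\Rightarrow\quad
|\rho(X, Y)| \le |\rho(C, Y)|
```

So an observed effect that is much weaker than the proposed mechanism would suggest is itself some evidence that you are looking at a correlate of the cause rather than the cause.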
This seems pretty different from Gwern’s paper selection trying to answer this topic in How Often Does Correlation=Causality?, where he concludes:
“Compilation of studies comparing observational results with randomized experimental results on the same intervention, compiled from medicine/economics/psychology, indicating that a large fraction of the time (although probably not a majority) correlation ≠ causality.”
Also see his Why Correlation Usually ≠ Causation.
Those are not randomly selected pairs, however. There are 3 major causal patterns: A->B, A<-B, and A<-C->B. Daecaneus is pointing out that for a random pair of correlations of some variables, we do not assign a uniform prior of 33% to each of these. While it may sound crazy to try to argue for some specific prior like ‘we should assign 1% to the direct causal patterns of A->B and A<-B, and 99% to the confounding pattern of A<-C->B’, this is a lot closer to the truth than thinking that ‘a third of the time, A causes B; a third of the time, B causes A; and the other third of the time, it’s just some confounder’.
What would be relevant there is “Everything is Correlated”. If you look at, say, Meehl’s examples of correlations from very large datasets, and ask about causality, I think it becomes clearer. Let’s take one of his first examples: only children are nearly twice as likely to be Presbyterian than Baptist in Minnesota, more than half of the Episcopalians “usually like school” but only 45% of Lutherans do, 55% of Presbyterians feel that their grades reflect their abilities as compared to only 47% of Episcopalians, and Episcopalians are more likely to be male whereas Baptists are more likely to be female.
Like, if you randomly assigned Baptist children to be converted to Presbyterianism, it seems unlikely that their school-liking will suddenly jump because they go somewhere else on Sunday, or that siblings will appear & vanish; it also seems unlikely that if they start liking school (maybe because of a nicer principal), that many of those children would spontaneously convert to Presbyterianism. Similarly, it seems rather unlikely that undergoing sexual-reassignment surgery will make Episcopalian men and Baptist women swap places, and it seems even more unlikely that their religious status caused their gender at conception. In all of these 5 cases, we are pretty sure that we can rule out one of the direct patterns, and that it was probably the third, and we could go through the rest of Meehl’s examples. (Indeed, this turns out to be a bad example because we can apply our knowledge that sex must have come many years before any other variable like “has cold hands” or “likes poetry” to rule out one pattern, but even so, we still don’t find any 50%s: it’s usually pretty obviously direct causation from the temporally earlier variable, or confounding, or both.)
So what I am doing in ‘How Often Does Correlation=Causality?’ is testing the claim that “yes, of course it would be absurd to take pairs of arbitrary variables and calculate their causal patterns for prior probabilities, because yeah, it would be low, maybe approaching 0 - but that’s irrelevant because that’s not what you or I are discussing when we discuss things like medicine. We’re discussing the good correlations, for interventions which have been filtered through the scientific process. All of the interventions we are discussing are clearly plausible and do not require time travel machines, usually have mechanisms proposed, have survived sophisticated statistical analysis which often controls for covariates or confounders, are regarded as credible by highly sophisticated credentialed experts like doctors or researchers with centuries of experience, and may even have had quasi-randomized or other kinds of experimental evidence; surely we can repose at least, say, 90% credibility, by the time that some drug or surgery or educational program has gotten that far and we’re reading about it in our favorite newspaper or blog? Being wrong 1 in 10 times would be painful, but it certainly doesn’t justify the sort of corrosive epistemological nihilism you seem to be espousing.”
But unfortunately, it seems that the error rate, after everything we humans can collectively do, is still a lot higher than 1 in 10 before the randomized version gets run. (Which implies that the scientific evidence is not very good in terms of providing enough Bayesian evidence to promote the hypothesis from <1% to >90%, or that it’s <<1% because causality is that rare.)
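For a rough sense of scale (my own back-of-the-envelope arithmetic, not a number from the thread): promoting a hypothesis from a ~1% prior to a ~90% posterior requires a likelihood ratio of nearly 900, i.e. about ten bits of evidence, which is a lot to ask of a purely observational literature.

```latex
\frac{P(H \mid E)}{P(\lnot H \mid E)}
= \frac{P(H)}{P(\lnot H)} \cdot \frac{P(E \mid H)}{P(E \mid \lnot H)}
\quad\Rightarrow\quad
\frac{P(E \mid H)}{P(E \mid \lnot H)}
= \frac{0.9/0.1}{0.01/0.99}
= 9 \times 99 = 891 \approx 2^{9.8}
```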
Thanks for these references! I’m a big fan, but for some reason your writing sits in the silly, under-exploited quadrant of my 2-by-2 box of “how much I enjoy reading this” versus “how much of it I actually read”, so I’d missed all of your posts on this topic! I caught up with some of it, and it’s far further along than my thinking. On a basic level, it matches my intuitive model of a sparse-ish network of causality that generates a much, much denser network of correlation on top of it. I too would have guessed that the error rate on “good” studies would be lower!