What I would like to see is a “meta” study examining the rate at which causal claims originally supported only by correlational studies are later found to be spurious when tested experimentally. It’s hard to calibrate one’s skepticism towards correlational studies without knowing what the relevant base rates are.
There are a few, but not many such studies, for obvious reasons. I list at least one in http://www.gwern.net/DNB%20FAQ#flaws-in-mainstream-science-and-psychology and IIRC, the correlation->causation rate was <10%.
“Contradicted and Initially Stronger Effects in Highly Cited Clinical Research” (Ioannidis 2005) is helpful:

Results: Of 49 highly cited original clinical research studies, 45 claimed that the intervention was effective. Of these, 7 (16%) were contradicted by subsequent studies, 7 others (16%) had found effects that were stronger than those of subsequent studies, 20 (44%) were replicated, and 11 (24%) remained largely unchallenged. Five of 6 highly cited nonrandomized studies had been contradicted or had found stronger effects vs 9 of 39 randomized controlled trials (P = .008). Among randomized trials, studies with contradicted or stronger effects were smaller (P = .009) than replicated or unchallenged studies although there was no statistically significant difference in their early or overall citation impact. Matched control studies did not have a significantly different share of refuted results than highly cited studies, but they included more studies with “negative” results.
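To make those base rates concrete, here is a minimal sketch in Python (scipy assumed; the abstract does not say which test was used, but a two-sided Fisher exact test on the nonrandomized-vs-randomized table gives roughly the reported P = .008):

```python
from scipy.stats import fisher_exact

# Highly cited studies claiming an effective intervention (n = 45):
# "contradicted or stronger" = 7 contradicted + 7 initially-stronger = 14.
contradicted_or_stronger = 7 + 7
total_effective_claims = 45
print(contradicted_or_stronger / total_effective_claims)  # ~0.31 overall

# Nonrandomized vs randomized breakdown from the abstract:
# 5 of 6 nonrandomized vs 9 of 39 randomized were contradicted or stronger.
table = [[5, 1],   # nonrandomized: contradicted/stronger, held up
         [9, 30]]  # randomized:    contradicted/stronger, held up
print(5 / 6, 9 / 39)            # ~0.83 vs ~0.23
odds_ratio, p = fisher_exact(table)
print(p)                        # roughly 0.008, close to the reported P = .008
```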
In the history of medicine, 5 valuable facts about the causes of disease have been established by correlation: smoking ⇒ lung cancer, heart disease; sun ⇒ skin cancer; HPV is an STD; alcohol ⇒ rectal cancer. The claim with the 6th-most correlational evidence is that alcohol protects against heart disease, but this is controversial.
I’ll leave it to you to compute the denominator.
Also, the number 5 is itself controversial. Some people put it at 2.
I think it’s way higher. Some off the top of my head (with a little reading to confirm details):
Child delivery by doctors in a hospital correlated with puerperal fever. This was refined to a correlation between delivery by someone who had recently performed an autopsy and puerperal fever. Experimentally testing handwashing (though not blinded) confirmed the effect; doctors now wash their hands, and dying in childbirth is much less common.
A student with anemia symptoms turns out to have some strangely shaped blood cells. This initial association is expanded by people looking at the blood of other people with anemia, several of whom also have these elongated red cells. Eventually we have enough of a correlation that we’ve discovered sickle cell anemia.
In fact I would go as far as to say that most of our medical knowledge comes from correlations, often relatively obvious ones like “getting run over by a car increases your chance of death”.
There may still be something here, though: the kinds of studies where we see misleading correlations, as opposed to the successful examples you give, generally involve small effects relative to the amount of time involved. Can we better characterize the area where correlations are especially suspect?
Well, we have to be careful about definitions here. People generally don’t talk about correlations when there is a known underlying mechanism.
I guess technically the phrase should look like this: Correlation by itself without known connecting mechanisms or relationships does not imply causation.
The Bayesian approach would suggest that we assign a causation-credence to every correlation we observe. Detecting confounders is of course very important, since it provides you with updates. However, a correlation without known connecting mechanisms does imply causation, probabilistically. A Bayesian updater would prefer to talk about credences in causation, which can be shifted up and down. Dealing in our map with discrete values like “just correlation” and “real causation” is a (sometimes dangerous) simplification. Such a simplification may be useful as a heuristic in everyday life, but I’d suggest not overgeneralizing it.
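As a toy illustration of that credence view, with entirely made-up numbers, the update can be written as a small Bayes-rule calculation; the point is only that an observed correlation shifts the credence in causation upward without driving it to 1:

```python
# Toy Bayesian update for "X causes Y", with made-up numbers.
# prior: credence that the relationship is causal before seeing the data
# p_corr_if_causal: chance of observing this correlation if X really causes Y
# p_corr_if_not: chance of observing it anyway (confounding, selection, chance)
def causation_credence(prior, p_corr_if_causal, p_corr_if_not):
    numerator = p_corr_if_causal * prior
    return numerator / (numerator + p_corr_if_not * (1 - prior))

# A surprising correlation with no known mechanism: weak prior, many
# alternative explanations -> the credence rises, but only modestly.
print(causation_credence(prior=0.05, p_corr_if_causal=0.9, p_corr_if_not=0.3))  # ~0.14

# The same correlation once a plausible mechanism is known: higher prior,
# fewer alternative explanations -> much larger posterior.
print(causation_credence(prior=0.3, p_corr_if_causal=0.9, p_corr_if_not=0.1))   # ~0.79
```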
This does separate out the “getting run over by a car” case, but it doesn’t handle the handwashing one. Germ theory hadn’t been invented yet, and Semmelweis’s proposed mechanism was both medically unlikely and wrong. With sickle cell anemia it kind of handles it, in that you can think of all sorts of ways weirdly shaped blood cells might be a problem, but I think it’s a stretch to say that the first people looking at the blood and saying “that’s weird, it’s probably the problem” understood the “connecting mechanisms or relationships”.
More generally, correlation is some evidence, and if it’s unexpected someone should probably look more closely to try to understand why we’re seeing it, which generally means some kind of controlled experiment.
Well, to start with correlation is data. This data might be used to generate hypotheses. Once you have some hypotheses you can start talking about evidence and yes, correlation can be promoted to the rank of evidence supporting some hypothesis.
I don’t think any of that is controversial. The only point is that pure correlation without anything else is pretty weak evidence, that’s all. However if you want to use it to generate hypotheses, sure, no problems with it whatsoever.
Are you using Semmelweis as an example of the medical community properly assessing and synthesizing data?
I’m using it as an example of a valuable fact about disease being established by correlation.
Your paragraph speaks about correlation providing a hypothesis while the “fact about disease” was established by an experimental intervention study.
I think we’re getting into a discussion about what it means for something to be established as a fact, which doesn’t sound very useful.
Epidemiological studies of diets (that is, of the health consequences of particular patterns of food intake) are all based on correlations, and the great majority of them are junk.
These days epi people mostly use g methods, which are not junk (or rather, they give correct answers given the assumptions they make, and are quite a bit more sophisticated than just using conditional probabilities). How much epi do you know?
edit: Correction: not everyone uses g methods. There is obviously the “changing of the guard” issue. But g methods are very influential now. I also agree there is a lot of junk in data analysis. But I think the “junk” issue is not always (or even usually) due to the fact that the study was “based on correlations” (you are not being precise about what you mean here, but I interpreted you to mean that “people are not using correct methods for getting causal conclusions from observational data.”)
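For concreteness, here is a minimal sketch of the simplest g-method, the point-treatment g-formula (standardization over a measured confounder), on made-up data with pandas assumed for brevity; real g-methods extend this to time-varying treatments and confounders, which this toy omits:

```python
import pandas as pd

# Made-up observational data: exposure A, confounder L, outcome Y.
df = pd.DataFrame({
    "A": [1, 1, 1, 0, 0, 0, 1, 0, 1, 0],
    "L": [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    "Y": [1, 1, 0, 1, 0, 0, 0, 0, 1, 0],
})

# Naive "just conditional probabilities": compare E[Y | A=1] with E[Y | A=0].
naive = df[df.A == 1].Y.mean() - df[df.A == 0].Y.mean()

# g-formula / standardization: average E[Y | A=a, L=l] over the marginal
# distribution of L, i.e. sum over l of E[Y | A=a, L=l] * P(L=l).
def standardized_mean(a):
    p_l = df.L.value_counts(normalize=True)
    return sum(df[(df.A == a) & (df.L == l)].Y.mean() * p for l, p in p_l.items())

adjusted = standardized_mean(1) - standardized_mean(0)
print(naive, adjusted)  # the two differ here because L confounds A and Y
```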
Not much. I’ve read a bunch of papers and some critiques… And I’m talking not so much about the methods as about the published claims and conclusions. Sophisticated methods are fine; the issue is their fragility. And, of course, you can’t correct for what you don’t know.
Thinking about why things do or don’t belong on your list, I think they basically have to be very harmful. If they’re good we do an experiment and find out, but if they’re bad we just declare it established with a correlation. For example, I think Thalidomide could go on your list, in that the evidence was basically “people who took Thalidomide were far more likely to have babies with major birth defects.” Probably lead (paint, fuel)? Our sense of what’s a safe dose of radiation?
I don’t know whether the last one is valuable. It might have resulted in a lot of people getting less sun than is healthy for them, given vitamin D production.
And I gave both positive and negative effects of alcohol. So what?
By “valuable” I mean an easily manipulable causal mechanism that explains a relatively large amount of the population variance of health. I don’t mean that it has actually been manipulated, let alone manipulated correctly. And I certainly don’t mean that this is all we know about medicine. We understand vitamin D and inebriation because of experiments.
Um. Really?
I’m no expert, but this sounds way off. So, we know essentially nothing about how to avoid disease, apart from these 5 or 6 (or 2) causes?
There are lots of ways to gain knowledge other than by looking at correlations. For example you can run experiments. There was a guy named Edward Jenner who was interested in avoiding smallpox. He ran an experiment and it worked. The world learned how to avoid smallpox and there were no correlations in sight...
Wikipedia:

At the age of 13, Jenner was apprenticed to Dr. Ludlow in Sodbury. He observed that people who caught cowpox while working with cattle were known not to catch smallpox. He assumed a causal connection. The idea was not taken up by Dr. Ludlow at that time. After Jenner returned from medical school in London, a smallpox epidemic struck his home town of Berkeley, England. When he advised the local cattle workers to be inoculated, the farmers told him that cowpox prevented smallpox. This confirmed his childhood suspicion, and he studied cowpox further, presenting a paper on it to his local medical society.
Saying “He ran an experiment and it worked” hides the initial correlational observation that led him to try that experiment.
It seems to me that you want to call all observational data “correlations”.
I think so. If you want to separate them, how would you say “people who get pustules from working with cattle are less likely to catch smallpox” differs from “people who give blood are less likely to have heart disease”?
The relevant base rates are likely to be much different between different subject domains.