Too good to be true

A friend recently posted a link on his Facebook page to an informational graphic about the alleged link between the MMR vaccine and autism. It said, if I recall correctly, that out of 60 studies on the matter, not one had indicated a link.

Presumably, with 95% confidence.

This bothered me. What are the odds, supposing there is no link between X and Y, of conducting 60 studies of the matter, and of all 60 concluding, with 95% confidence, that there is no link between X and Y?

Answer: .95 ^ 60 = .046. (Use the first term of the binomial distribution.)
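For anyone who wants to check that arithmetic, here is a minimal Python sketch. The function name `binomial_pmf` and the constants are mine, not from any of the studies. It assumes, as the estimate above does, that the 60 studies are independent and that each has exactly a 5% chance of a false positive.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k 'successes' in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumption: each study independently has a 5% false-positive rate.
FALSE_POSITIVE_RATE = 0.05
N_STUDIES = 60

# The k = 0 term: not one of the 60 studies reports a (spurious) link.
p_all_null = binomial_pmf(0, N_STUDIES, FALSE_POSITIVE_RATE)
print(f"P(0 of {N_STUDIES} find a link) = {p_all_null:.3f}")  # 0.046
```

The other terms of the same distribution (k = 1, 2, ...) give the chance of exactly one, two, or more studies reporting a link by chance alone.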

So if it were in fact true that 60 out of 60 studies failed to find a link between vaccines and autism at 95% confidence, this would prove, with 95% confidence, that studies in the literature are biased against finding a link between vaccines and autism.

In reality, you should adjust your literature survey for the known biases of the literature. Scientific literature has publication bias: positive results are more likely to be reported than negative ones.

It also has a bias from errors. Many articles have some fatal flaw that makes their results meaningless. If the errors are randomly distributed, I think (though I’m not sure) that we should assume this bias causes regression towards an equal likelihood of positive and negative results.

Given that both of these biases should result, in this case, in more positive results, having all 60 studies agree is even more incredible.

So I did a quick mini-review this morning, looking over all of the studies cited in 6 reviews of the research on whether there is a connection between vaccines and autism:

National Academies Press (2004). Immunization safety review: Vaccines and autism.

National Academies Press (2011). Adverse effects of vaccines: Evidence and causality.

American Academy of Pediatrics (2013): Vaccine safety studies.

The current AAP webpage on vaccine safety studies.

The Immunization Action Coalition: Examine the evidence.

Taylor et al. (2014). Vaccines are not associated with autism: an evidence-based meta-analysis of case-control and cohort studies. Vaccine Jun 17;32(29):3623-9. Paywalled, but references given here.

I listed all of the studies that were judged usable in at least one of these reviews, removed duplicates, then went through them all and determined, either from the review article or from the study’s abstract, what it concluded. There were 39 studies used, and all 39 failed to find a connection between vaccines and autism. 4 studies were rejected as methodologically unsound by all reviews that considered them; 3 of the 4 found a connection.

(I was, as usual, irked that if a study failed to prove the existence of a link given various assumptions, it was usually cited as having shown that there was no link.)

I understand that even a single study indicating a connection would immediately be seized on by anti-vaccination activists. (I’ve even seen them manage to take a study that indicated no connection, copy a graph in that study that indicated no connection, and write an analysis claiming it proved a connection.) Out there in the real world, maybe it’s good to suppress any such studies. Maybe.

But here on LessWrong, where our job is not physical health, but mental practice, we shouldn’t kid ourselves about what the literature is doing. Our medical research methodologies are not good enough to produce 39 papers and have them all reach the right conclusion. The chances of this happening are only .95 ^ 39 = 0.135, even before taking into account publication and error bias.
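To see how quickly unanimity becomes improbable as studies accumulate, here is a rough Python sketch. It rests on the same simplifying assumptions as the estimate above: independent studies, each with a 5% false-positive rate, and no publication or error bias. The function name `p_all_null` is made up for illustration.

```python
from math import ceil, log

FALSE_POSITIVE_RATE = 0.05  # assumed per-study chance of a spurious link

def p_all_null(n_studies: int, alpha: float = FALSE_POSITIVE_RATE) -> float:
    """Chance that none of n independent studies reports a link when none exists."""
    return (1 - alpha) ** n_studies

# The two study counts discussed in this post.
for n in (39, 60):
    print(f"{n} studies: P(all find no link) = {p_all_null(n):.3f}")

# Smallest number of studies at which unanimity itself has under a 5% chance.
threshold = ceil(log(0.05) / log(1 - FALSE_POSITIVE_RATE))
print(f"unanimity drops below 0.05 at {threshold} studies")
```

Under these assumptions, 39 unanimous studies are unlikely but not below the conventional 5% cutoff; it takes 59 to get there.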

Note: This does not apply in the same way to reviews that show a link between X and Y

If the scientific community felt compelled to revisit the question of whether gravity causes objects to fall, and conducted studies using a 95% confidence threshold comparing apples dropped on Earth to apples dropped in deep space, we would not expect 5% of the studies to conclude that gravity has no effect on apples. 95% confidence means that, even if there is no link, there’s a 5% chance the data you get will look as if there is a link. It does not mean that if there is a link, there’s a 5% chance the data will look as if there isn’t. (In fact, if you’re wondering how small studies and large studies can all have 95% confidence, it’s because, by convention, the extra power in large studies is spent on being able to detect smaller and smaller effects, not on higher and higher confidence that a detected effect is real. Being able to detect smaller and smaller effects means having a smaller and smaller chance that, if there is an effect, it will be too small for your study to detect. Having “95% confidence” tells you nothing about the chance that you’re able to detect a link if it exists. It might be 50%. It might be 90%. This is the information black hole that priors disappear into when you use frequentist statistics.)
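The difference between the false-positive rate and the chance of detecting a real effect can be made concrete with a small calculation. The sketch below is my own simplification, not anything taken from the vaccine studies: it uses a two-sided z-test on a mean with known variance, and the effect size of 0.2 standard deviations and the sample sizes of 50 and 200 are hypothetical values picked for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def power_two_sided_z(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Chance that a two-sided z-test on n observations rejects 'no effect'.

    effect_size is the true mean shift in standard-deviation units
    (a hypothetical quantity chosen for illustration).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    return STD_NORMAL.cdf(-z_crit + shift) + STD_NORMAL.cdf(-z_crit - shift)

# No real effect: the rejection rate equals alpha, whatever the sample size.
for n in (50, 200):
    print(f"no effect,    n={n:3d}: {power_two_sided_z(0.0, n):.3f}")

# A real but small effect: the detection rate depends heavily on n.
for n in (50, 200):
    print(f"effect = 0.2, n={n:3d}: {power_two_sided_z(0.2, n):.3f}")
```

Both hypothetical studies can claim "95% confidence", yet the smaller one would miss this effect most of the time and the larger one would catch it most of the time.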

Critiquing bias

One plausible mechanism is that people look harder for methodological flaws in papers they don’t like than in papers that they like. If we allowed all 43 of the papers, we’d have 3 / 43 finding a link, which would still be surprisingly low, but possible.

To test this, I looked at Magnuson 2007, “Aspartame: A Safety Evaluation Based on Current Use Levels, Regulations, and Toxicological and Epidemiological Studies” (Critical Reviews in Toxicology, 37:629–727). This review was the primary (in fact, nearly the only) source cited by the most recent FDA review panel to review the safety of aspartame. The paper doesn’t mention that its writing was commissioned by companies that sell aspartame. Googling the authors’ names revealed that at least 8 of the paper’s 10 authors worked for companies that sell aspartame, either at the time that they wrote it, or shortly afterwards.

I went to section 6.9, “Observations in humans”, and counted the number of words spent discussing possible methodological flaws in papers that indicated a link between aspartame and disease, versus the number of words spent discussing possible methodological flaws in papers that indicated no link. I counted only words suggesting problems with a study, not words describing its methodology.

224 words were spent critiquing 55 studies indicating no link, an average of 4.1 words per study. 1375 words were spent critiquing 24 studies indicating a link, an average of 57.3 words per study.
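The per-study averages are easy to reproduce from those counts. A minimal Python sketch, using only the word and study counts reported above (the variable names are mine):

```python
# Word counts and study counts as reported above.
words_no_link, studies_no_link = 224, 55
words_link, studies_link = 1375, 24

avg_no_link = words_no_link / studies_no_link
avg_link = words_link / studies_link

print(f"no-link studies: {avg_no_link:.1f} words of critique per study")  # 4.1
print(f"link studies:    {avg_link:.1f} words of critique per study")     # 57.3
print(f"ratio:           {avg_link / avg_no_link:.1f}x")                  # 14.1x
```

On these counts, a study that indicated a link drew roughly fourteen times as many words of criticism as one that did not.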

(432 of those 1375 words were spent on a long digression arguing that formaldehyde isn’t really carcinogenic, so that figure goes down to only 39.3 words per positive-result study if we exclude that. But that’s… so bizarre that I’m not going to exclude it.)