My post and Twitter thread about the controversy over the 1954 polio vaccine trials drew many replies, so here is a follow-up.
First, I’m very sympathetic to the dilemma that Salk faced. I think it’s a tough problem, and it’s worth thinking about different ways to approach it. I didn’t mean to cast aspersions on Salk.
One way in general to improve this situation is to make sure that all the controls get the treatment immediately after the trial, if it is proved safe and effective. But in this case, that wouldn’t have changed anything. Polio was a seasonal disease, peaking each summer. Getting the vaccine after the trial meant getting it the next season.
Some people have suggested that trials can be ended early if the data clearly shows a conclusion. This is true, although it’s trickier than it appears: if you do it naively, by checking repeatedly and stopping as soon as the result looks significant, you are prone to reaching false conclusions. The statistics of doing this properly are sophisticated. It is also difficult in the case of a vaccine, where the outcome is binary (you get the disease or you don’t) and you have to wait for a certain period of exposure. This wasn’t like a blood pressure medication, where you can constantly measure a continuous variable.
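To make the “naive” problem concrete, here is a minimal simulation sketch. It is not a model of the polio trial: it assumes a generic continuous outcome with no true effect, and the function name, number of looks, and batch size are all my own illustrative choices. It compares how often a trial crosses the usual 5% threshold at any of several interim looks with how often it does so at the final look alone.

```python
import random
from statistics import NormalDist


def false_positive_rates(n_trials=2000, looks=5, per_look=100, alpha=0.05, seed=0):
    """Simulate null trials (no real effect) and count spurious 'significant' results.

    Returns (rate when stopping at ANY look that crosses the threshold,
             rate when testing only once at the final look).
    All parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    any_look_hits = 0
    final_look_hits = 0
    for _ in range(n_trials):
        total, n, crossed = 0.0, 0, False
        for _ in range(looks):
            # Sum of `per_look` unit-variance, zero-mean outcomes, drawn in one step.
            total += rng.gauss(0.0, per_look ** 0.5)
            n += per_look
            z = total / n ** 0.5  # z-statistic on all data seen so far
            if abs(z) > z_crit:
                crossed = True  # a naive experimenter would stop and declare success here
        any_look_hits += crossed
        final_look_hits += abs(z) > z_crit  # z now holds the final-look statistic
    return any_look_hits / n_trials, final_look_hits / n_trials


naive_rate, single_look_rate = false_positive_rates()
print(f"false positives with naive peeking: {naive_rate:.3f}")
print(f"false positives with one final test: {single_look_rate:.3f}")
```

Because the final look is one of the looks, the peeking rate can never be lower than the single-test rate, and in practice it is noticeably higher. Proper sequential designs correct for this by using stricter thresholds at each look.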
One idea occurred to me that I haven’t heard anyone suggest: the trial didn’t have to be 50-50. With a large enough group, you could hold back a smaller subset as the control (80-20?). Again, you need statistics here to tell you how this affects the power of your test.
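As a rough sketch of what the statistics would say, the snippet below uses a standard normal approximation to the power of a two-sided two-proportion test. The sample size, attack rates, and the `approx_power` helper are assumptions I made up for illustration; they are not figures from the trial.

```python
from statistics import NormalDist


def approx_power(n_total, frac_treated, p_control, p_treated, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Uses the usual normal approximation and ignores the negligible chance of
    rejecting in the wrong direction. All inputs are hypothetical.
    """
    n_t = n_total * frac_treated
    n_c = n_total * (1 - frac_treated)
    # Pooled rate and standard error under the null hypothesis of no effect.
    p_pool = (n_t * p_treated + n_c * p_control) / n_total
    se_null = (p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c)) ** 0.5
    # Standard error under the alternative (the assumed true rates).
    se_alt = (p_treated * (1 - p_treated) / n_t
              + p_control * (1 - p_control) / n_c) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    effect = abs(p_control - p_treated)
    return NormalDist().cdf((effect - z_crit * se_null) / se_alt)


# Hypothetical scenario: 100,000 subjects, control attack rate 60 per 100,000,
# and a vaccine that halves it.
for frac in (0.5, 0.8):
    power = approx_power(100_000, frac, p_control=0.0006, p_treated=0.0003)
    print(f"{frac:.0%} treated: power ~ {power:.2f}")
```

For a fixed total, the unequal split gives up some power, because the estimate is limited by the smaller group. The trade can be bought back by enrolling more subjects overall, which is why this idea only works “with a large enough group.”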
Returning to the question of whether an RCT was needed at all: again it’s a tough call, but I still think it was, for two reasons: one scientific/epistemological, and one social/political.
Epistemologically, it’s easy to say in hindsight that an observed-control trial would have been conclusive. Here’s the data (copied from Oshinsky’s book):
Placebo-control areas:
• Vaccinated: 200,745 subjects / 33 cases = 1 per 6,083
• Placebo: 201,229 subjects / 115 cases = 1 per 1,750
Observed-control areas:
• Vaccinated: 221,988 subjects / 38 cases = 1 per 5,842
• Observed: 725,173 subjects / 330 cases = 1 per 2,197
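For readers who want to check the arithmetic, here is a small sketch that recomputes the rates from the counts above and adds a crude relative reduction and a simple two-proportion z-test. The `summarize` helper and the choice of test are mine, not the trial’s official analysis, and the calculation says nothing about confounding in the observed-control areas.

```python
from statistics import NormalDist

# Counts as quoted above: (subjects, cases).
TRIAL = {
    "placebo_areas": {"vaccinated": (200_745, 33), "control": (201_229, 115)},
    "observed_areas": {"vaccinated": (221_988, 38), "control": (725_173, 330)},
}


def summarize(vaccinated, control):
    """Return rates, crude relative reduction, and a pooled two-proportion z-test."""
    n_v, cases_v = vaccinated
    n_c, cases_c = control
    rate_v = cases_v / n_v
    rate_c = cases_c / n_c
    reduction = 1 - rate_v / rate_c  # crude relative reduction in incidence
    pooled = (cases_v + cases_c) / (n_v + n_c)
    se = (pooled * (1 - pooled) * (1 / n_v + 1 / n_c)) ** 0.5
    z = (rate_c - rate_v) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "one_in_vaccinated": round(1 / rate_v),
        "one_in_control": round(1 / rate_c),
        "reduction": reduction,
        "z": z,
        "p_value": p_value,
    }


for area, groups in TRIAL.items():
    s = summarize(**groups)
    print(f"{area}: 1 per {s['one_in_vaccinated']:,} vs 1 per {s['one_in_control']:,}, "
          f"reduction {s['reduction']:.0%}, z = {s['z']:.1f}")
```

On these counts the crude reduction comes out to roughly 71% in the placebo-control areas and roughly 62% in the observed-control areas, and both differences are far outside what chance alone would produce. The open question is whether the observed-control comparison would have been trusted without the randomized one beside it.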
But think of how it might have turned out.
One scenario: You run the trial, and it’s not conclusive. Maybe it shows a slight reduction in incidence, but the p-value is high. Then what do you do? Your trial might have been confounded. Run another trial? You’ve just lost a year.
Another scenario: the vaccine is ineffective, but a confounded trial shows effectiveness. Now you are “vaccinating” the entire country with a worthless non-treatment. How many years does it take for the world to figure this out? How many years have you lost then?
Polio came in epidemics each summer, of unpredictable magnitude. No one understood the full epidemiology or could predict the epidemics—when they would start or end, or how many would be stricken. This makes it hard to know whether the vaccine is working or not just from the incidence rate alone. A rise in epidemics could sow needless doubt in the efficacy of the vaccine. Conversely, a dropoff could give false confidence—until the epidemics resurged.
The tests were also looking for safety. What if the vaccine seems effective, but has a side effect no one anticipated? Again, do you have to re-run your trial? Worse, what if the vaccine is actually causing polio (this is possible with a bad vaccine) and this goes undetected?
Remember, these were huge trials. Over a million subjects, nationwide. An enormous effort, difficult to coordinate; tons of data analysis (with only primitive mainframe computers); very expensive. You don’t want to have to do it twice.
And that brings me to the social/political aspect. Polio wasn’t just a scientific question. It was highly emotional and political, both within the scientific community and in the nation at large. Medical researchers were bitterly divided about the best type of vaccine. Salk used a “killed” virus: a virulent strain that had been chemically inactivated. Others favored an “attenuated” virus: a live strain that had been weakened to be harmless to humans. The infighting was stoked by ego and jealousy. Salk was not the only one who wanted to be first to the vaccine.
So, the tests had to be more than scientifically sound. They had to be politically sound. The trials had to be so conclusive that it would silence even jealous critics using motivated, biased reasoning. They had to prove themselves not only to a reasoning mind, but to a committee. A proper RCT was needed for credibility as much as, or more than, for science.
By the way, all this was being funded by the National Foundation for Infantile Paralysis, a private (non-government) charity funded by voluntary contributions from many donors. They relied on good publicity, and above all on the belief that they were making progress. And the trials were front-page news. Botching them would have been a PR disaster. Would the donor base have supported a second trial? Given all the attention, the first trial had to be conclusive.
So that’s why, at the end of the day, I still think Salk was overconfident, and Bell and Francis were right. But again, I sympathize with the issue and I respect the arguments on Salk’s side.
You can see that as just a simple version of an adaptive trial, with one step. I don’t think it in any way resolves the basic problem people have: if it’s immoral to give half the sample the placebo, it’s not exactly clear why giving a fifth of the sample the placebo is moral.
This is an important point. One thing I only relatively recently understood about experiment design was something Gelman has mentioned in passing on occasion: an ideal Bayesian experimenter doesn’t randomize!
Why not? Because, given their priors, there is always another allocation rule which still accomplishes the goal of causal inference (the rule makes its decisions independent of all confounders on average, as randomization does, so it estimates the causal effect) but does so with the same or lower variance. One example is alternating allocation, which keeps the experimental and control groups as close to equal in n as possible, whereas simple one-by-one randomization will usually leave excess n in one group, which is inefficient. These sorts of rules pose no problem and can be included in the Bayesian model of the process.
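The efficiency point about group sizes can be illustrated with a small sketch. It assumes the textbook result that, with equal outcome variance in both arms, the variance of a difference in means is proportional to 1/n_treated + 1/n_control. The function names and the sample size of 20 are hypothetical choices for illustration.

```python
import random


def variance_factor(n_treated, n_control):
    """Factor multiplying the outcome variance in Var(mean_t - mean_c)."""
    return 1 / n_treated + 1 / n_control


def compare_allocations(n=20, n_sims=10_000, seed=0):
    """Compare alternating allocation with coin-flip randomization for n subjects.

    Returns (factor for alternating allocation,
             average factor over simulated coin-flip allocations).
    """
    rng = random.Random(seed)
    # Alternating allocation always yields the most balanced split possible.
    alternating = variance_factor(n // 2, n - n // 2)
    total, used = 0.0, 0
    for _ in range(n_sims):
        n_t = sum(rng.random() < 0.5 for _ in range(n))
        if n_t in (0, n):
            continue  # no comparison is possible if one arm is empty
        total += variance_factor(n_t, n - n_t)
        used += 1
    return alternating, total / used


alternating, coin_flip = compare_allocations()
print(f"alternating: {alternating:.4f}, coin-flip average: {coin_flip:.4f}")
```

The balanced split minimizes the factor, so coin-flip allocation can only match it or do worse. The gap shrinks as n grows, which is one reason it matters little in a trial with hundreds of thousands of subjects.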
The problem is that such a prior-dependent allocation will then be inefficient for observers with different priors, who will learn much less from it. Depending on their priors or models, it may be almost entirely uninformative. By using explicit randomization and no longer making allocations which are based on your priors in any way, you sacrifice efficiency, but the results are equally informative for all observers. If you model the whole process and consider the need to persuade outside observers in order to implement the optimal decision, then randomization is clearly necessary.