[A]lmost no innovative programs work, in the sense of reliably demonstrating benefits in excess of costs in replicated RCTs [randomized controlled trials]. Only about 10 percent of new social programs in fields like education, criminology and social welfare demonstrate statistically significant benefits in RCTs. When presented with an intelligent-sounding program endorsed by experts in the topic, our rational Bayesian prior ought to be “It is very likely that this program would fail to demonstrate improvement versus current practice if I tested it.”
In other words, discovering program improvements that really work is extremely hard. We labor in the dark—scratching and clawing for tiny scraps of causal insight.
10% isn’t that bad as long as you continue the programs that were found to succeed and stop the programs that were found to fail. Come up with 10 intelligent-sounding ideas, obtain expert endorsements, do 10 randomized controlled trials, get 1 significant improvement. Then repeat.
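A minimal sketch of that repeat-and-keep process, under the assumption that each tested program independently has a 10% chance of demonstrating a benefit and that every success is retained (the batch size and number of rounds are arbitrary illustrative choices):

```python
import random

# Illustrative only: each round we test a batch of candidate programs,
# keep the ones whose trials succeed, and discard the rest.
HIT_RATE = 0.10   # assumed chance a tested program demonstrates a benefit
BATCH_SIZE = 10   # programs tested per round
ROUNDS = 20       # how many rounds of trials we run

random.seed(0)
adopted = 0
for _ in range(ROUNDS):
    adopted += sum(random.random() < HIT_RATE for _ in range(BATCH_SIZE))

print(f"Programs tested:  {ROUNDS * BATCH_SIZE}")
print(f"Programs adopted: {adopted}")  # roughly 10% of those tested accumulate over time
```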
Only about 10 percent of new social programs in fields like education, criminology and social welfare demonstrate statistically significant benefits in RCTs
This is a higher rate than I’d expected. It implies that current policies in these three fields are not really thoroughly thought out, or at least not to the extent that I had expected. It seems that there is substantial room for improvement.
Remember that programs will not even be tested unless there are good reasons to expect improvement over current protocol. Most programs that are explicitly considered are worse than those that are tested, and most possible programs are worse than those that are explicitly considered. Therefore we can expect that far, far fewer than ten percent of possible programs would yield significant improvements.
That is true. However, there is a second filtering process after filtering by experts, which I will refer to as filtering by experiment (i.e., we try this, and if it works we keep doing it, and if it doesn't we don't). Evolution is basically a mix of random mutation and filtering by experiment, and it shows that, given enough time, such a filter can be astonishingly effective. (That time can be drastically reduced by adding another filter, such as filtering by experts, before the filtering-by-experiment step.)
The one-to-two-percent expectation that I had was a subconscious comparison of the effectiveness of filtering by experts against filtering by experiment over time. Investigating my reasoning more thoroughly, I think what I had failed to appreciate is that there really hasn't been enough time for filtering by experiment to have as drastic an effect as I'd assumed; societies change enough over time that what was a good idea a thousand years ago is probably not going to be a good idea now. (Added to this, it likely takes more than a month to see whether such a social program is actually effective, so there hasn't been time for all that many consecutive experiments, and there has never been a properly designed worldwide experimental test model, either.)
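A minimal sketch of the two-stage filter described above, with purely illustrative rates (assume 1% of all possible programs work, versus 10% of expert-endorsed ones): pre-filtering by experts sharply cuts the number of experiments needed to find the same number of working programs.

```python
import random

random.seed(1)

# Assumed, purely illustrative numbers.
BASE_GOOD_RATE = 0.01        # share of all possible programs that actually work
EXPERT_FILTERED_RATE = 0.10  # share of expert-endorsed programs that work
TARGET_SUCCESSES = 5         # how many working programs we want to find

def trials_needed(good_rate: float) -> int:
    """Run trials until TARGET_SUCCESSES working programs have been found."""
    trials = successes = 0
    while successes < TARGET_SUCCESSES:
        trials += 1
        if random.random() < good_rate:
            successes += 1
    return trials

print("Experiment filter alone:   ", trials_needed(BASE_GOOD_RATE), "trials")
print("Expert + experiment filter:", trials_needed(EXPERT_FILTERED_RATE), "trials")
```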
It implies that current policies in these three fields are not really thoroughly thought out, or at least not to the extent that I had expected.
That’s one possible explanation.
Another possible explanation is that there are a variety of powerful stakeholders in these fields, and that the new social programs are actually designed to benefit them rather than whoever the programs claim to help.
I think the quote is from Jim Manzi rather than Megan McArdle, given that McArdle starts the article with
I asked Jim Manzi, who has literally written the book on randomized controlled trials, to share his thoughts. Below is what he said:
and later on in the article it says
I agree with the weight and seriousness of each of these objections. My agreement is not ad hoc; I wrote a book that tried to describe how businesses have implemented experimental processes that operate in the face of all of these issues.
suggesting that the whole article after the first paragraph is a quote (or possibly paraphrase).
Megan McArdle quoting or paraphrasing Jim Manzi.
[Edited in response to Kaj’s comment.]
10% isn’t that bad as long as you continue the programs that were found to succeed and stop the programs that were found to fail. Come up with 10 intelligent-sounding ideas, obtain expert endorsements, do 10 randomized controlled trials, get 1 significant improvement. Then repeat.
Unfortunately we don’t really have the political system to do this.
But I have this great idea that will change that!
...Oh.
Unfortunately, governments are really bad at doing this.
Humans in general are very bad at this. The only reason capitalism works is that the losing experiments run out of money.
That’s a very powerful reason.
True, but that doesn’t mean we’re laboring in the dark. It just means we’ve got our eyes closed.
Unfortunately, the people involved have an incentive to keep them closed.
I don’t think that’s really relevant to the original quote.
It depends on how many completely ineffectual programs would demonstrate improvement versus current practices.
Some relevant links:
“The Iron Law Of Evaluation And Other Metallic Rules”, Rossi 1987
“The Efficacy of Psychological, Educational, and Behavioral Treatment: Confirmation From Meta-Analysis”, Lipsey & Wilson 1993
“Randomized Controlled Trials Commissioned by the Institute of Education Sciences Since 2002: How Many Found Positive Versus Weak or No Effects?”, Coalition for Evidence-Based Policy 2013 (excerpts)
“One Hundred Years of Social Psychology Quantitatively Described”, Bond et al 2003
This is a higher rate than I’d expected. It implies that current policies in these three fields are not really thoroughly thought out, or at least not to the extent that I had expected. It seems that there is substantial room for improvement.
I would have expected perhaps one or two percent.
Remember, you expect 5% to give a statistically significant result just by chance...
That’s only true of the programs which can be expected to produce no detriments, surely?
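A back-of-the-envelope sketch of this exchange, with assumed numbers (the 90% share of truly ineffectual programs is purely illustrative): under a two-sided test at the usual 5% threshold, an ineffectual program has roughly a 2.5% chance of showing a significant benefit and a 2.5% chance of showing significant harm, so only part of the quoted 10% can plausibly be chance.

```python
# Hypothetical back-of-the-envelope numbers, for illustration only.
alpha = 0.05                  # conventional significance threshold
p_false_benefit = alpha / 2   # two-sided test: half the false positives look like benefits
share_ineffectual = 0.90      # assumed share of tested programs with no real effect
observed_benefit_rate = 0.10  # the roughly-10% figure from the quote

false_positive_benefits = share_ineffectual * p_false_benefit              # ~0.0225
implied_genuine_benefits = observed_benefit_rate - false_positive_benefits  # ~0.0775

print("Expected chance 'benefits':", false_positive_benefits)
print("Implied genuine benefits:  ", implied_genuine_benefits)
```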