I’d be interested to see an analysis of how many failures to replicate we should expect if replicators duplicate the original methodology perfectly, and whether real-world failures to replicate actually occur at roughly that rate. Wild guess: there are way more failures to replicate than we should expect. If that guess is accurate, it suggests that experimenters tend to introduce undocumented distorting factors into their experiments, and that compiled anecdotal evidence is actually more valuable than experimental evidence if you can find a way to sample it randomly.
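For a rough baseline, here is a minimal simulation sketch (my own, in Python; the effect size, sample size, and alpha below are made-up numbers for illustration) of how often an exact replication of a “significant” two-sample study should fail purely by chance, with nothing distorted at all:

```python
# Hypothetical sketch: how often should an exact replication of a
# "significant" two-sample study fail purely by sampling noise?
# Assumes a fixed true effect, equal group sizes, and alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.4    # assumed true standardized mean difference
n_per_group = 30     # assumed sample size, identical for original and replication
alpha = 0.05
n_sims = 20_000

def significant(effect, n, rng):
    """Run one two-sample t-test with the given true effect; return p < alpha."""
    treated = rng.normal(effect, 1.0, n)
    control = rng.normal(0.0, 1.0, n)
    return stats.ttest_ind(treated, control).pvalue < alpha

# Condition on the original study being significant (roughly, what gets
# published), then check whether an exact replication is also significant.
original_hits = 0
replication_hits = 0
for _ in range(n_sims):
    if significant(true_effect, n_per_group, rng):
        original_hits += 1
        if significant(true_effect, n_per_group, rng):
            replication_hits += 1

print(f"Power of a single study: ~{original_hits / n_sims:.2f}")
print(f"Replication success given a significant original: ~{replication_hits / original_hits:.2f}")
```

Since the two runs are independent, the expected replication rate under perfect duplication is just the power of the replication study, so a low-powered literature should show plenty of “failures” even with zero undocumented distortions. The question is whether the observed failure rate is higher than even that baseline predicts.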
To provide some intuition for this guess: I remember reading about a researcher doing experiments on mice who found that random stuff like the lighting in his laboratory was actually the primary explanation for his experimental results. (Maybe someone else can provide a link? I can’t seem to find him on Google.) From this he concluded that almost all previous mouse experiments were useless. But you can imagine a mouse experiment where, instead of using 100 mice in a single laboratory, the 100 mice are spread across many laboratories. This would deal with the random stuff problem pretty well.
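Here is a hypothetical sketch of that comparison (again mine, in Python; lab_sd, mouse_sd, and the effect size are assumed numbers, and to keep a treated and a control animal in every lab the “spread out” design uses 50 labs with one pair each, so both designs use 100 mice):

```python
# Hypothetical sketch: each lab distorts the measured treatment effect by its
# own random offset (lighting, handling, etc.). Compare the spread of the
# estimate from 100 mice in one lab vs. 100 mice spread over 50 labs.
import numpy as np

rng = np.random.default_rng(1)
true_effect = 1.0    # assumed treatment effect
lab_sd = 2.0         # assumed spread of lab-specific distortions of the effect
mouse_sd = 1.0       # assumed mouse-to-mouse noise
n_sims = 5_000

def one_lab(n_mice_per_arm, rng):
    """Treatment-effect estimate from one lab, whose quirks shift the effect."""
    lab_shift = rng.normal(0.0, lab_sd)
    treated = rng.normal(true_effect + lab_shift, mouse_sd, n_mice_per_arm)
    control = rng.normal(0.0, mouse_sd, n_mice_per_arm)
    return treated.mean() - control.mean()

# Design A: 50 treated + 50 control mice, all in a single laboratory.
single_lab_runs = [one_lab(50, rng) for _ in range(n_sims)]
# Design B: 50 laboratories, each with 1 treated + 1 control mouse.
multi_lab_runs = [np.mean([one_lab(1, rng) for _ in range(50)]) for _ in range(n_sims)]

print(f"Single lab:   estimate spread (sd) ~ {np.std(single_lab_runs):.2f}")
print(f"50 labs:      estimate spread (sd) ~ {np.std(multi_lab_runs):.2f}")
```

In the single-lab design the estimate is dominated by whatever distortion that one lab happens to have, no matter how many mice you add; averaging over many labs shrinks that component by roughly the square root of the number of labs.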
Of course, there’s also the problem of interpreting study results accurately… so I don’t think the number of participants is the bottleneck to making good inferences in most cases.
And a meta-analysis obviously won’t suffer from the random stuff problem as much.
You’re thinking of the mouse study covered by Lehrer in his decline effect New Yorker article, which was Crabbe et al 1999 “Genetics of mouse behavior: interactions with laboratory environment”.
Thanks!