Is Economics Research Replicable? Sixty Published Papers from Thirteen Journals Say “Usually Not” by Andrew C. Chang and Phillip Li

We attempt to replicate 67 papers published in 13 well-regarded economics journals using author-provided replication files that include both data and code. Some journals in our sample require data and code replication files, and other journals do not require such files. Aside from 6 papers that use confidential data, we obtain data and code replication files for 29 of 35 papers (83%) that are required to provide such files as a condition of publication, compared to 11 of 26 papers (42%) that are not required to provide data and code replication files. We successfully replicate the key qualitative result of 22 of 67 papers (33%) without contacting the authors. Excluding the 6 papers that use confidential data and the 2 papers that use software we do not possess, we replicate 29 of 59 papers (49%) with assistance from the authors. Because we are able to replicate less than half of the papers in our sample even with help from the authors, we assert that economics research is usually not replicable. We conclude with recommendations on improving replication of economics research.
Note that their implicit definition of “replicable” is very narrow: under their procedure, a paper can fail to be “replicable” simply because its authors never reply to an e-mail asking for code. This is something of a play on words, since “failure to replicate” typically means being unable to get the same results as the authors while following the same procedure. Based on the discussion at the end of their section 3, it appears that (at most) 9 of the 30 “failed replications” involve actually running the code and getting different results.
Yes, there is a difference between “unable to replicate because we couldn’t even attempt to replicate” (code and/or data are missing) and “unable to replicate because we tried and the results did not match”. Whether both cases or only the second should be called “failure to replicate” depends on your preferred definition.
Still, while the second case is clearly “bad science” (the result of either error or fraud), the first case is “not science”, because science doesn’t work by trusting the word of the researcher. A well-known example of the first case is cold fusion.