There’s been some survey data on this, e.g.:
Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PloS one, 4(5), e5738.
This study reports that ~2% of scientists admitted to having fabricated, falsified, or modified data at least once. However, it is of course difficult to get a good estimate by asking people directly, since respondents have strong incentives to lie, and asking people about suspicions of colleagues may instead give an overestimate. So I think actual fabrication rates are in general very hard to estimate.
One obvious issue is that the instances of fraud that get caught are likely to be cases where the data looked suspicious to others. That requires eyes on the data, someone noticing, that person following through, and, most importantly, the fraud being sloppy enough to be noticed. So identified cases are skewed toward the most blatant, careless fraud, while people who are very good at committing fraud are much more likely to go undetected. And that’s scary.
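To make that selection effect concrete, here’s a toy simulation in Python. Every parameter is made up purely for illustration (the true fraud rate just echoes the ~2% survey figure; the split between sloppy and careful fraudsters and their detection probabilities are pure assumptions):

```python
import random

random.seed(0)

N = 100_000              # number of researchers (made-up)
FRAUD_RATE = 0.02        # true fraud rate, echoing the ~2% survey figure (assumption)
P_CARELESS = 0.5         # fraction of fraudsters who are sloppy (made-up)
P_DETECT_SLOPPY = 0.30   # chance sloppy fraud gets noticed (made-up)
P_DETECT_CAREFUL = 0.02  # chance careful fraud gets noticed (made-up)

fraudsters = sloppy_caught = careful_caught = 0
for _ in range(N):
    if random.random() < FRAUD_RATE:
        fraudsters += 1
        if random.random() < P_CARELESS:
            sloppy_caught += random.random() < P_DETECT_SLOPPY
        else:
            careful_caught += random.random() < P_DETECT_CAREFUL

caught = sloppy_caught + careful_caught
print(f"true fraud rate:      {fraudsters / N:.2%}")
print(f"detected fraud rate:  {caught / N:.2%}")
print(f"detected cases that were sloppy: {sloppy_caught / caught:.0%}")
```

The specific numbers don’t matter; the point is that under assumptions like these the detected rate is a small fraction of the true rate, and nearly all detected cases come from the sloppy group.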
It might also be an underestimate. If you ask people how many of their colleagues have stolen something, or ask men how many of their friends have engaged in sexual assault, you typically get underestimates.
Fanelli is a good, if dated, reference for this. Another important point is that research misconduct comes in degrees, ranging from bad authorship practices to outright fabrication of results, with the less severe practices being relatively more common: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4269469/
Aside from all that, there’s irreproducibility, which doesn’t arise from any kind of deliberate misconduct, but still pollutes the epistemic commons: https://www.cos.io/rpcb
There are still more issues. Even if the results of a study can be reproduced from the raw data, and even if the findings replicate in subsequent studies, that does not ensure the results identify the effect the researchers claim to have found.
This is because studies can rely on invalid measures. If a study claims to measure P but its instrument actually tracks something else, the results may still pick up on a real, stable pattern. In that case they can replicate and appear legitimate even though they don’t show what they purport to show.
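Here’s a minimal sketch of that failure mode in Python. The traits P and Q, the loading of the questionnaire on Q, and the effect sizes are all invented: the questionnaire is *claimed* to measure P but actually tracks a confound trait Q, and because Q is real, the “finding” replicates in every sample while saying nothing about P:

```python
import random
import statistics

random.seed(1)

def correlation(xs, ys):
    # Pearson correlation, computed by hand to stay self-contained.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def run_study(n=500):
    # Each subject has an unobserved target trait P and a confound trait Q (invented).
    P = [random.gauss(0, 1) for _ in range(n)]
    Q = [random.gauss(0, 1) for _ in range(n)]
    # The questionnaire is claimed to measure P but actually tracks Q plus noise.
    score = [q + random.gauss(0, 0.5) for q in Q]
    # The outcome is driven by Q, not P (made-up effect size).
    outcome = [0.6 * q + random.gauss(0, 1) for q in Q]
    return correlation(score, outcome), correlation(score, P)

for i in range(3):  # three "independent replications"
    r_outcome, r_target = run_study()
    print(f"study {i + 1}: score~outcome r = {r_outcome:.2f}, score~P r = {r_target:.2f}")
```

Each “replication” finds a solid score–outcome correlation, so the result looks robust across studies, but the score is essentially uncorrelated with P, so it never measured what it claimed to.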