Every so often someone proposes this (and sometimes someone who thinks they are clever actually carries it out) and it’s always a terrible idea. The purpose of peer review is not to uncover fraud. It’s not even to make sure what’s in the paper is correct. The purpose of peer review is just to make sure what’s in the paper is plausible and sane, and worth being presented to a wider audience. The purpose is to weed out obvious low-quality material such as perpetual motion machines or people who are passing off others’ work as their own. Could you get fraudulent papers accepted in a journal? Of course. A scientist sufficiently knowledgeable about their field could definitely fool almost any arbitrarily rigorous peer review procedure. Does fraud exist in the scientific world? Of course it does. Peer review is just one of the many mechanisms that serve to uncover it. Real review of one’s work begins after peer review is over and the work is examined by the scientific community at large.
The purpose of peer review is not to uncover fraud.
And this is OK if the fraud rate is low, and unacceptable if it’s high.
I doubt this happens to more than a tiny number of papers, although probably the more important the result the more likely it will get reviewed.
If a paper shows all its working, a competent reviewer can judge whether the work as reported is good. How will they detect that the report is a fabrication? All the reviewer sees is the story the author is telling. The reviewer may notice inconsistencies, such as repeated use of the same figures, or data with an implausible distribution, but they will generally have no way to compare the story with the actual facts of what happened in the lab.
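As a concrete illustration of the “implausible distribution” point, here is a minimal sketch of the kind of internal-consistency check a reviewer could run, assuming the paper actually attaches its raw numbers (which it usually does not). Digit-frequency screens like this are a known forensic heuristic rather than part of any standard review workflow, and the data below are invented for the example.

```python
# Minimal sketch: flag reported values whose terminal digits are implausibly
# non-uniform. A heuristic screen only; it proves nothing by itself, and it
# assumes the raw numbers are even available to the reviewer.
from collections import Counter

def terminal_digit_chi2(values):
    """Chi-squared statistic of the last digits (0-9) against a uniform distribution."""
    digits = [str(v).replace('.', '').replace('-', '')[-1] for v in values]
    expected = len(digits) / 10
    counts = Counter(digits)
    return sum((counts.get(str(d), 0) - expected) ** 2 / expected for d in range(10))

# Made-up toy data (real checks need far more values for the test to mean much).
reported = [12.3, 14.3, 11.3, 15.3, 13.3, 12.3, 14.3, 11.3, 15.3, 13.3]
stat = terminal_digit_chi2(reported)
critical = 16.92  # 5% critical value for chi-squared with 9 degrees of freedom
print(f"chi2 = {stat:.1f}: {'suspicious digit pattern' if stat > critical else 'nothing obvious'}")
```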
Detecting and preventing fraud is a good thing, but I don’t think peer review is a place where much of it can happen.
At least in math, a paper can actually be verified during peer review.
Easier said than done. Just because you didn’t notice an error in a two hundred page proof doesn’t mean there isn’t one.
The purpose of peer review is just to make sure what’s in the paper is plausible and sane, and worth being presented to a wider audience. The purpose is to weed out obvious low-quality material such as perpetual motion machines or people who are passing off others’ work as their own.
Maybe, but this isn’t how actual peer review operates. It rejects far more than implausible/insane/unworthy ideas.
Real review of one’s work begins after peer review is over and the work is examined by the scientific community at large.
I agree with this, if you’ll concede that by your measure, the vast majority of scientific output never undergoes real review. Which is why most published results are false, and science is a cesspool.
It rejects far more than implausible/insane/unworthy ideas.
What else does it reject?
Which is why most published results are false, and science is a cesspool.
I think it’s important to look at this on a per-discipline basis. Some disciplines have much higher standards of clarity, precision, and repeatability than others. That article you linked looks at statistical studies with a special focus on medical research, but then seems to make the critical error of generalizing this to all scientific research. Do the findings apply to physics? Math? Computer science?
Do the findings apply to physics? Math? Computer science?
Different fields use different methods. The basic point Ioannidis makes applies to any field which uses null-hypothesis significance-testing statistics for interpreting sampled data.
Math uses formal proofs, so whatever math error rates are (non-zero and meaningful, but not sure how big), they are independent of NHST’s problems.
Ecology, medicine, biology, psychology, economics—heavy NHST users, critique definitely applies.
Experimental physics seems to use a lot of NHST too, but physicists heed the critique by increasing power substantially: they reduce measurement error and gather enormous masses of data, far more than is feasible in the other fields, with so large an n that they can use the famed six-sigma alpha, which translates to a very high PPV (positive predictive value). They’re also helped by a commitment to falsifiable, narrow predictions of intervals rather than directions (hypothesis testing works much better if you can predict that the Higgs’s mass lies within a narrow range, rather than having a null hypothesis of mass equals zero and an alternative of mass is non-zero); if you’re interested, see Paul Meehl’s methodological papers on why this matters.
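To make the alpha-to-PPV arithmetic concrete, here is a minimal sketch using the PPV formula from Ioannidis’s paper; the power and prior-odds numbers are purely illustrative assumptions, not estimates for any particular field.

```python
# PPV (positive predictive value) of a claimed positive finding, per Ioannidis
# (2005), ignoring bias and multiple teams:
#   PPV = (1 - beta) * R / ((1 - beta) * R + alpha)
# where R is the prior odds that the tested relationship is real, alpha the
# significance threshold, and (1 - beta) the statistical power.
def ppv(alpha, power, prior_odds):
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)

# Illustrative numbers only (assumptions, not field estimates); power and prior
# are held equal so the effect of alpha alone is visible. Real physics searches
# also tend to have far higher power than this.
print(ppv(alpha=0.05, power=0.5, prior_odds=0.1))   # ~0.5: a coin flip
print(ppv(alpha=1e-9, power=0.5, prior_odds=0.1))   # ~0.99999998: a six-sigma-style alpha
```

Even with a modest prior and mediocre power, an astronomically small alpha drives the false-positive term to irrelevance, which is the point about the physics discovery threshold above.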
Computer science is tricky:
the mathy parts are math and are safe (but not necessarily important or worth doing),
but other areas like systems work or machine learning may or may not use NHST techniques. There seem to be a lot of replicability problems in optimization work due to variation from machine to machine, and in machine learning I’ve heard many insinuations that papers get published by p-hacking hyperparameters until the new algorithm is finally p<0.05 better than the comparison algorithm, or that the new tweak is just overfitting on a standard dataset. Some subfields are visibly rotten (HCI especially; you only have to look at how routine it is for HCI papers to claim an improvement based on NHST techniques applied to n=10 or so to know that those ain’t gonna replicate). But aside from a few critical papers like “Producing Wrong Data Without Doing Anything Obviously Wrong!”, I don’t know of any general argument that most CS research is wrong.
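As a toy illustration of the hyperparameter p-hacking worry (not a model of any particular paper): compare a “new” algorithm that is in fact identical to the baseline, try twenty hyperparameter settings, and report only the best p-value. The benchmark numbers and settings below are invented for the sketch.

```python
# Toy simulation of selective reporting: the "new" method is identical to the
# baseline, but we try many hyperparameter settings and keep the best p-value.
import math
import random
import statistics

def welch_p(a, b):
    """Two-sided p-value for a difference in means, via a normal approximation
    to Welch's t-test (a library routine like scipy.stats.ttest_ind would be
    the usual choice)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = abs(statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

def benchmark_scores(n_runs=30):
    """Invented benchmark: accuracy is pure noise around the same true mean."""
    return [random.gauss(0.80, 0.02) for _ in range(n_runs)]

random.seed(0)
trials, k_settings, hits = 1000, 20, 0
for _ in range(trials):
    baseline = benchmark_scores()
    # Each "hyperparameter setting" is just another noisy run of the same method.
    best_p = min(welch_p(benchmark_scores(), baseline) for _ in range(k_settings))
    hits += best_p < 0.05
print(f"Spurious p<0.05 'improvements': {hits / trials:.0%}")
# Far above the nominal 5% (it would approach 1 - 0.95**20, about 64%, if the
# twenty comparisons did not share a baseline).
```

Whether any given subfield actually behaves like this is exactly the empirical question; the sketch only shows that selection over settings plus NHST is enough to manufacture “significant” wins from noise.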
It would be interesting to weight fields by publication count to see if Ioannidis’s title, interpreted literally, is still right. When one criticizes ‘ecology, medicine, biology, psychology, economics’, one is criticizing what must be at least hundreds of thousands of papers every year—those are big fields. I don’t know that math, physics, theoretical CS, etc. publish enough papers to offset that.
I agree 100%.
[Peer review] rejects far more than implausible/insane/unworthy ideas.
What else does it reject?
I see papers get rejected all the time for methodological disagreements and failure to cite papers the referee thinks important. More broadly, ideas that are perfectly plausible but contrary to current thinking in a field have a much higher threshold to publication than ideas consonant with current thinking.
But more generally, peer review is normally explicitly aimed at rejecting work judged to be non-novel or non-substantial. That boring replication attempts can’t get published should therefore be seen as a feature not a bug. The ability of academics to publish novel, counter-intuitive and false results should therefore also be seen as a feature not a bug.
I think it’s important to look at this on a per-discipline basis.
Oh, I’m sure some disciplines are worse than others. But as you seem to be tacitly conceding, “the vast majority of scientific output never undergoes real review,” and that’s true in all disciplines.