It rejects far more than implausible/insane/unworthy ideas.
What else does it reject?
Which is why most published results are false, and science is a cesspool.
I think it’s important to look at this on a per-discipline basis. Some disciplines have much higher standards of clarity, precision, and repeatability than others. That article you linked looks at statistical studies with a special focus on medical research, but then seems to make the critical error of generalizing this to all scientific research. Do the findings apply to physics? Math? Computer science?
Do the findings apply to physics? Math? Computer science?
Different fields use different methods. The basic point Ioannidis makes applies to any field which uses null-hypothesis significance-testing statistics for interpreting sampled data.
Math uses formal proofs, so whatever math's error rates are (non-zero and meaningful, though I'm not sure how large), they are independent of NHST's problems.
Ecology, medicine, biology, psychology, economics: heavy NHST users, so the critique definitely applies.
Experimental physics seems to use a lot of NHST too, but it heeds the critique by increasing power substantially: reducing measurement error and gathering enormous masses of data, far more than is feasible in other fields, with sample sizes so large that physicists can use the famed five-sigma alpha, which translates into a very high PPV (positive predictive value). They are also helped by a commitment to falsifiable narrow predictions of intervals rather than directions: hypothesis testing works much better if you can predict that the Higgs's mass lies within a narrow range, rather than having a null hypothesis of "the mass equals zero" and an alternative of "the mass is non-zero" (if you're interested, see Paul Meehl's methodological papers on why this is important).
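To make the alpha/power point concrete, here is a minimal sketch (in Python) of Ioannidis's basic PPV formula, PPV = power*R / (power*R + alpha), where R is the prior odds that a tested relationship is real; the specific numbers in the two scenarios below are illustrative assumptions, not estimates from his paper or from physics:

    # Ioannidis's basic PPV formula (no bias, single test): the probability that a
    # "significant" finding is true, given prior odds R, power 1-beta, and alpha.
    from scipy.stats import norm

    def ppv(prior_odds, power, alpha):
        return (power * prior_odds) / (power * prior_odds + alpha)

    # Illustrative, made-up scenarios:
    # a typical low-powered NHST field: R = 0.25, power = 0.5, alpha = 0.05
    print(ppv(0.25, 0.50, 0.05))              # ~0.71: roughly 3 in 10 "findings" are false
    # particle-physics style: same prior odds, power ~0.99, five-sigma alpha
    five_sigma_alpha = norm.sf(5)             # one-sided tail beyond 5 standard deviations
    print(ppv(0.25, 0.99, five_sigma_alpha))  # ~0.999999: detections are almost always real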
Computer science is tricky:
the mathy parts are math and are safe (but not necessarily important or worth doing),
but other areas like systems work or machine learning may or may not use NHST techniques. There seem to be a lot of replicability problems in optimization work due to variation from machine to machine, and in machine learning I've heard many insinuations that papers get published by p-hacking hyperparameters until the new algorithm is finally p < 0.05 better than the comparison algorithm (a toy simulation of this is sketched below), or that the new tweak is just overfitting on a standard dataset; some subfields are visibly rotten (HCI especially; you only have to look at how routine it is for HCI papers to claim an improvement based on NHST applied to n = 10 or so to know that those ain't gonna replicate)… but aside from a few critical papers like "Producing Wrong Data Without Doing Anything Obviously Wrong!", I don't know of any general argument that most CS research is wrong.
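As a toy illustration of that hyperparameter worry (a sketch with made-up numbers, not a model of any actual paper), the following simulation compares a "new" algorithm that is statistically identical to the baseline, tries 20 hyperparameter settings, and keeps only the best p-value; "significant" improvements show up far more often than the nominal 5%:

    # Toy p-hacking simulation: the "new" method is identical in distribution to the
    # baseline; trying many hyperparameter settings and reporting only the best
    # comparison still yields p < 0.05 far more often than 5% of the time.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    n_runs, n_settings, n_benchmarks = 1000, 20, 10

    false_positives = 0
    for _ in range(n_runs):
        baseline = rng.normal(0.80, 0.02, n_benchmarks)   # accuracies on 10 benchmarks
        best_p = min(
            ttest_ind(rng.normal(0.80, 0.02, n_benchmarks), baseline).pvalue
            for _ in range(n_settings)                    # 20 hyperparameter configs
        )
        false_positives += best_p < 0.05

    print(false_positives / n_runs)   # a large fraction, despite zero real improvement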
It would be interesting to weight fields by publication count to see if Ioannidis's title, interpreted literally, is still right. When one criticizes 'ecology, medicine, biology, psychology, economics', one is criticizing what must be at least hundreds of thousands of papers every year; those are big fields. I don't know that math, physics, theoretical CS, etc. publish enough papers to offset that.
I agree 100%.
[Peer review] rejects far more than implausible/insane/unworthy ideas.
What else does it reject?
I see papers get rejected all the time for methodological disagreements and failure to cite papers the referee thinks important. More broadly, ideas that are perfectly plausible but contrary to current thinking in a field have a much higher threshold to publication than ideas consonant with current thinking.
But more generally, peer review is normally explicitly aimed at rejecting work judged to be non-novel or non-substantial. That boring replication attempts can't get published should therefore be seen as a feature, not a bug. The ability of academics to publish novel, counter-intuitive, and false results should therefore also be seen as a feature, not a bug.
I think it’s important to look at this on a per-discipline basis.
Oh, I’m sure some disciplines are worse than others. But as you seem to be tacitly conceding, “the vast majority of scientific output never undergoes real review,” and that’s true in all disciplines.