Peer review is not a certification of validity, even in more rigorous venues. Not even close.
I am used to seeing questionable claims forwarded under headlines like “new published study says XYZ”.
That XYZ was peer reviewed is one of the weaker arguments one could make in its favor, so when someone uses that as a selling point, it indicates to me that there aren’t better reasons to believe in XYZ.
(Analogously, when I see an ML paper boast that their new method is “competitive with” the SOTA, I immediately think “That means they tried to beat the SOTA, but found their method was at least a little worse. If it was better, they would’ve said so.”)
Peer review can definitely issue certificates mistakenly, but validity is what it aims to certify.
No it doesn’t. It’s hard to say what the “aims” of peer-review are, but “ensuring validity” is certainly not one of them. As a first approximation, I’d say that peer-review aims to certify that the author is not an obvious crank, and that the argument being made is an interesting one to someone in the field.
Care to bet on the results of a survey of academic computer scientists? If the stakes are high enough, I could try to make it happen.
“As a reviewer, I only recommend for acceptance papers that appear to be both valid and interesting.”
Strongly agree - … - Strongly Disagree
“As a reviewer, I would sooner recommend for acceptance a paper that was valid, but not incredibly interesting, than a paper that was interesting, but the conclusions weren’t fully supported by the analysis.”
Strongly agree - … - Strongly Disagree
No, no more than I would bet on a survey asking <insert religious group here> whether they think <religious group> is more virtuous than <non-religious group>. Academics may claim that peer review exists to check validity, but their actions tell a different story. This is especially true in “hard” fields like mathematics, where reviewers may even struggle to follow an argument, let alone check its validity. Given that most papers are never read by others, this is really not a big deal though.
But I’ll offer three further arguments for why I don’t think peer review ensures validity.
Argument 1:
a) Humans (including reviewers) make mistakes all the time, but
b) Retractions/corrections in papers are very rare.
Unless academics are better than everyone else at spotting mistakes at the reviewing stage (they are not), we should therefore expect lots of peer-reviewed articles to contain mistakes, because invalid papers rarely get retracted.
Argument 2:
Computer science papers don’t always include reproducible software, yet checking the code would be absolutely required to verify validity.
Argument 3:
It is customary to submit papers that are rejected by one journal to another journal. This means that articles that fail “peer review” at one journal can obtain “peer review” at a different journal.
PS: For CS it’s harder to check “validity”, but here’s how papers replicate in other fields: https://fantasticanachronism.com/2021/11/18/how-i-made-10k-predicting-which-papers-will-replicate/
Me: Peer review can definitely issue certificates mistakenly, but validity is what it aims to certify.
You: No it doesn’t. They just care about interestingness.
Me: Do you agree reviewers aim to only accept valid papers, and care more about validity than interestingness?
You: Yes, but...
If you can admit that we agree on this basic point, I’m happy to discuss further about how good they are at what they aim to do.
1: If retractions were common, surely you would have said that was evidence peer review didn’t accomplish much! If academics were merely equally good at spotting mistakes, they would still catch most of them, because reviewing gives them the first opportunity to. And if they catch a flaw at review, others never get a “chance” to point it out and have the paper retracted. Even though your argument fails, I agree that journals are too reluctant to publish retractions; pride can sometimes get in the way of good science. But that has no bearing on their concern for validity at the reviewing stage.
2: Some amount of trust is taken for granted in science. The existence of trust in a scientific field does not imply that the participants don’t actually care about the truth. Bounded Distrust.
3: Since some level of interestingness is also required for publication, this is consistent with a top venue having a higher bar for interestingness than a lesser venue, even while both have the same requirement for validity. And that is in fact the main effect at play. But yes, there are also some lesser journals/conferences/workshops that are worse at checking validity, or that care less about it because they are struggling to publish enough articles to justify their existence, or because they are outright scams. So it is relevant that AAAI publishes AI Magazine, and their brand is behind it. I said “peer reviewed” instead of “peer reviewed at a top venue” because the latter would have rubbed you the wrong way even more, but I’m only claiming that passing peer review is worth a lot at a top venue.
Me: Do you agree reviewers aim to only accept valid papers, and care more about validity than interestingness?
I’ve reviewed papers. I didn’t spend copious amounts of time checking the proofs. Some or most reviewers may claim to only accept “valid papers” (whatever that means), but the way the system is set up, peer review serves mainly to filter out blatantly bad papers. Sure, people try to catch the obviously invalid papers. And sure, many researchers really do try to find mistakes. But at the end of the day, you can always get your results published somewhere, and once something is published, it is almost never retracted.
If retractions were common, surely you would have said that was evidence peer review didn’t accomplish much!
Sure, let me retract my previous argument and amend it with the additional statement that even when a paper is known by the community to have mistakes, it is almost never retracted.
2: Some amount of trust is taken for granted in science. The existence of trust in a scientific field does not imply that the participants don’t actually care about the truth. Bounded Distrust.
I don’t think that this refutes my argument like you think it does. Reviewers don’t check software because they don’t have the capacity to check software. It is well-known that all non-trivial software contains bugs. Reviewers accept this, because at the end of the day they don’t comprehensively check validity.
because the latter would have rubbed you the wrong way even more
No, I think that peer review at a good journal is worth much more than peer review at a bad journal.
I think our disagreement comes down to the stated intent being to check validity, and me arguing that the actual effect is to filter out poorly written or uninteresting articles. There is obviously some overlap, as nobody will find an obviously invalid article interesting! Depending on the journal, this may come close to checking some kind of validity. I trust an article in Annals of Mathematics to be correct in a way that I don’t trust an article in PNAS to be. We can compare peer review with the FDA: the stated intent is to offer safe medications to the population. The actual effect is …
Do you think the peer reviewers and the editors thought the argument was valid?
Maybe? I really don’t feel comfortable speculating like that about their thinking. What exactly peer review entails & what exactly reviewers expect varies a lot based on the field and journal/conference.