does anyone have thoughts on how to improve peer review in academic ML? From discussions with my advisor, my sense is that the system used to depend on word of mouth and people caring more about their academic reputation, which works in fields of 100s of researchers but breaks down in fields of 1000s+. Seems like we need some kind of karma system to both rank reviewers and submissions. I'd be very surprised if nobody has proposed such a system, but a quick Google search doesn't yield results.
I think reforming peer review is probably underrated from a safety perspective (for reasons articulated here—basically bad peer review disincentivizes any rigorous analysis of safety research and degrades trust in the safety ecosystem)
I think this professor has relevant interests: https://www.cs.cmu.edu/~nihars/.
I think requiring authors to also review papers is a pretty good way to both (i) ensure there are enough reviewers for any given subdiscipline and (ii) at least somewhat kick-start healthier review culture. My impression is that many academics don’t see reviewing as part of their responsibilities, and forcing it on them might change this.
I feel like improving how papers get assigned to reviewers would also do a lot. The worst reviews I've submitted were for papers whose topic I wasn't well-versed in or interested in.
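For concreteness, here is a rough sketch of the kind of expertise-based assignment I have in mind: score each reviewer–paper pair by text similarity between the paper's abstract and the reviewer's own recent abstracts, then assign greedily under a load cap. The scoring choice (TF-IDF cosine similarity), the load limits, and all the names are illustrative assumptions, not how any actual conference does it.

```python
# Toy sketch of affinity-based review assignment (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def assign_reviews(papers, reviewers, per_reviewer_load=3, reviewers_per_paper=3):
    """papers: {paper_id: abstract}; reviewers: {reviewer_id: concatenated own abstracts}."""
    paper_ids, paper_texts = zip(*papers.items())
    reviewer_ids, reviewer_texts = zip(*reviewers.items())

    # Affinity = cosine similarity between paper abstracts and reviewer text.
    vec = TfidfVectorizer(stop_words="english")
    matrix = vec.fit_transform(list(paper_texts) + list(reviewer_texts))
    affinity = cosine_similarity(matrix[:len(paper_ids)], matrix[len(paper_ids):])

    assignments = {pid: [] for pid in paper_ids}
    load = {rid: 0 for rid in reviewer_ids}
    # Greedy: each paper takes its highest-affinity reviewers with spare capacity.
    for i, pid in enumerate(paper_ids):
        ranked = sorted(range(len(reviewer_ids)), key=lambda j: -affinity[i, j])
        for j in ranked:
            rid = reviewer_ids[j]
            if load[rid] < per_reviewer_load:
                assignments[pid].append(rid)
                load[rid] += 1
            if len(assignments[pid]) == reviewers_per_paper:
                break
    return assignments
```

Real affinity systems are obviously more sophisticated (publication history, bids, conflicts of interest), but the point is just that assignment can be driven by expertise signals rather than being effectively topic-blind.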
Yeah, this stuff might help somewhat, but I think the core problem remains unaddressed: ad-hoc reputation systems don't scale to thousands of researchers.
It feels like something basic like "have reviewers / area chairs rate other reviewers, and post un-anonymized cumulative reviewer ratings" (a kind of h-index for review quality) might go a long way. The double-blind structure is maintained, while providing more incentive (in terms of status, and maybe direct monetary reward) for writing good reviews.
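To make the "cumulative reviewer rating" idea slightly more concrete, here is a minimal sketch of one possible aggregation: area chairs / co-reviewers give each review a usefulness rating, and each reviewer gets a mean rating plus an h-index-style count (h reviews rated at least h). The 1–10 scale and the exact aggregation are just illustrative assumptions.

```python
# Sketch of a cumulative reviewer-quality score ("h-index for review quality").
# Ratings are hypothetical 1-10 usefulness scores from ACs / co-reviewers.
from collections import defaultdict

def reviewer_scores(ratings):
    """ratings: iterable of (reviewer_id, usefulness_rating) pairs, rating in 1..10.

    Returns {reviewer_id: (mean_rating, review_h_index)}, where review_h_index
    is the largest h such that the reviewer has h reviews rated >= h.
    """
    by_reviewer = defaultdict(list)
    for reviewer_id, rating in ratings:
        by_reviewer[reviewer_id].append(rating)

    scores = {}
    for reviewer_id, rs in by_reviewer.items():
        rs_sorted = sorted(rs, reverse=True)
        h = 0
        for i, r in enumerate(rs_sorted, start=1):
            if r >= i:
                h = i
        scores[reviewer_id] = (round(sum(rs) / len(rs), 2), h)
    return scores

# Example: reviewer "r1" has reviews rated 9, 8, 2 -> mean 6.33, h-index 2.
print(reviewer_scores([("r1", 9), ("r1", 8), ("r1", 2), ("r2", 5)]))
```

The h-index analogue rewards consistently useful reviewing rather than one standout review; whether that's the right shape of incentive is exactly the sort of thing that would need experimenting with.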
It’s almost always only single-blind: the reviewers usually know who the authors are.
yeah fair—my main point is that you could have a reviewer reputation system without de-anonymizing reviewers on individual papers
(alternatively, de-anonymizing reviews might improve the incentives to write good reviews on the current margin, but would also introduce other bad incentives towards sycophancy etc. which academics seem deontically opposed to)
Interesting. You’re essentially trying to set up an alternative reputation system I guess. But I don’t see what the incentive is for academics to buy into this new reputation system when they already have one (h-index). Also don’t see what the incentive is for giving honest ratings to other reviewers.
Intuition pump: Most marketplace platforms allow buyers and sellers to rate each other. This has direct usefulness to both because it influences who you buy from / sell to. Therefore there is immediate buy-in.
However, reviewing doesn’t work like this because authors and reviewers aren’t exercising much individual agency (nor should they) in determining what papers to review.
From what I understand, reviewing used to be a non-trivial part of an academic’s reputation, but relied on much smaller academic communities (somewhat akin to Dunbar’s number). So in some sense I’m not proposing a new reputation system, but a mechanism for scaling an existing one (but yeah, trying to get academics to care about a new reputation metric does seem like a pretty big lift)
I don't really follow the marketplace analogy—in a more ideal setup, reviewers would be selling a service to the conferences/journals in exchange for reputation (and possibly actual money). Reviewers would then be selected based on their previous reviewing track record and domain of expertise. I agree that in the current setup this market structure doesn't really hold, but this is in some sense the core problem.