This is one of the loonier[1] ideas to be found on Overcoming Bias (and that is saying something). Exercise for the reader: test the claim that sharing opinions screens off the usefulness of sharing evidence against the following real-world scenario. I have participated in this scenario several times and know what the correct answer is.
Verbal abuse is not a productive response to the results of an abstract model. Extended imaginary scenarios are not a productive response either. Neither explains why the proofs are wrong or inapplicable, or if inapplicable, why they do not serve useful intellectual purposes such as proving some other claim by contradiction or serving as an ideal to aspire to. Please try to do better.
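For reference, the result presumably at issue here is Aumann's agreement theorem, which in its usual form says: if two agents share a common prior $P$, have private information given by partitions $\Pi_1, \Pi_2$ of the state space, and their posteriors for an event $E$,

    $$q_i = P(E \mid \Pi_i(\omega)), \qquad i = 1, 2,$$

are common knowledge at the true state $\omega$, then $q_1 = q_2$. Note that the hypothesis is common knowledge of the posteriors themselves; the statement does not require the agents to exchange the evidence behind them.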
Your real world scenario tells you that sometimes sharing evidence will move judgements in the right direction.
Thinking that Robin Hanson or someone else on Overcoming Bias hasn't thought of that argument is naive. Robin Hanson might sometimes make arguments that are wrong, but he's not stupid. If you are treating him as if he were, then you are likely arguing against a strawman.
Apart from that, your example also has strange properties, like allowing reviewers to make only four different kinds of judgement. Why would anyone choose four?
Your real world scenario tells you that sometimes sharing evidence will move judgements in the right direction.
It is a lot more than “sometimes”. In my experience (mainly in computing) no journal editor or conference chair will accept a referee’s report that provides nothing but an overall rating of the paper. The rubric for the referees often explicitly states that. Where ratings of the same paper differ substantially among referees, the reasons for those differing judgements are examined.
Apart from that, your example also has strange properties, like allowing reviewers to make only four different kinds of judgement. Why would anyone choose four?
The routine varies, but that one is typical: a four-point scale (sometimes with a fifth option not on the same dimension, “not relevant to this conference”, which trumps the scalar rating). Sometimes they ask for different aspects to be rated separately (originality, significance, presentation, etc.). Plus, of course, the rationale for the verdict, without which the verdict will not be considered and someone else will be found to referee the paper properly.
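A minimal sketch of what such a report might look like as data, going only by the description above; the field names, the 1-4 scale, and the helper functions are my own illustration, not any journal's actual schema:

    from dataclasses import dataclass, field

    # Illustrative only: fields and scale are assumptions, not a real rubric.
    @dataclass
    class RefereeReport:
        overall: int                 # scalar rating on a four-point scale
        relevant: bool = True        # the off-scale "not relevant to this conference" flag
        aspects: dict = field(default_factory=dict)  # e.g. {"originality": 3, "presentation": 2}
        rationale: str = ""          # the reasons behind the verdict

    def usable(report: RefereeReport) -> bool:
        """A report with no rationale is not considered; another referee is found."""
        return bool(report.rationale.strip())

    def effective_verdict(report: RefereeReport) -> str:
        """The relevance flag trumps the scalar rating."""
        if not report.relevant:
            return "out of scope"
        return f"rated {report.overall} of 4"

    r = RefereeReport(overall=3, aspects={"originality": 3, "presentation": 2},
                      rationale="Sound method, weak evaluation.")
    print(usable(r), effective_verdict(r))  # True rated 3 of 4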
Anyone is of course welcome to argue that they’re all doing it wrong, or to found a journal where publication is decided by simple voting rounds without discussion. However, Aumann’s theorem is not that argument, nor is it the optimal version of Delphi (according to the paper that gwern quoted), and I’m not aware of any such journal. Maybe PLOS ONE? I’m not familiar with their process, but their criteria for inclusion are non-standard.
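For contrast, a toy sketch of what “simple voting rounds without discussion” could look like; this is only an illustration under my own assumptions (each round, referees see the group median and move toward it), not the optimal Delphi procedure from the paper gwern quoted:

    import statistics

    def ratings_only_rounds(ratings, rounds=3):
        # Each round, every referee sees only the group median and moves
        # halfway toward it; no reasons are ever exchanged.
        ratings = list(ratings)
        for _ in range(rounds):
            med = statistics.median(ratings)
            ratings = [r + 0.5 * (med - r) for r in ratings]
        return statistics.median(ratings)

    print(ratings_only_rounds([1, 2, 2, 4]))  # 2.0: the outlier is pulled in, but no one learns why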
It is a lot more than “sometimes”. In my experience (mainly in computing) no journal editor or conference chair will accept a referee’s report that provides nothing but an overall rating of the paper.
That just tells us that the journals believe the rating isn’t the only thing that matters. But most journals just do things that make sense to them. They don’t draft their policies based on findings of decision science.
But most journals just do things that make sense to them. They don’t draft their policies based on findings of decision science.
Those findings being? Aumann’s theorem doesn’t go the distance. Anyway, I have no knowledge of how they draft their policies, merely some of what those policies are. Do you have some information to share here?
For example, that Likert scales are nice if you want someone to give you their opinion.
Of course it might make sense to actually run experiments. Big publishers rule over thousands of journals, so it should be easy for them to do the necessary research if they wanted to.
Verbal abuse is not a productive response to the results of an abstract model. Extended imaginary scenarios are not a productive response either. Neither explains why the proofs are wrong or inapplicable, or if inapplicable, why they do not serve useful intellectual purposes such as proving some other claim by contradiction or serving as an ideal to aspire to. Please try to do better.
As I said, the scenario is not imaginary.
I might have done so, had you not inserted that condescending parting shot.
Yes, it is. You still have not addressed what is either wrong with the proofs or why their results are not useful for any purpose.
Wow. So you started it, and now you’re going to use a much milder insult as an excuse not to participate? Please try to do better.
Well, the caravan moves on. That −1 on your comment isn’t mine, btw.
That was excessive, and I now regret having said it.