With the Bayes-score always being negative, I don’t see what incentive one would have to submit a mistake report. I think it would be better to test for better-than-90% confidence, for example, by awarding 1 point for a correct report and deducting 9 points for an incorrect one. This still achieves the goal of detecting the ability to spot bad arguments; measuring calibration would have to be a separate test.
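(To make the 90% threshold explicit: under this +1/−9 rule, submitting a report has positive expected score exactly when your probability p that the email is a mistake exceeds 0.9, since

$$ p \cdot 1 - (1 - p) \cdot 9 > 0 \iff 10p > 9 \iff p > 0.9. $$

So the scoring rule rewards reporting only when you are at least 90% confident.)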
Treat not submitting a mistake report as the “I have no idea” claim: that you’ve assigned a probability of “mistakes/total emails” to this particular email being a mistake.
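A quick illustration of how that default claim interacts with the Bayes (log) score, using a purely illustrative base rate of 1 mistake per 20 emails. For an email that really is a mistake:

$$ \ln(1/20) \approx -3.00 \ \text{(no report, i.e. the default claim)}, \qquad \ln(0.9) \approx -0.11 \ \text{(explicit 90\% report)}, $$

and for an email that is not a mistake:

$$ \ln(19/20) \approx -0.05 \ \text{(no report)}, \qquad \ln(0.1) \approx -2.30 \ \text{(mistaken report)}. $$

So even though every score is negative, an explicit report still beats the implicit default whenever your probability estimate genuinely improves on the base rate, which restores the incentive to report.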