Sorry, I misread your comment originally. You were careful to say that you were talking about 3 different biases, while most people say that there is a right way to orient each question.
But you weren’t careful to say that calibration, the measure of over- and under-confidence, is different from bias. There are four questions here: your three biases, plus calibration. Introducing new questions that make sense at 50% is irrelevant to the fact that calibration doesn’t make sense at 50%. If we are just doing calibration, the answers given at 50% are wasted tests. If we add a test of a bias, that part of the calibration test is still wasted. If we force answers out of the 50% bin, that improves the calibration test. Moreover, I don’t think that it harms the test of bias.
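To make the 50% point concrete, here is a minimal sketch. It assumes my reading of why the bin is uninformative: a 50% answer to a statement is also a 50% answer to its negation, so the hit rate in that bin depends only on how the question happens to be oriented. The function name and the parameters `p_true` and `p_orient` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction


def expected_hit_rate_at_50(p_true, p_orient=Fraction(1, 2)):
    """Expected fraction of 'true' outcomes among answers placed in the 50% bin.

    p_true   -- chance the statement is true as originally worded (hypothetical).
    p_orient -- chance the question is scored as worded rather than negated
                (1/2 means the orientation is arbitrary, like a coin flip).
    """
    # Scored as worded: a hit with probability p_true.
    # Scored as negated: a hit with probability 1 - p_true.
    return p_orient * p_true + (1 - p_orient) * (1 - p_true)


# With an arbitrary orientation, the 50% bin looks perfectly calibrated
# no matter how lopsided the underlying questions really are.
for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    print(p, expected_hit_rate_at_50(p))

# With a fixed orientation, the bin reports p_true itself, which tells you
# about a bias in how questions are oriented, not about over- or
# under-confidence.
print(expected_hit_rate_at_50(Fraction(9, 10), p_orient=Fraction(1)))
```

Under these assumptions the first loop gives 1/2 every time. So the 50% bin can't reveal miscalibration, while the fixed-orientation case measures something else entirely.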
Ideally, we would look at everything, but is it worth the effort? If we start with one thing, what is most important? I think that overconfidence is the biggest problem and one should start there. In some sense the annotations you suggest are not much more work, but small increments of effort matter when they make the difference between doing the exercise and not doing it.
(While most people are overconfident and calibration exercises are mainly about reducing overconfidence, the problem of 50% is actually a problem of underconfidence.)