Agreed. I think a strong reason this might work at all is that forecasters are primarily judged by some other strictly proper scoring rule, so they have no incentive to fake calibration if doing so makes them come out worse on, e.g., the Brier or log score.
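To make the incentive concrete, here is a rough sketch under assumptions of my own: four binary questions with made-up true probabilities, and one hypothetical way of "faking" calibration, namely reporting the base rate on every question. The function names are illustrative only. Both strategies are calibrated in expectation, but the hedged one does worse on expected Brier score.

```python
def expected_brier(report, true_prob):
    """Expected Brier score (lower is better) on one binary question,
    given the reported probability and the true probability."""
    return true_prob * (1 - report) ** 2 + (1 - true_prob) * report ** 2


def mean_expected_brier(reports, true_probs):
    """Average expected Brier score over a set of questions."""
    scores = [expected_brier(q, p) for q, p in zip(reports, true_probs)]
    return sum(scores) / len(scores)


# Hypothetical true probabilities (an assumption for illustration).
true_probs = [0.9, 0.9, 0.1, 0.1]

# Honest forecaster: reports the true probability on each question.
honest = mean_expected_brier(true_probs, true_probs)

# "Fake calibration": report the overall base rate everywhere.
# This is calibrated in expectation but carries no information.
base_rate = sum(true_probs) / len(true_probs)
hedged = mean_expected_brier([base_rate] * len(true_probs), true_probs)

print(f"honest: {honest:.3f}, base-rate hedging: {hedged:.3f}")
```

With these numbers the honest forecaster's expected Brier score is about 0.09 versus 0.25 for the base-rate strategy, so anyone who is also ranked on Brier pays a real price for gaming a calibration-only metric this way.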