I think a lot of the pushback against the post that I’m seeing in the older comments comes from the fact that this “confidence > baseline” rule is presented in its final form, without first passing through a stage where it looks more symmetrical.

By analogy, imagine that in the normal calibration setting someone just told you that you are required to phrase all your predictions so that the probabilities are >= 50%. “But why,” you’d think; “doesn’t the symmetry of the situation almost guarantee that this has to be wrong? In what sane world can we predict 80% but not 20%?” So instead, the way to present the classical version is: you can predict any value between 0 and 100, and then, precisely because of the symmetry noticed above, for the purpose of scoring we lump together the 20s and the 80s. One possible implementation of this is to do the lumping at prediction time instead of scoring time, by only letting you specify probabilities >= 50%.

Similarly, in the system from this post, the fundamental thing is that you provide your probability along with the direction in which it differs from the baseline. Then, come scoring time, we lump together “80%, baseline is higher” with “20%, baseline is lower”. Which means one possible implementation is to do the lumping at prediction time, by only allowing you to make “baseline is lower” predictions. (And another implementation, for anyone who finds this lens useful since it’s closer to the classical setting, would be to only allow >= 50% predictions while letting you freely specify the direction of the baseline.)
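If it helps, the lumping in both settings can be sketched in a few lines of code. This is just my own illustration of the symmetry argument above (the function names and the tuple representation are mine, not from the post): folding at scoring time is the same operation as restricting the prediction language at prediction time.

```python
def fold(p, outcome):
    """Classical calibration lumping: a prediction of p on an event is
    equivalent to a prediction of 1-p on its complement, so fold every
    prediction into the p >= 0.5 half before bucketing for scoring."""
    if p < 0.5:
        return 1.0 - p, not outcome
    return p, outcome

def fold_with_baseline(p, baseline_is_lower, outcome):
    """Same idea for the baseline variant: '80%, baseline is higher' is
    lumped with '20%, baseline is lower' (complement event, flipped
    direction). Allowing only baseline_is_lower=True predictions is the
    prediction-time version of this scoring-time fold."""
    if not baseline_is_lower:
        return 1.0 - p, True, not outcome
    return p, baseline_is_lower, outcome
```

For example, `fold(0.25, True)` scores the same as `fold(0.75, False)`, which is exactly the "lump the 20s with the 80s" move described above.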