Just because predicting eg a 10% chance of X can instead be rephrased as predicting a 90% chance of not-X, so everything below 50% is redundant.
And how is the “actual accuracy” calculated?
It assumes that you predict every event with the same confidence (namely prediction_confidence) and then that you’re correct on actual_accuracy of those. So for example if you predict 100 questions will resolve true, each with 100% confidence, and then 75 of them actually resolve true, you’ll get a Brier score of 0.25 (ie 3⁄4 of the way up the right-hand said of the graph).
Of course typically people predict different events with different confidences—but since overall Brier score is the simple average of the Brier scores on individual events, that part’s reasonably intuitive.
Just because predicting eg a 10% chance of X can instead be rephrased as predicting a 90% chance of not-X, so everything below 50% is redundant.
It assumes that you predict every event with the same confidence (namely
prediction_confidence
) and then that you’re correct onactual_accuracy
of those. So for example if you predict 100 questions will resolve true, each with 100% confidence, and then 75 of them actually resolve true, you’ll get a Brier score of 0.25 (ie 3⁄4 of the way up the right-hand said of the graph).Of course typically people predict different events with different confidences—but since overall Brier score is the simple average of the Brier scores on individual events, that part’s reasonably intuitive.