It’s not that intuitively obvious how Brier scores vary with confidence and accuracy (for example: how accurate do you need to be for high-confidence answers to be a better choice than low-confidence?), so I made this chart to help visualize it:
Here’s log-loss for comparison (note that log-loss can be infinite, so the color scale is capped at 4.0):
Claude-generated code and interactive versions (with a useful mouseover showing the values at each point for confidence, accuracy, and the Brier (or log-loss) score):
Brier score
Log-loss
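The linked interactive versions aren't reproduced here, but as a rough illustration, here's a minimal sketch of how charts like these could be generated. This is my own code (assuming numpy and matplotlib), not the Claude-generated version linked above:

```python
import numpy as np
import matplotlib.pyplot as plt

# Grid: prediction confidence (0.5 to 1.0) vs actual accuracy (0.0 to 1.0).
conf = np.linspace(0.5, 1.0, 201)
acc = np.linspace(0.0, 1.0, 201)
C, A = np.meshgrid(conf, acc)

# If every event is predicted with confidence C and a fraction A resolve as
# predicted, the average Brier score mixes the two cases: correct predictions
# contribute (1 - C)^2 each, incorrect ones contribute C^2 each.
brier = A * (1 - C) ** 2 + (1 - A) * C ** 2

# Log-loss analogue; clipped because it diverges as confidence approaches 1
# on wrong answers, then capped at 4.0 for the color scale, as in the post.
eps = 1e-12
logloss = -(A * np.log(np.clip(C, eps, 1.0))
            + (1 - A) * np.log(np.clip(1 - C, eps, 1.0)))
logloss = np.minimum(logloss, 4.0)

fig, axes = plt.subplots(1, 2, figsize=(11, 4))
panels = [(axes[0], brier, "Brier score"),
          (axes[1], logloss, "Log-loss (capped at 4.0)")]
for ax, z, title in panels:
    im = ax.pcolormesh(C, A, z, shading="auto")
    ax.set_xlabel("prediction_confidence")
    ax.set_ylabel("actual_accuracy")
    ax.set_title(title)
    fig.colorbar(im, ax=ax)
plt.show()
```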
Interesting. Question: Why does the prediction confidence start at 0.5? And how is the “actual accuracy” calculated?
Just because predicting eg a 10% chance of X can instead be rephrased as predicting a 90% chance of not-X, making everything below 50% redundant.
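A quick check of that equivalence (a sketch with a hypothetical brier helper, not code from the post): predicting X at 10% scores exactly the same as predicting not-X at 90%, whichever way the event resolves:

```python
import math

def brier(p, outcome):
    """Brier score for one prediction: p is the stated probability, outcome is 0 or 1."""
    return (p - outcome) ** 2

# Predicting X at 10% scores the same as predicting not-X at 90%,
# whether or not X actually happens (up to float rounding).
for x_happened in (0, 1):
    assert math.isclose(brier(0.10, x_happened), brier(0.90, 1 - x_happened))
```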
It assumes that you predict every event with the same confidence (namely prediction_confidence) and then that you’re correct on actual_accuracy of those. So for example if you predict 100 questions will resolve true, each with 100% confidence, and then 75 of them actually resolve true, you’ll get a Brier score of 0.25 (ie 3⁄4 of the way up the right-hand side of the graph).

Of course typically people predict different events with different confidences—but since overall Brier score is the simple average of the Brier scores on individual events, that part’s reasonably intuitive.
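Put as a formula (my restatement of the comment above, with c for prediction_confidence and a for actual_accuracy): the average Brier score is a·(1 − c)² + (1 − a)·c². A minimal sketch verifying the 100-questions example:

```python
def expected_brier(prediction_confidence, actual_accuracy):
    # Correct predictions each score (1 - c)^2, incorrect ones score c^2;
    # the overall score is their accuracy-weighted average.
    c, a = prediction_confidence, actual_accuracy
    return a * (1 - c) ** 2 + (1 - a) * c ** 2

print(expected_brier(1.0, 0.75))  # 0.25, matching the example above
```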