But we show that for any strictly proper scoring rule, there is a function f such that a dishonest prediction is optimal.
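A toy sketch of how this can happen (my own made-up f, not the paper's construction): suppose announcing forecast q makes the event occur with probability f(q) = 0.1 + 0.8q. Under the strictly proper log score, the only honest announcement is the fixed point q = 0.5 (where the report equals the resulting probability), but an overconfident announcement scores better in expectation:

```python
import numpy as np

# Hypothetical response function: announcing forecast q makes the event
# occur with probability f(q). Its only honest fixed point (q == f(q))
# is q = 0.5.
f = lambda q: 0.1 + 0.8 * q

def expected_log_score(q):
    # Log score is strictly proper for a *fixed* outcome distribution,
    # but here the announcement itself shifts the distribution to f(q).
    return f(q) * np.log(q) + (1 - f(q)) * np.log(1 - q)

qs = np.linspace(1e-4, 1 - 1e-4, 99999)
best_q = qs[np.argmax(expected_log_score(qs))]

honest = expected_log_score(0.5)        # ln(1/2) ≈ -0.693
dishonest = expected_log_score(best_q)  # ≈ -0.46, at q ≈ 0.96
```

So the score-maximizing announcement is roughly 0.96 even though the event's probability, given that announcement, is only about 0.87: the self-fulfilling response function makes overconfidence pay.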
Agreed for proper scoring rules, but I’d be a little surprised if it’s not possible to make a skill-free scoring rule, and then get an honest prediction result for that. [This runs into other issues—if the scoring rule is skill-free, where does the skill come from?—but I think this can be solved by having oracle-mode and observation-mode, and being able to do honest oracle-mode at all would be nice.]
I’m not sure I understand what you mean by a skill-free scoring rule. Can you elaborate on what you have in mind?
Sure—points from a scoring rule come from both ‘skill’ (whether or not you’re accurate in your estimates) and ‘calibration’ (whether your estimates line up with the underlying propensity).
Rather than generating the picture I’m thinking of (sorry, up to something else and so just writing a quick comment), I’ll describe it: watch this animation, and look at the implied maximum expected score as a function of p (the forecaster’s true belief). For all of the scoring rules, it’s a convex function with maxima at p=0 and p=1. (With a linear rule you can get 1 point on average if p=0, but only 0.5 points on average if p=0.5; for a log rule, it’s 0 points and −0.7 points, respectively.)
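Those endpoint numbers are easy to check numerically (a sketch; here "linear" means the score 1 − |outcome − q|, and the forecaster optimizes freely over the reported probability q):

```python
import numpy as np

QS = np.linspace(1e-6, 1 - 1e-6, 100001)  # grid of reportable probabilities

def max_expected_score(score, p):
    # Best expected score available to a forecaster whose true belief is p,
    # optimizing over the reported probability q.
    return np.max(p * score(QS, 1) + (1 - p) * score(QS, 0))

def linear(q, outcome):
    return 1 - np.abs(outcome - q)

def log_rule(q, outcome):
    return np.log(q) if outcome == 1 else np.log(1 - q)

# Linear: 1.0 at p = 0, dipping to 0.5 at p = 0.5.
# Log:    0.0 at p = 0, dipping to ln(1/2) ≈ -0.693 at p = 0.5.
```

Sweeping p from 0 to 1 traces out the convex curve described above for either rule, with its minimum at p = 0.5.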
But could you come up with a scoring rule where the maximum expected score as a function of p is flat? If so, there’s no longer an incentive to have extreme probabilities. But that incentive was doing useful work before, and so this seems likely to break something else—it’s probably no longer the case that you’re incentivized to report your true belief—or to require something like batch statistics (since I think you might be able to get something like this by scoring not individual predictions but sets of them, sorted by p or by whether they came true). [This can be done in some contexts with markets, where your reward depends on how close the market was to the truth before, but I think it probably doesn’t help here, because we’re worried about the oracle’s ability to affect the underlying reality—which is also an issue with prediction markets!]
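One naive flattening attempt (my own sketch, not a construction from the literature) illustrates exactly that breakage: subtract from the log score the expected score an honest forecaster with belief q would receive, so that honest reporting earns 0 on average at every p. Truth-telling then stops being optimal:

```python
import numpy as np

QS = np.linspace(1e-6, 1 - 1e-6, 100001)  # grid of reportable probabilities

def honest_value(q):
    # Expected log score of an honest forecaster whose belief is q.
    return q * np.log(q) + (1 - q) * np.log(1 - q)

def flat_log(q, outcome):
    # Log score shifted so that honest reporting averages 0 at every belief.
    base = np.log(q) if outcome == 1 else np.log(1 - q)
    return base - honest_value(q)

def best_report(p):
    # Report q maximizing expected flat_log score under true belief p.
    return QS[np.argmax(p * flat_log(QS, 1) + (1 - p) * flat_log(QS, 0))]

# best_report(0.5) is still 0.5, but best_report(0.8) drifts to about 0.655:
# shading toward 0.5 costs little in raw log score, while the subtracted
# honest_value(q) term actively rewards the less extreme report.
```

So this particular flattening removes the incentive for extreme probabilities only by paying forecasters to understate their confidence, which matches the worry above that the extremeness incentive was doing useful work.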
To be clear, I’m not at all confident this is possible or sensible—it seems likely to me that an adversarial argument goes through where, as the oracle, I always benefit from knowing which statements are true and which are false (even if I then lie about my beliefs to get a good calibration curve or whatever)—but that’s not an argument about the scale of the distortions that are possible.