The desire to look at calibration rather than prediction score comes from the fact that calibration at least seems like something you could fairly compare across different prediction sets. Comparing Scott’s 2015 vs. 2014 prediction scores might just reflect which year had more predictable events. In theory it’s also possible that one year’s uncertainties are objectively harder to calibrate, but this seems less likely.
The best procedure is probably to just make a good-faith effort to choose predictions based on interest and predict as though maximizing prediction score. If one wanted to properly align incentives, one might try the following procedure:
1) Announce a set of things to predict, but not the predictions themselves.
2) Have another party pledge to reward you (with cash or charity donation, probably) in proportion to your prediction score*, with a multiplier based on how hard they think your prediction topics are.
3) Make your predictions.
There’s a bit of a hurdle in that the range of the score is negative infinity to zero. One solution would be to set a maximum allowed confidence to make the range finite: for instance, if 99% is the maximum, the worst possible score would be ln(0.01) ≈ −4.6, so a reward of (4.6 + score) would produce the right incentives.
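To make the arithmetic concrete, here is a minimal sketch in Python. It assumes the per-prediction score is the natural log of the probability assigned to the outcome that actually occurred (consistent with the ln(0.01) worst case above); the clamping, the averaging, and all the names are my own illustrative choices, not anything specified in the original scheme.

```python
import math

MAX_CONFIDENCE = 0.99  # cap that keeps the worst-case score finite
WORST_SCORE = math.log(1 - MAX_CONFIDENCE)  # ln(0.01) ≈ -4.6

def log_score(p: float, happened: bool) -> float:
    """Log score for one prediction: ln of the probability assigned
    to the outcome that actually occurred, with the stated probability
    clamped to [1 - MAX_CONFIDENCE, MAX_CONFIDENCE]."""
    p = min(max(p, 1 - MAX_CONFIDENCE), MAX_CONFIDENCE)
    return math.log(p if happened else 1 - p)

def reward(predictions: list[tuple[float, bool]], multiplier: float = 1.0) -> float:
    """Average of (score - WORST_SCORE) over all predictions, scaled by
    the difficulty multiplier chosen by the rewarding party. Always
    non-negative, since no single score can fall below WORST_SCORE."""
    shifted = [log_score(p, happened) - WORST_SCORE for p, happened in predictions]
    return multiplier * sum(shifted) / len(shifted)

# Example: 90% and 70% predictions that came true, plus an overconfident
# 99% prediction that didn't, gives roughly 1.5 * (4.50 + 4.25 + 0) / 3.
print(reward([(0.9, True), (0.7, True), (0.99, False)], multiplier=1.5))
```

Because the reward is an affine function of the log score, maximizing expected reward is the same as maximizing expected log score, so (up to the 99% cap) honest probabilities remain the optimal strategy.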