To calculate the Brier score, you used >your< assumption that meteorites have a 1-in-a-million chance of hitting a specific area. What about events without a natural way to get those assumptions?
Let’s use another example:
Assume that I predict that neither Obama nor Romney will be elected, with 95% confidence. If that prediction comes true, it is amazing and indicates high predictive power (especially if I make multiple similar predictions and most of them come true).
Assume that I predict that either Obama or Romney will be elected, with 95% confidence. If that prediction comes true, it is not surprising.
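To see why the standard binary Brier score cannot tell these two apart, here is a minimal sketch (the score, (forecast − outcome)², is the standard one; the numbers come from the two examples above):

```python
def brier_score(forecast: float, outcome: int) -> float:
    """Binary Brier score: squared distance between the stated probability
    and the outcome (1 if the event happened, 0 if not); lower is better."""
    return (forecast - outcome) ** 2

# Both predictions were made with 95% confidence, and in both stories the
# prediction comes true, so the Brier score is identical: it cannot tell
# the amazing call from the trivial one.
print(brier_score(0.95, 1))  # "neither Obama nor Romney wins": 0.0025
print(brier_score(0.95, 1))  # "either Obama or Romney wins":   0.0025
```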
Where is the difference? The second event is expected by others. How can we quantify “difference from the expectations of others” and include it in the score? Maybe with an additional weight: weight each prediction by its difference from the expectations of others (as the mean of the log ratio, or something like that).
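One way to cash out the “mean of the log ratio” idea (the exact weighting scheme and the consensus probabilities below are my own illustration, not something fixed by the proposal):

```python
import math

def divergence_weight(my_p: float, consensus_p: float) -> float:
    """Weight a prediction by how far it diverges from the consensus:
    |log(my_p / consensus_p)|, one possible reading of the 'log ratio'
    idea. Echoing the crowd gives a weight near 0."""
    return abs(math.log(my_p / consensus_p))

def weighted_mean_brier(forecasts, consensus, outcomes):
    """Mean Brier score with divergence weights, so only predictions that
    actually disagree with the crowd move the score."""
    ws = [divergence_weight(p, q) for p, q in zip(forecasts, consensus)]
    ls = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(w * l for w, l in zip(ws, ls)) / sum(ws)

# "Neither Obama nor Romney" at 95% against a ~1% consensus gets a large
# weight; "either Obama or Romney" at 95% against a ~99% consensus gets a
# weight near zero, so crowd-following calls can no longer pad the score.
print(divergence_weight(0.95, 0.01))  # ~4.55
print(divergence_weight(0.95, 0.99))  # ~0.04
print(weighted_mean_brier([0.95, 0.95], [0.01, 0.99], [1, 1]))
```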
If the objective is to get better scores than others, then that helps, though it’s not clear to me that it does so in any consistent way. In particular, the strategy that maximizes your expected score and the strategy that gives you the highest probability of getting the best score may well be different, and one of them might involve misreporting your own degree of belief.
How can we quantify “difference from the expectations of others” and include it in the score?
You’re getting this from the “refinement” part of the calibration/refinement decomposition of the Brier score. Over time, your score will end up much better than others’ if you have better refinement (e.g. from “inside information”, or from a superior methodology), even if everyone is identically (perfectly) calibrated.
This is the difference between a weather forecast derived from looking at a climate model (e.g. “I assign 68% probability to the proposition that today’s temperature in your city is within one standard deviation of its average October temperature”) and one derived from looking out the window.
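A minimal sketch of that decomposition (the binning and the toy forecasters are my own illustration): BS = REL − RES + UNC, where REL is the reliability/calibration term and RES is the resolution/refinement term.

```python
from collections import defaultdict

def murphy_decomposition(forecasts, outcomes):
    """Murphy decomposition of the Brier score: BS = REL - RES + UNC.
    REL (reliability) measures calibration, RES (resolution) measures
    refinement, and UNC is the base-rate uncertainty. Minimal sketch,
    assuming forecasts take a small number of distinct values."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[f].append(o)
    rel = sum(len(os) * (f - sum(os) / len(os)) ** 2 for f, os in bins.items()) / n
    res = sum(len(os) * (sum(os) / len(os) - base_rate) ** 2 for os in bins.values()) / n
    unc = base_rate * (1 - base_rate)
    return rel, res, unc

# Two perfectly calibrated forecasters on the same 20 events: one always
# states the base rate (no refinement), the other sorts events into 0.9 and
# 0.1 buckets (better refinement). Both have REL = 0, but the second gets
# the better (lower) Brier score purely from resolution.
outcomes = [1] * 9 + [0] + [1] + [0] * 9
for fc in ([0.5] * 20, [0.9] * 10 + [0.1] * 10):
    rel, res, unc = murphy_decomposition(fc, outcomes)
    print(f"REL={rel:.3f}  RES={res:.3f}  UNC={unc:.3f}  Brier={rel - res + unc:.3f}")
```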
ETA: what you say about my using an assumption is not correct. I’ve only been making the forecast well-specified, so that the way you said you allocated your probability mass gives us a proper loss function, and I simplified the calculation by using a uniform distribution for the rest of your 90%. You can compute the loss function for any allocation of probability among outcomes that you care to name; the math just becomes more complicated. I’m not making any assumptions about the probability distribution of the actual events, and neither is the math. It’s quite general.
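For concreteness, here is one way that calculation can go (a sketch: the 10%/90% split comes from the prediction quoted later in the thread, and the million-cell grid and uniform spread are the simplifications just described):

```python
# Multi-outcome Brier score under the setup above: sum over all cells of
# (p_k - o_k)^2, with 10% on the named impact cell and the remaining 90%
# spread uniformly over the other cells of a million-cell grid.
N = 10 ** 6          # number of equally sized cells in the grid
q = 0.9 / (N - 1)    # probability given to each cell other than the named one

# Impact at the named cell: it misses by 0.9; each other cell misses by q.
loss_hit = (0.1 - 1) ** 2 + (N - 1) * q ** 2
# Impact elsewhere: the named cell misses by 0.1, the true cell by 1 - q,
# and the remaining N - 2 cells by q each.
loss_miss = 0.1 ** 2 + (1 - q) ** 2 + (N - 2) * q ** 2

print(loss_hit)    # ~0.81
print(loss_miss)   # ~1.01
```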
I can still make 100000 lottery predictions and get a good score. I am looking for a system which you cannot trick in that way.
OK, for each prediction, you can subtract the average score from your score. That should work. Assuming that all the other predictors are rational, too, you get an expected difference of 0 on the lottery predictions.
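A quick sketch of why that neutralizes the lottery trick (assuming every predictor states the same, correct probability):

```python
def brier(forecast: float, outcome: int) -> float:
    return (forecast - outcome) ** 2

# If every rational predictor assigns the same 1-in-a-million chance to a
# lottery win, then my score equals the average score whatever the outcome,
# so each of my 100000 lottery predictions contributes exactly 0 to my
# relative score: piling on easy calls buys no advantage.
p_win = 1 / 1_000_000
for outcome in (0, 1):                    # ticket loses / ticket wins
    my_score = brier(p_win, outcome)
    avg_score = brier(p_win, outcome)     # everyone forecast the same
    print(my_score - avg_score)           # 0.0 either way
```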
I’ve only been making the forecast well-specified
I think “impact here (10% confidence), no impact at that place (90% confidence)” is quite specific. It is a binary event.