Naively, I would expect it to mean that if you take sufficiently many predictions (there is one made every day, after all) and group them by predicted chance (70%, 80%, etc., at e.g. 10% granularity), then in each bin the proportion of predictions that came true should match the bin’s assigned chance (e.g. between 75% and 85% for the 80% bin). And so, given enough predictions, your expected probability for a single prediction coming true should approach the predicted chance. With more predictions, you can make the bins smaller (to within 1%, etc.).
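A rough sketch of that binning check, assuming 10%-wide bins and a made-up list of (predicted chance, outcome) pairs; the function and variable names here are hypothetical, just to make the procedure concrete:

```python
from collections import defaultdict

def calibration_by_bin(predictions, bin_width_pct=10):
    """Group predictions by predicted chance and compare each bin's observed
    frequency of the event happening to the bin's assigned chance."""
    bins = defaultdict(list)
    for chance, happened in predictions:
        # Snap each predicted chance to the nearest bin centre: 70%, 80%, ...
        centre = round(chance * 100 / bin_width_pct) * bin_width_pct
        bins[centre].append(happened)
    # For each bin: (observed frequency of the event, number of predictions).
    return {centre: (sum(outcomes) / len(outcomes), len(outcomes))
            for centre, outcomes in sorted(bins.items())}

# Made-up forecasts: (predicted chance of rain, did it actually rain?)
forecasts = [(0.8, True), (0.82, True), (0.78, False), (0.7, True), (0.68, False)]
for centre, (observed, n) in calibration_by_bin(forecasts).items():
    print(f"{centre}% bin: observed {observed:.0%} over {n} predictions")
```

With enough real predictions, a well-calibrated forecaster’s 80% bin should show an observed frequency close to 80%.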
So you’re taking the frequentist approach, where the probability is the fraction of times the event happened as n goes to infinity? But tomorrow is unique. It will never repeat again—n is always equal to 1.
And, as mentioned in another reply, calibration and probability are different things.
But tomorrow is unique. It will never repeat again—n is always equal to 1.
The prediction is not unique. I group predictions (with some binning of similar-enough predictions), not days. Then, if I’ve seen enough past predictions to be justified in treating them as well calibrated, I can use the predicted probability as my subjective probability (or as a factor in it).
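Continuing the hypothetical sketch above (it reuses calibration_by_bin, and the sample-size and tolerance thresholds are arbitrary, purely illustrative):

```python
# Made-up history for the 80% bin: 78 of 100 comparable past predictions came true.
past_forecasts = [(0.8, True)] * 78 + [(0.8, False)] * 22
history = calibration_by_bin(past_forecasts)

todays_chance = 0.8                        # e.g. "80% chance of rain tomorrow"
observed, n = history.get(80, (None, 0))

if n >= 50 and abs(observed - todays_chance) <= 0.05:
    my_probability = todays_chance         # bin looks well calibrated: take the stated chance at face value
elif n:
    my_probability = observed              # otherwise lean on the observed frequency instead
else:
    my_probability = None                  # no comparable past predictions to go on
print(my_probability)                      # 0.8 with this made-up history
```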
The trouble with this approach is that it breaks down when we want to describe uncertain events that are unique. The question of who will win the 2016 presidential election is one that we still want to be able to describe with probabilities, even though it doesn’t make great sense to aggregate probabilities across different presidential elections.
In order to explain what a single probability means, instead of what calibration means, you need to describe it as a measure of uncertainty. The three main ‘correctness’ questions then are 1) how well it corresponds to the actual future, 2) how well it corresponds to known clues at the time, and 3) how precisely I’m reporting it.
That’s correct: my approach doesn’t generalize to unique/rare events. The ‘naive’ or frequentist approach seems to work for weather predictions, and creates a simple intuition that’s easier IMO to explain to laymen than more general approaches.
What do you mean?
What Vaniver said: my approach breaks down for unique events. Edited for clarity.