If someone gives a probability of 20% that B will win and 80% that A will win, why do people say ‘the polls were wrong’ or ‘the predictions were wrong’ if it turns out that B won?
If that’s a single someone, saying “he was wrong” is not quite correct.
However if a hundred someones gave these probabilities, it would be reasonable to say “forecasts were wrong” (note the plural).
Yes, you are right on that point. I wanted to ask:
“If many forecasts say the probability is 80% that A will win, 20% that B will win, why do they say the forecasts were wrong if B wins?”
‘Wrong’ implies bivalence, binary thinking, duality: it implies ‘right’. A probability isn’t binary; it can take any of infinitely many values. My brain has a hard time understanding why calling it ‘wrong’ is reasonable… Kind of Orwellian.
So to my point. Forecasts were only wrong if they said A will win, but B won. Is this not correct? Stating 80% is, in hindsight, no different from having stated 0%, and even before the event it’s either 0% or 100%, or it’s void, nothing, of no substance...
Well, think about it this way.
You have a certain stable process that generates forecasts. You generate a forecast: 80% for A, 20% for B. B happens. You generate another forecast: 80% for C, 20% for D. D happens. You generate another forecast...
If events that you forecast at 20% keep happening and events you forecast at 80% keep not happening, how many forecasts do you need to recognize that your forecast-generating process is wrong?
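To put a rough number on it, here is a minimal Python sketch (my own toy setup, not anything from the forecasters): it compares the hypothesis that the stated 80/20 forecasts are calibrated against a rival hypothesis that they carry no information at all (a 50/50 coin), using a log-likelihood ratio over the observed outcomes.

```python
import math

def log_likelihood_ratio(outcomes, stated_p=0.8, rival_p=0.5):
    """Compare two hypotheses about the forecast-generating process.

    outcomes: list of booleans, True when the event forecast at stated_p
              actually happened.
    H_calibrated: the stated 80% events really do happen 80% of the time.
    H_rival:      they happen with probability rival_p instead (here 50%,
                  i.e. the forecasts carry no information).
    Returns log( P(data | H_calibrated) / P(data | H_rival) ).
    """
    llr = 0.0
    for hit in outcomes:
        p_cal = stated_p if hit else (1 - stated_p)
        p_riv = rival_p if hit else (1 - rival_p)
        llr += math.log(p_cal / p_riv)
    return llr

# Five upsets in a row: the data are roughly 100x more likely under the
# "uninformative" rival than under the claimed calibration.
print(log_likelihood_ratio([False] * 5))            # ≈ -4.58
# One upset in five is what a calibrated 80% forecaster should produce.
print(log_likelihood_ratio([True] * 4 + [False]))   # ≈ +0.96
```

So a handful of 20% events in a row is already strong evidence against the process, while the occasional upset is not.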
That would not be reasonable if we were talking about something like a prediction of whether a 5-sided die would come up with the number 1. Why are polls any different?
Because the polls are supposed to be different from one another, while all forecasts about a 5-sided die are the same.
Imagine yourself collecting forecasts and updating on them. With the die, many forecasts will not change your expected probabilities because these forecasts are basically all the same. When you hear another one, the amount of information you have doesn’t change. That is not (supposed to be) the case with polls.
If one forecast says 80% vs 20%, and another, different forecast, using, say, a different methodology or different sources, also says 80% vs 20%, your expected probabilities should be >80% vs <20%. How much more and how much less depends on how correlated you believe the forecasts are.
If you hear many different forecasts saying 80:20, your expectation should not be 80:20.
I still don’t see the difference.
Are you saying that if many forecasters predict that something has an 80% probability of happening and they all use different methodologies, I should expect it to happen with greater than 80% probability? Why?
Use simple Bayesian updating on the evidence. A new, different forecast is a new piece of evidence.
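To make that concrete, here is a minimal sketch of the idealised case, under two simplifying assumptions of mine: the forecasts are conditionally independent given the outcome, and each forecaster’s 80% is measured against a shared 50/50 prior. Updating then amounts to adding log-odds.

```python
import math

def pool_independent(probs, prior=0.5):
    """Naive pooling of probability forecasts assumed to be conditionally
    independent given the outcome: convert each forecast to log-odds,
    treat its excess over the prior's log-odds as evidence, and add all
    the evidence back onto the prior."""
    def logit(p):
        return math.log(p / (1 - p))
    evidence = sum(logit(p) - logit(prior) for p in probs)
    return 1 / (1 + math.exp(-(logit(prior) + evidence)))

print(pool_independent([0.8]))             # 0.80: one forecast is just the forecast
print(pool_independent([0.8, 0.8]))        # ≈ 0.94: two independent 80% forecasts
print(pool_independent([0.8, 0.8, 0.8]))   # ≈ 0.98: three push it higher still
```

With one forecast you just get the forecast back; every additional, genuinely independent 80:20 opinion pushes the pooled probability further above 80%.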
If they were independent, then it would be trivial to update on each of them and arrive at a meta-forecast much greater than 80%. But they’re really not. Many of them are based on the same polls, news, and historical behaviors. They may have different models, but they’re very much not independent forecasts.
I agree. That’s why calculating the “combined” forecast is hard—you need to estimate the degree of co-dependency. But as long as the forecasts are not exactly the same, each new one gets you a (metaphorical) bit of information and your posterior probability should creep up from 80%.
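One crude way to put a number on that degree of co-dependency (my own rough sketch, not a standard recipe): shrink the pooled log-odds evidence by an effective number of independent forecasts, using the usual n / (1 + (n - 1) * rho) effective-sample-size correction for sources with pairwise correlation rho.

```python
import math

def pool_correlated(probs, rho, prior=0.5):
    """Rough pooling of equally correlated forecasts: scale the total
    log-odds evidence by n_eff / n, where n_eff = n / (1 + (n - 1) * rho)
    is the usual effective-sample-size correction for pairwise
    correlation rho.  rho = 0 recovers independent pooling; rho = 1
    means every forecast is a copy of the first."""
    def logit(p):
        return math.log(p / (1 - p))
    n = len(probs)
    n_eff = n / (1 + (n - 1) * rho)
    evidence = sum(logit(p) - logit(prior) for p in probs) * (n_eff / n)
    return 1 / (1 + math.exp(-(logit(prior) + evidence)))

print(pool_correlated([0.8] * 5, rho=0.0))  # ≈ 0.999: five genuinely independent forecasts
print(pool_correlated([0.8] * 5, rho=0.7))  # ≈ 0.86:  heavily correlated, but still above 0.80
print(pool_correlated([0.8] * 5, rho=1.0))  # ≈ 0.80:  identical forecasts add nothing new
```

The rho you plug in is exactly the judgment call about co-dependency: the more the models share polls, news, and methods, the closer rho is to 1 and the less the posterior creeps above 80%.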
But why is it a piece of evidence pointing to greater than 80% instead of 80%?
Basically it depends on the source of uncertainty. If all the uncertainty is in the random variable being modeled (as it is in the die example), adding more forecasts (or models) changes nothing—you still have the same uncertainty. However if part of the uncertainty is in the model itself—there is some model error—then you can reduce this model error by combining different (ideally, independent) models.
Imagine a forecast which says: I think A will win, but I’m uncertain, so I will say 80% to A and 20% to B. And there is another, different forecast which says the same thing. If you combine the two, your probability of A should be higher than 80%.
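To illustrate the model-error point with toy numbers (mine, purely for illustration): if every forecast is the same fixed number, as with the die, averaging more of them tells you nothing new; but if each model reports the underlying probability plus its own independent error, averaging many of them shrinks that error.

```python
import random

def average_forecasts(n_models, model_error_sd, true_p=0.8):
    """Toy contrast between irreducible randomness and model error.
    Each model reports true_p plus its own independent Gaussian error
    (a stand-in for differing methodologies); averaging many models
    shrinks the model-error component, though it can never remove the
    randomness in the event itself."""
    reports = [min(max(random.gauss(true_p, model_error_sd), 0.0), 1.0)
               for _ in range(n_models)]
    return sum(reports) / n_models

random.seed(0)
# Die-style forecasts have no model error: every forecast is exactly 0.2,
# so averaging a thousand of them is still 0.2.
print(average_forecasts(1000, model_error_sd=0.0, true_p=0.2))   # ≈ 0.2
# Poll-based models do have model error: a single model may be noticeably
# off, but the average of many lands close to the underlying 0.8.
print(average_forecasts(1, model_error_sd=0.1))      # one noisy model
print(average_forecasts(1000, model_error_sd=0.1))   # ≈ 0.8
```

Which sketch applies is exactly the question of where the uncertainty lives: the die is all irreducible randomness, while election forecasts also carry model error that more (different) models can shrink.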