In contrast, there are no conditions under which average log odds is the correct thing to do.
Taking that as a challenge, can we reverse-engineer a situation where this would be the correct thing to do?
We can first sidestep the additivity-of-disjoint-events problem by limiting the discussion to a single binary outcome.
Then we can fulfill the condition almost trivially by saying our input probabilities are produced by the procedure ‘take the true log odds, add gaussian noise, convert to probability’.
Is that plausible? Well, a Bayesian update is an additive shift to the log odds. So if your forecasters each independently make a bunch of random updates (and would otherwise be accurate), that would do it. A simple model is that the forecasters all have the same prior and a good sample of the real evidence, which would make them update to the correct posterior, except that each one also accepts N bits of fake evidence, each of which has a 50⁄50 chance of supporting X or ~X (and the fake evidence is independent between forecasters).
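To make that toy model concrete, here is a minimal sketch in Python (the function name and parameter values are my own, purely illustrative). One bit of evidence is a factor-of-two shift in the odds, i.e. a ±1 shift in base-2 log odds, so the fake evidence is a symmetric random walk in log-odds space:

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_evidence_forecasts(true_p, n_forecasters, n_fake_bits):
    """Each forecaster starts at the true base-2 log odds (so one bit of
    evidence shifts the log odds by exactly 1), then accepts n_fake_bits
    pieces of fake evidence, each independently worth +1 or -1 bit with
    probability 1/2, independent across forecasters."""
    true_log_odds = np.log2(true_p / (1 - true_p))
    fake_bits = rng.choice([-1, 1], size=(n_forecasters, n_fake_bits))
    noisy_log_odds = true_log_odds + fake_bits.sum(axis=1)
    return 1 / (1 + 2.0 ** (-noisy_log_odds))  # back from base-2 log odds to probability

print(fake_evidence_forecasts(true_p=0.8, n_forecasters=5, n_fake_bits=4))
# five noisy probability estimates scattered around 0.8
```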
That’s not a good enough toy model to convince me to use average log odds for everything, but it is good enough that I’d accept it if average log odds seemed to work in a particular domain.
That doesn’t work, even in the case where the number of probability estimates you’re trying to aggregate together is one. The geometric mean of a set of one number is just that number, so the claim that average log odds is the appropriate way to handle this situation implies that if you are given one probability estimate from this procedure, the appropriate thing to do is take it literally, but this is not the case. Instead, you should try to adjust out the expected effect of the gaussian noise. The correct way to do this depends on your prior, but for simplicity and to avoid privileging any particular prior, let’s try using the improper prior such that seeing the probability estimate gives you no information on what the gaussian noise term was. Then your posterior distribution over the “true log odds” is the observed log odds estimate plus a gaussian. The expected value of the true log odds is, of course, the observed log odds estimate, but the expected value of the true probability is not the observed probability estimate; taking the expected value does not commute with applying nonlinear functions like converting between log odds and probabilities.
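A quick numerical illustration of that last point (the observed log odds and noise scale below are arbitrary, just for the sketch): under the flat improper prior, the posterior over the true log odds is the observed log odds plus zero-mean gaussian noise, and averaging over that noise leaves the log odds alone but shifts the probability.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

observed_log_odds = 2.0   # corresponds to a probability estimate of ~0.88
sigma = 1.5               # assumed standard deviation of the gaussian noise

# Posterior over the true log odds under the flat improper prior:
# the observed value plus a zero-mean gaussian.
true_log_odds = observed_log_odds + rng.normal(0.0, sigma, size=1_000_000)

print(true_log_odds.mean())           # ~2.0: expectation is preserved in log-odds space
print(sigmoid(observed_log_odds))     # ~0.881: the probability estimate taken literally
print(sigmoid(true_log_odds).mean())  # noticeably lower: E[p] != sigmoid(E[log odds])
```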
Oof, rookie mistake. I retract the claim that averaging log odds is ‘the correct thing to do’ in this case.
Still, unless I’m wrong again: the average log odds would converge to the correct result in the limit of many forecasters, while the average probabilities wouldn’t, making the post title bad advice in such a case?
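That convergence claim is easy to check in simulation under the gaussian-noise model above (a sketch; the true probability and noise scale here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def logit(p):
    return np.log(p / (1 - p))

true_p = 0.8
sigma = 2.0   # assumed noise scale in log-odds space

for n in (10, 1_000, 100_000):
    forecasts = sigmoid(logit(true_p) + rng.normal(0.0, sigma, size=n))
    via_log_odds = sigmoid(np.mean(logit(forecasts)))  # average in log-odds space
    via_probs = np.mean(forecasts)                     # average in probability space
    print(f"{n:>7} forecasters: log-odds average {via_log_odds:.3f}, "
          f"probability average {via_probs:.3f}")

# As n grows, the log-odds average settles near 0.8, while the probability
# average settles near E[sigmoid(logit(0.8) + noise)], a smaller number.
```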
(Though median forecast would do just fine)
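And a matching check for the median, under the same assumed model: the gaussian noise has median zero, and the log-odds-to-probability conversion is monotone, so the median passes through it unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def logit(p):
    return np.log(p / (1 - p))

true_p = 0.8
forecasts = sigmoid(logit(true_p) + rng.normal(0.0, 2.0, size=100_000))

# The median commutes with the monotone sigmoid, and the symmetric noise has
# median zero, so the median forecast recovers the true probability.
print(round(float(np.median(forecasts)), 3))  # ~0.8
```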