(Possibly a bit of a tangent) It occurred to me while reading this that perhaps average log odds could make sense in the context in which there is a uniform prior, and the probabilities provided by experts differ because the experts disagree on how to interpret evidence that brings them away from the uniform prior. This has some intuitive appeal:
1) Perhaps, when picking questions to ask forecasters, people have a tendency to pick questions for which they believe the probability that the answer is yes is approximately 50%, because that offers the most opportunity to update in response to the beliefs of the forecasters. If average log odds is an appropriate pooling method to use when you have a uniform prior, then this would explain its good empirical performance. I think I mentioned in our discussion on your EA forum post that if there is a tendency for more knowledgeable forecasters to give more extreme probabilities, then this would explain good performance by average log odds, which weights extreme predictions heavily. A tendency for the questions asked to have priors of near 50% according to the typical unknowledgeable person would explain why more knowledgeable forecasters would assign more extreme probabilities on average: it takes more expertise to justifiably bring their probabilities further from 50%. (A simulation sketch of this setup appears after this list.)
2) It excuses the incoherent behavior of average log odds on my ABC example as well. If A, B, and C are mutually exclusive, then they can’t all have 50% prior probability, so a pooling method that implicitly assumes that they do will not give coherent results.
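Here is a rough simulation sketch of the setup in 1), with parameters I made up purely for illustration: every question has a 50% prior, each forecaster sees some number of conditionally independent noisy signals (more signals = more expertise) and reports the calibrated Bayesian posterior, and we score a linear average of the probabilities against an average of the log odds.

```python
import numpy as np

rng = np.random.default_rng(1)

def logit(p): return np.log(p / (1 - p))
def expit(z): return 1 / (1 + np.exp(-z))

# Made-up setup: every question has a 50% prior.  Forecaster i sees k_i
# conditionally independent signals, each matching the true answer with
# probability 0.7, and reports the calibrated Bayesian posterior, so the
# more expert forecaster (more signals) tends to give the more extreme
# probability.
n_questions = 200_000
signal_strength = 0.7
log_lr = np.log(signal_strength / (1 - signal_strength))
expertise = (2, 12)                    # number of signals seen by each forecaster

truth = rng.integers(0, 2, n_questions)
forecasts = []
for k in expertise:
    votes_for_1 = rng.binomial(k, np.where(truth == 1, signal_strength, 1 - signal_strength))
    forecasts.append(expit((2 * votes_for_1 - k) * log_lr))
p1, p2 = forecasts

# Pool the two forecasts by averaging probabilities and by averaging log odds.
linear_pool = (p1 + p2) / 2
log_odds_pool = expit((logit(p1) + logit(p2)) / 2)

def mean_log_loss(p, outcome):
    return -np.mean(outcome * np.log(p) + (1 - outcome) * np.log(1 - p))

print("log loss, linear average:  ", mean_log_loss(linear_pool, truth))
print("log loss, log-odds average:", mean_log_loss(log_odds_pool, truth))
```

With these particular numbers the log-odds average comes out with the noticeably better (lower) log loss, which is at least consistent with the story in 1); it is only a toy model, of course.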
Ultimately, though, I don’t think this uniform-prior interpretation is actually true. Consider the example of forecasting a continuous variable x by soliciting probability density functions p1(x) and p2(x) from two experts, and pooling them to get the pdf proportional to √(p1(x)p2(x)) (renormalized so it integrates to 1). You could also consider forecasting the variable y=f(x) for some differentiable, strictly increasing function f. Then your experts give you pdfs q1(y) and q2(y) satisfying pi(x)=f′(x)qi(f(x)), and you pool them to get the pdf proportional to √(q1(y)q2(y)). I claim that, if what we’re doing implicitly depends on a uniform prior in a sneaky way, then the first procedure should be the appropriate one when x has a uniform prior, and the second should be appropriate when y has a uniform prior. If f is nonlinear, then a uniform prior on x induces a non-uniform prior on y, and vice versa, so we should get incompatible results from the two procedures, since we would implicitly be using a different prior each time. But let’s try it: √(p1(x)p2(x)) = √(f′(x)q1(f(x))·f′(x)q2(f(x))) = f′(x)√(q1(f(x))q2(f(x))). Thus, given that both experts provided pdfs satisfying the formula pi(x)=f′(x)qi(f(x)), making their probability distributions over x and y compatible with y=f(x), our pooled pdf also satisfies that formula (the normalization constants match as well, by the same change of variables), and so it is also compatible with y=f(x). That is, if we pool using beliefs about x and then find the implied beliefs about y, we get the same thing as if we had pooled directly using beliefs about y. Different implicit priors don’t appear to be ruining anything.
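For concreteness, here is a small numerical check of that invariance. The Gaussian expert densities and the map f(x) = exp(x) are arbitrary choices for illustration; pooling in x-space and pooling in y-space assign the same probability to the same event.

```python
import numpy as np

# Two made-up expert densities for x (Gaussians chosen arbitrarily).
def p1(x): return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)
def p2(x): return np.exp(-0.5 * ((x - 1.0) / 1.5) ** 2) / (1.5 * np.sqrt(2 * np.pi))

# Strictly increasing, nonlinear reparametrization y = f(x) = exp(x), so
# x = log(y), f'(x) = exp(x), and the experts' densities for y are
# q_i(y) = p_i(log y) / y.
def q1(y): return p1(np.log(y)) / y
def q2(y): return p2(np.log(y)) / y

# Route 1: pool in x-space (geometric mean, renormalized) and compute the
# pooled probability of the event y <= 2, i.e. x <= log 2.
x = np.linspace(-10.0, 10.0, 400_001)
dx = x[1] - x[0]
pooled_x = np.sqrt(p1(x) * p2(x))
pooled_x /= pooled_x.sum() * dx
prob_via_x = pooled_x[x <= np.log(2.0)].sum() * dx

# Route 2: pool in y-space (geometric mean, renormalized) on its own grid and
# compute the pooled probability of the same event, y <= 2.
y = np.linspace(1e-6, 200.0, 1_000_001)
dy = y[1] - y[0]
pooled_y = np.sqrt(q1(y) * q2(y))
pooled_y /= pooled_y.sum() * dy
prob_via_y = pooled_y[y <= 2.0].sum() * dy

# The two routes agree up to quadrature error, as the algebra above says.
print(prob_via_x, prob_via_y)
```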
I conclude that the incoherent results in my ABC example cannot be blamed on switching between the uniform prior on {A,B,C} and the uniform prior on {A,¬A}, and should instead be blamed entirely on the experts having different beliefs conditional on ¬A, which is taken into account in the calculation using A,B,C, but not in the calculation using A,¬A.
average log odds could make sense in the context in which there is a uniform prior
This is something I have heard from other people too, and I still cannot make sense of it. Why would questions where uninformed forecasters produce uniform priors make log-odds averaging work better?
A tendency for the questions asked to have priors of near 50% according to the typical unknowledgeable person would explain why more knowledgeable forecasters would assign more extreme probabilities on average: it takes more expertise to justifiably bring their probabilities further from 50%.
I don’t understand your point. Why would forecasters care about what other people would do? They only want to maximize their own score.
If A, B, and C are mutually exclusive, then they can’t all have 50% prior probability, so a pooling method that implicitly assumes that they do will not give coherent results.
This also doesn’t make much sense to me, though it might be because I still don’t understand the point about needing uniform priors for log-odds pooling.
Different implicit priors don’t appear to be ruining anything.
Neat!
I conclude that the incoherent results in my ABC example cannot be blamed on switching between the uniform prior on {A,B,C} and the uniform prior on {A,¬A}, and should instead be blamed entirely on the experts having different beliefs conditional on ¬A, which is taken into account in the calculation using A,B,C, but not in the calculation using A,¬A.
I agree with this.
Why would questions where uninformed forecasters produce uniform priors make log-odds averaging work better?
Because it produces situations where more extreme probability estimates correlate with more expertise (assuming all forecasters are well-calibrated).
I don’t understand your point. Why would forecasters care about what other people would do? They only want to maximize their own score.
They wouldn’t. But if both forecasters would have started with a prior around 50% before acquiring any of their expertise, and it’s their expertise that updates them away from 50%, then more expertise is required to reach more extreme odds. If the probability is a martingale that starts at 50%, and the time axis is taken to be expertise, then more extreme probabilities will on average be sampled from later in the martingale; i.e. with more expertise.
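A minimal sketch of that claim, with made-up numbers: a forecaster's calibrated posterior after k conditionally independent signals (each matching the truth with probability 0.6) is a martingale in k that starts at 50%, and its average distance from 50% grows with k.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up setup: a binary question with a 50% prior.  A forecaster with
# "expertise" k sees k conditionally independent signals, each of which
# matches the true answer with probability 0.6, and reports the calibrated
# Bayesian posterior.  As a function of k, that posterior is a martingale
# starting at 0.5.
signal_strength = 0.6
log_lr = np.log(signal_strength / (1 - signal_strength))  # log likelihood ratio per signal

n_trials = 200_000
truth = rng.integers(0, 2, n_trials)  # true answers, each with prior 1/2

for k in (1, 5, 20, 80):              # increasing expertise
    # Number of signals pointing at "answer is 1" on each trial.
    votes_for_1 = rng.binomial(k, np.where(truth == 1, signal_strength, 1 - signal_strength))
    log_odds = (2 * votes_for_1 - k) * log_lr   # posterior log odds for "answer is 1"
    posterior = 1 / (1 + np.exp(-log_odds))
    print(f"k = {k:2d}: mean |p - 0.5| = {np.abs(posterior - 0.5).mean():.3f}")

# More signals (more expertise) give, on average, more extreme -- yet still
# calibrated -- probabilities.
```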
This also doesn’t make much sense to me, though it might be because I still don’t understand the point about needing uniform priors for logodd pooling.
If log-odds pooling implicitly assumes a uniform prior, then log-odds pooling on A vs ¬A assumes A has prior probability 1⁄2, and log-odds pooling on A vs B vs C assumes A has a prior of 1⁄3, which, if the implicit prior really were important, could explain the different results.
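To make the discrepancy concrete, here is a sketch with made-up numbers that just mirror the structure of the ABC example: both experts agree about A but disagree about how ¬A splits between B and C.

```python
import numpy as np

# Made-up expert distributions over three mutually exclusive, exhaustive
# outcomes; the experts agree about A but disagree about how the remaining
# probability splits between B and C.
expert1 = np.array([0.4, 0.5, 0.1])   # P(A), P(B), P(C)
expert2 = np.array([0.4, 0.1, 0.5])

# Log-odds pooling of A vs not-A (equivalently, the geometric mean of the odds).
def pool_binary(p, q):
    odds = np.sqrt((p / (1 - p)) * (q / (1 - q)))
    return odds / (1 + odds)

# The three-outcome analogue: renormalized geometric mean of the distributions.
def pool_multi(d1, d2):
    g = np.sqrt(d1 * d2)
    return g / g.sum()

print("pooled P(A), treating it as A vs not-A: ", pool_binary(expert1[0], expert2[0]))
print("pooled P(A), treating it as A vs B vs C:", pool_multi(expert1, expert2)[0])
# Both experts say P(A) = 0.4, and the A-vs-not-A pooling returns 0.4, but the
# three-outcome pooling returns about 0.47: the experts' disagreement
# conditional on not-A is what drives the difference.
```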