In your ABC example we rely on the background information that
P(A&B)=0
P(A&C)=0
P(B&C)=0
P(A or B or C)=1.
So the background information is that the events are mutually exclusive and exhaustive. But only then do probabilities need to add to one. It’s not a general fact that “probabilities add to 1”. So taking the geometric average does not itself violate any axioms of probability. We “just” need to update the three geometric averages on this background knowledge. Plausibly how this should be done in this case is to normalize them such that they add to one. (In the case of the arithmetic mean, updating on the background information plausibly wouldn’t change anything here, but that’s not the case for other possible background information.)
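A minimal sketch of that normalization step, in Python. The two expert forecasts are made-up numbers for illustration (they are not the figures from the ABC example), and the helper name `pool_geo_odds` is hypothetical.

```python
import math

def pool_geo_odds(probs):
    """Pool probabilities by taking the geometric mean of their odds."""
    log_odds = [math.log(p / (1 - p)) for p in probs]
    mean_log_odds = sum(log_odds) / len(log_odds)
    return 1 / (1 + math.exp(-mean_log_odds))

# Hypothetical forecasts over mutually exclusive, exhaustive outcomes A, B, C.
expert_1 = {"A": 0.5, "B": 0.3, "C": 0.2}
expert_2 = {"A": 0.2, "B": 0.3, "C": 0.5}

# Pool each outcome on its own, then normalize using the background knowledge.
pooled = {k: pool_geo_odds([expert_1[k], expert_2[k]]) for k in expert_1}
total = sum(pooled.values())  # falls short of 1 for these inputs
normalized = {k: v / total for k, v in pooled.items()}

# For comparison: the arithmetic mean of the same forecasts.
arithmetic = {k: (expert_1[k] + expert_2[k]) / 2 for k in expert_1}

print(pooled, total)
print(normalized)
print(arithmetic, sum(arithmetic.values()))
```

With these inputs the three pooled values sum to a bit less than 1 before normalization, while the arithmetic means already sum to 1, so normalizing them would change nothing.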
Of course this leads to the question: How should we perform such updates in general, i.e. for arbitrary background assumptions? I think this is commonly done by finding the distribution which satisfies the background information and maximizes entropy relative to the old distribution, or by finding the distribution which minimizes the KL-divergence to the old distribution (I think these methods are equivalent). Of course this requires that we have such a distribution in the first place, rather than just three probabilities obtained by some averaging method.
But it is anyway a more general question (than the question of whether the geometric mean of the odds is better or the arithmetic mean of the probabilities): how should we “average” two or more probability distributions (rather than just two probabilities), assuming they come from equally reliable sources?
But going back to your example about extreme probabilities: Of course when we assume that sources have different reliability, then they would need to be weighted differently somehow. But the interesting case here is what the correct averaging method is for the simple case of equally reliable sources. In this simple case the geometric mean of odds seems to make much more sense, since it doesn’t generally discount extreme probabilities. (Extreme probabilities can often be even more reliable than non-extreme ones. E.g. the probability that the Earth is hollow seems extremely low.)
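To make the point about extreme probabilities concrete, here is a small sketch comparing the two averaging methods on one extreme and one moderate forecast. The forecasts 1e-6 and 0.1 are arbitrary illustrative values, and the function names are hypothetical.

```python
import math

def arithmetic_mean(probs):
    """Plain average of the probabilities."""
    return sum(probs) / len(probs)

def geo_mean_odds(probs):
    """Geometric mean of the odds, converted back to a probability."""
    log_odds = [math.log(p / (1 - p)) for p in probs]
    pooled_odds = math.exp(sum(log_odds) / len(log_odds))
    return pooled_odds / (1 + pooled_odds)

# Two equally weighted sources: one extreme forecast, one moderate forecast.
forecasts = [1e-6, 0.1]

am = arithmetic_mean(forecasts)  # roughly 0.05
gm = geo_mean_odds(forecasts)    # roughly 0.00033

print(am, gm)
```

The arithmetic mean lands at about half the moderate forecast, almost as if the extreme forecast had said 0, whereas the geometric mean of odds moves several orders of magnitude toward it.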
Also here a quick comment:
I will acknowledge that there are conditions under which averaging probabilities is also not a reasonable thing to do. For example, suppose some proposition X has prior probability 50%, and two experts collect independent evidence about X, and both of them update to assigning 25% probability to X. Since both of them acquired 3:1 evidence against X, and these sources of evidence are independent, combined this gives 9:1 evidence against X, and you should update to assigning 10% probability to X. The prior is important here;
If we assume that the prior was indeed important here then this makes sense, but if we assume that the prior was irrelevant (that they would have arrived at 25% even if their prior was e.g. 10% rather than 50%), then this doesn’t make sense. (Maybe they first assumed the probability of drawing a black ball from an urn was 50%, then they each independently created a large sample, and ~25% of the balls came out black. In this case the prior was mostly irrelevant.) We would need a more general description under which circumstances the prior is indeed important in your sense and justifies the multiplicative evidence aggregation you proposed.
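The arithmetic in the quoted example can be checked with a short sketch. It assumes, as the quote does, that both experts start from the same prior and that their evidence is independent; `combine_independent` is a hypothetical helper name. The second call shows how much the result depends on the prior under that assumption.

```python
def odds(p):
    """Convert a probability to odds in favor."""
    return p / (1 - p)

def combine_independent(prior, posteriors):
    """Multiply each expert's likelihood ratio onto the shared prior odds."""
    combined = odds(prior)
    for post in posteriors:
        combined *= odds(post) / odds(prior)  # that expert's likelihood ratio
    return combined / (1 + combined)

# Prior 50%, both experts at 25%: 3:1 against, twice, gives 9:1 against.
from_even_prior = combine_independent(0.5, [0.25, 0.25])

# Same posteriors, but a 10% prior: now each expert saw 3:1 evidence in favor.
from_low_prior = combine_independent(0.1, [0.25, 0.25])

print(from_even_prior, from_low_prior)
```

With a 50% prior the combined probability is 10%, as in the quote. With a 10% prior the same two 25% forecasts combine to 50%, which is why it matters whether the prior really played the role this method assumes.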
Lastly:
I think it helps to think about why there would be a significant gap between the worst odds you’d accept a bet at and the worst odds you’d accept the opposite bet at. One reason is that someone else’s willingness to bet on something is evidence for it being true, so there should be some interval of odds in which their willingness to make the bet implies that you shouldn’t, in each direction. Even if you don’t think the other person has any relevant knowledge that you don’t, it’s not hard to be more likely to accept bets that are more favorable to you, so if the process by which you turn an intuitive sense of probability into a number is noisy, then if you’re forced to set odds that you’d have to take bets on either side of, even someone who knows nothing about the subject could exploit you on average. I think the possibility that adversaries can make ambiguity resolve against you disproportionately often is a good explanation for ambiguity aversion in general, since there are many situations, not just bets, where someone might have an opportunity to profit from your loss.
This is a very interesting possible explanation for betting aversion without needing to assume, as usual, risk aversion! Or rather two explanations, one with an adversary with additional information and one without. But in the second case I don’t see how a noisy process for a probability estimate would lead to being “forced to set odds that you’d have to take bets on either side of, even someone who knows nothing about the subject could exploit you on average”. Though the first case seems really plausible. It is so simple I would assume you are not the first with this idea, but I have never heard of it before.
We “just” need to update the three geometric averages on this background knowledge. Plausibly how this should be done in this case is to normalize them such that they add to one.
My problem with a forecast aggregation method that relies on renormalizing to meet some coherence constraints is that then the probabilities you get depend on what other questions get asked. It doesn’t make sense for a forecast aggregation method to give probability 32.5% to A if the experts are only asked about A, but have that probability predictably increase if the experts are also asked about B and C. (Before you try thinking of a reason that the experts’ disagreement about B and C is somehow evidence for A, note that no matter what each of the experts believe, if your forecasting method is mean log odds, but renormalized to make probabilities sum to 1 when you ask about all 3 outcomes, then the aggregated probability assigned to A can only go up when you also ask about B and C, never down. So any such defense would violate conservation of expected evidence.)
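The “can only go up” claim can be spot-checked numerically. The sketch below draws random pairs of expert forecasts over three mutually exclusive and exhaustive outcomes and compares the pooled probability of A when asked alone against the renormalized value. The sampling scheme, seed, and helper names are arbitrary choices for illustration.

```python
import math
import random

def pool_geo_odds(probs):
    """Mean log odds, converted back to a probability."""
    log_odds = [math.log(p / (1 - p)) for p in probs]
    return 1 / (1 + math.exp(-sum(log_odds) / len(log_odds)))

def random_distribution(rng, n=3):
    """Random probabilities over n mutually exclusive, exhaustive outcomes."""
    weights = [rng.uniform(0.01, 1.0) for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
smallest_gap = float("inf")
for _ in range(10_000):
    experts = [random_distribution(rng) for _ in range(2)]
    # Pool each outcome separately, as if it were the only question asked.
    alone = [pool_geo_odds([e[j] for e in experts]) for j in range(3)]
    # Ask about all three outcomes and renormalize so they sum to 1.
    renormalized_a = alone[0] / sum(alone)
    smallest_gap = min(smallest_gap, renormalized_a - alone[0])

print(smallest_gap)  # non-negative up to rounding in this sample
```

A random sample is not a proof, but it matches the claim: the separately pooled values never sum to more than 1, so dividing by their sum never lowers the probability of A.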
(In the case of the arithmetic mean, updating on the background information plausibly wouldn’t change anything here, but that’s not the case for other possible background information.)
Any linear constraints (which are the things you get from knowing that certain Boolean combinations of questions are contradictions or tautologies) that are satisfied by each predictor will also be satisfied by their arithmetic mean.
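As a small illustration of this, take the linear constraint P(A or B) = P(A) + P(B) for mutually exclusive A and B. Each hypothetical expert below satisfies it; the sketch checks whether the pooled forecasts still do. The numbers and function names are made up for the example.

```python
import math

def pool_arithmetic(probs):
    """Arithmetic mean of the probabilities."""
    return sum(probs) / len(probs)

def pool_geo_odds(probs):
    """Mean log odds, converted back to a probability."""
    log_odds = [math.log(p / (1 - p)) for p in probs]
    return 1 / (1 + math.exp(-sum(log_odds) / len(log_odds)))

# Hypothetical experts; A and B are mutually exclusive, so each expert's
# P(A or B) equals P(A) + P(B).
experts = [
    {"A": 0.5, "B": 0.3, "A or B": 0.8},
    {"A": 0.2, "B": 0.3, "A or B": 0.5},
]

def constraint_gap(pool):
    """Pooled P(A or B) minus (pooled P(A) + pooled P(B))."""
    pooled = {k: pool([e[k] for e in experts]) for k in experts[0]}
    return pooled["A or B"] - (pooled["A"] + pooled["B"])

arithmetic_gap = constraint_gap(pool_arithmetic)  # zero up to rounding
geo_odds_gap = constraint_gap(pool_geo_odds)      # nonzero for these inputs

print(arithmetic_gap, geo_odds_gap)
```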
But it is anyway a more general question (than the question of whether the geometric mean of the odds is better or the arithmetic mean of the probabilities): how should we “average” two or more probability distributions (rather than just two probabilities), assuming they come from equally reliable sources?
That’s part of my point. Arithmetic mean of probabilities gives you a way of averaging probability distributions, as well as individual probabilities. Geometric mean of odds does not.
If we assume that the prior was indeed important here then this makes sense, but if we assume that the prior was irrelevant (that they would have arrived at 25% even if their prior was e.g. 10% rather than 50%), then this doesn’t make sense. (Maybe they first assumed the probability of drawing a black ball from an urn was 50%, then they each independently created a large sample, and ~25% of the balls came out black. In this case the prior was mostly irrelevant.) We would need a more general description under which circumstances the prior is indeed important in your sense and justifies the multiplicative evidence aggregation you proposed.
In this example, the sources of evidence they’re using are not independent; they can expect ahead of time that each of them will observe the same relative frequency of black balls from the urn, even while not knowing in advance what that relative frequency will be. The circumstances under which the multiplicative evidence aggregation method is appropriate are exactly the circumstances in which the evidence actually is independent.
But in the second case I don’t see how a noisy process for a probability estimate would lead to being “forced to set odds that you’d have to take bets on either side of, even someone who knows nothing about the subject could exploit you on average”.
They make their bet direction and size functions of the odds you offer them in such a way that they bet more when you offer better odds. If you give the correct odds, then the bet ends up resolving neutrally on average, but if you give incorrect odds, then which direction you are off in correlates with how big a bet they make in such a way that you lose on average either way.
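Here is one toy model of such a strategy, offered as a sketch rather than the only way to set it up. It assumes you quote a price q for a contract paying 1 if X happens (instead of quoting odds), that your quote is the true probability plus symmetric noise, and that the bettor uses a simple linear rule: buy `reference - q` contracts, where `reference` is an arbitrary number chosen without any knowledge of X. All names are hypothetical.

```python
def bettor_expected_profit(true_p, quoted_prices, reference=0.5):
    """Average expected profit of a bettor who never learns true_p.

    The bettor buys (reference - q) contracts paying 1 if X happens, at
    price q each; a negative amount means selling. Each contract bought
    earns true_p - q in expectation. quoted_prices lists the equally
    likely quotes you might give.
    """
    profits = [(reference - q) * (true_p - q) for q in quoted_prices]
    return sum(profits) / len(profits)

true_p = 0.3

# You always quote the correct price: the bet is neutral on average.
exact = bettor_expected_profit(true_p, [true_p])

# Your quote is off by 0.1 in either direction with equal chance.
noisy = bettor_expected_profit(true_p, [true_p - 0.1, true_p + 0.1])

print(exact, noisy)
```

In this model the bettor's average gain equals the variance of your noise (0.01 here) whatever `reference` is, because they hold a bigger position exactly when your quote errs in their favor.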