Average probabilities, not log odds
Let’s say you want to assign a probability to some proposition X. Maybe you think about what odds you’d accept bets at, and decide you’d bet on X at odds of 1:99 against X, and bet against X at odds of 1:9 against X. This implies you think the probability of X is somewhere between 1% and 10%. If you wouldn’t accept bets in either direction at intermediate odds, how should you refine this interval to a point estimate for the probability of X? Or maybe you asked two experts, and one told you that X has a 10% probability of being true, while the other told you that X has a 1% probability of being true. If you’re inclined to just trust the experts, since you don’t know anything about the subject yourself, and you don’t know which expert to trust, how should you combine these into a point estimate for the probability of X?
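As a quick check of the arithmetic in that setup, here is a minimal Python sketch (purely illustrative) converting the quoted betting odds into the endpoints of that interval:

```python
# The odds quoted above, 1:99 and 1:9, correspond to probabilities of
# 1/(1+99) = 1% and 1/(1+9) = 10% respectively.
for k in (99, 9):
    print(f"odds 1:{k} -> probability {1 / (1 + k):.0%}")
```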
One popular answer I’ve seen is to take the geometric mean of the odds ratios (equivalently, to average the log odds). In either of the above scenarios, the geometric mean of 1:9 and 1:99 is 1:√(9·99) ≈ 1:30, so you would assign a probability of about 3.2% to X. I think this is a bad answer, and that a better answer is to average the probabilities (so, in these cases, you’d average 1% and 10% to get a probability of 5.5% for X). Here are several reasons:
Probabilities must add to 1. The average log odds rule doesn’t respect this. Let’s try an example. Suppose you’ve got some event A, and you ask three experts what the probability of A is. Expert 1 tells you that A has probability 50%, while experts 2 and 3 both say that A has probability 25%. The geometric mean of 1:1, 1:3, and 1:3 is about 1:2.1, so we get an overall probability of 32.5%, just less than 1/3. But now consider two more events, B and C, such that exactly one of A, B, and C must be true. It turns out that expert 1 gives you the probability distribution 50% A, 25% B, 25% C; expert 2 gives you 25% A, 50% B, 25% C; and expert 3 gives you 25% A, 25% B, 50% C. The average log odds rule assigns a 32.5% probability to each of A, B, and C, even though you know exactly one of them must occur. Or, put differently, the average log odds rule assigns probability 32.5% to A, 32.5% to B, and 67.5% to “A or B”, violating additivity of probabilities of disjoint events. Averaging probabilities assigns probability 1/3 to each of A, B, and C, as any rule for combining probability estimates that treats the experts interchangeably and treats the events interchangeably must.
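Here is a minimal Python sketch of both pooling rules (the function names are just for illustration), applied first to the 1% and 10% estimates from the previous paragraph and then to the three experts’ distributions above:

```python
import math

def geo_mean_odds(probs):
    """Average log odds: geometric mean of the odds, converted back to a probability."""
    mean_log_odds = sum(math.log(p / (1 - p)) for p in probs) / len(probs)
    o = math.exp(mean_log_odds)
    return o / (1 + o)

def mean_prob(probs):
    """Arithmetic mean of the probabilities."""
    return sum(probs) / len(probs)

# The two-expert example: 1% and 10%.
print(geo_mean_odds([0.01, 0.10]))  # ~0.032
print(mean_prob([0.01, 0.10]))      # 0.055

# The three-expert example over the mutually exclusive, exhaustive events A, B, C.
experts = [
    {"A": 0.50, "B": 0.25, "C": 0.25},
    {"A": 0.25, "B": 0.50, "C": 0.25},
    {"A": 0.25, "B": 0.25, "C": 0.50},
]
for rule in (geo_mean_odds, mean_prob):
    pooled = {e: rule([x[e] for x in experts]) for e in "ABC"}
    print(rule.__name__, {e: round(p, 3) for e, p in pooled.items()},
          "sum:", round(sum(pooled.values()), 3))
# geo_mean_odds gives ~0.325 for each event, summing to ~0.974 rather than 1;
# pooling "A or B" directly gives geo_mean_odds([0.75, 0.75, 0.50]) ~ 0.675, not 0.325 + 0.325.
# mean_prob gives 1/3 for each event, summing to exactly 1.
```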
There’s a clear model for why averaging probabilities is a reasonable thing to do under some assumptions. Let’s say you have various models that you can use for assigning probabilities to things, and you believe that one of these models is roughly correct, but you don’t know which one. Maybe you’re asking some experts for probabilities, and one of them gives you well-calibrated probabilities that take into account all available information, while the others’ probabilities don’t provide any further useful information, but you have no idea which expert is the good one, even after seeing the probabilities they give you. The appropriate thing to do here is to average together the probabilities output by your models or experts. In contrast, there are no conditions under which average log odds is the correct thing to do, because violating additivity of disjoint events is never the correct thing to do (see the previous paragraph).
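A minimal sketch of that model, with hypothetical credences over which expert is the single well-calibrated one:

```python
# Hypothetical: two experts report these probabilities for X, and you assign
# equal credence to each of them being the single well-calibrated expert.
expert_probs = [0.10, 0.01]
credences = [0.5, 0.5]

# Law of total probability: P(X) = sum_i P(expert i is the reliable one) * P_i(X).
p_x = sum(w * p for w, p in zip(credences, expert_probs))
print(p_x)  # 0.055 -- the (credence-weighted) arithmetic mean of the probabilities
```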
I will acknowledge that there are conditions under which averaging probabilities is also not a reasonable thing to do. For example, suppose some proposition X has prior probability 50%, and two experts collect independent evidence about X, and both of them update to assigning 25% probability to X. Since both of them acquired 3:1 evidence against X, and these sources of evidence are independent, together this gives 9:1 evidence against X, and you should update to assigning 10% probability to X. The prior is important here: if the prior were 10% and both experts updated to 25%, then you should update to 50%. Of course, if you were tempted to use average log odds to combine probability estimates, you’re probably not in the kind of situation in which this independent-evidence calculation makes sense, and combining two probability estimates into something that isn’t between the two probably isn’t the behavior you intended. If you think carefully about where the probabilities you want to combine in some situation are coming from, and how they relate to each other, then you might be able to do something better than averaging the probabilities. But I maintain that, if you want a quick and dirty heuristic, averaging probabilities is a better quick and dirty heuristic than anything as senseless as averaging log odds.
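Here is a minimal sketch of that calculation in odds form, assuming the experts’ evidence is independent conditional on X (and on not-X) and that both started from the shared prior:

```python
def prob_to_odds(p):
    """Probability -> odds in favor."""
    return p / (1 - p)

def odds_to_prob(o):
    """Odds in favor -> probability."""
    return o / (1 + o)

def combine_independent(prior, posteriors):
    """Combine experts who updated independently from a shared prior:
    posterior odds = prior odds * product of each expert's likelihood ratio,
    where each likelihood ratio = (that expert's posterior odds) / (prior odds)."""
    odds = prob_to_odds(prior)
    for p in posteriors:
        odds *= prob_to_odds(p) / prob_to_odds(prior)
    return odds_to_prob(odds)

print(combine_independent(0.50, [0.25, 0.25]))  # 0.10: two independent 3:1 updates against X
print(combine_independent(0.10, [0.25, 0.25]))  # 0.50: same expert reports, but a 10% prior
```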
Part of the intuition motivating average log odds rather than average probabilities is probably that averaging probabilities seems to ignore extreme probabilities. If you average 10% and 0.0000000001%, you get 5%, the same as if you average 10% and 0.1%. But 0.1% and 0.0000000001% are really different, so maybe they shouldn’t have almost exactly the same effect on the end result? If the source of that 0.0000000001% figure is considered trustworthy, then they wouldn’t have assigned such an extreme probability without a good reason, and 5% is an enormous update away from that. But first of all, it’s not necessarily true that more extreme probabilities must have come from stronger evidence, even if the probabilities are arrived at rationally; that depends on the prior. For example, suppose two experts are asked to provide a probability distribution over the bitstrings of length 20 that could be generated by the next 20 flips of a certain coin. Expert 1 assigns probability 2^-20 to each bitstring. Expert 2 assigns probability 10% to the particular bitstring 00100010101111010000, and distributes probability evenly among the remaining bitstrings. In this case it is expert 2 who is claiming to have some very interesting information about how this coin works, which they wouldn’t claim without good reason, even though they are assigning 10% probability to an event that expert 1 assigns probability 2^-20. Second, what if the probabilities aren’t arrived at rationally? Probabilities are between 0 and 1, while log odds range from −∞ to +∞, so when averaging a large number of probabilities together, no single unreliable source can move the average very much, but when averaging a large number of log odds, a single unreliable source can have arbitrarily large effects on the result. And third, probabilities, not log odds, are the correct scale to use for decision-making. Suppose expert 1 says some event has probability 1%, or perhaps 3%, while expert 2 says the same event has probability 0.01%, or perhaps 0.0000001%. If the event matters enough for you to care about these differences, then the possibility that expert 1 has accounted for the reasons someone might give a very low probability, and has good reason to give a much higher one, should be much more interesting to you than the hypothesis that expert 2 has good reason to give such a low probability; and the sizeable difference between the “1%” and “3%” that expert 1 might have told you shouldn’t be ignored and washed out by log odds averaging with expert 2.
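As a small illustration of the robustness point (the numbers are made up), compare how a single wildly extreme source moves each rule:

```python
import math

def geo_mean_odds(probs):
    """Average log odds: geometric mean of the odds, converted back to a probability."""
    o = math.exp(sum(math.log(p / (1 - p)) for p in probs) / len(probs))
    return o / (1 + o)

# Nine sources near 10%, plus one source reporting an absurd 1e-12.
sources = [0.10] * 9 + [1e-12]
print(sum(sources) / len(sources))  # ~0.09: the outlier barely moves the arithmetic mean
print(geo_mean_odds(sources))       # ~0.009: the outlier drags the log-odds average down by an order of magnitude
```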
Let’s go back to the example where you’re trying to get a probability out of the odds you’d be willing to bet at. I think it helps to think about why there would be a significant gap between the worst odds you’d accept a bet at and the worst odds you’d accept the opposite bet at. One reason is that someone else’s willingness to bet on something is evidence for it being true, so in each direction there should be some interval of odds at which their willingness to make the bet implies that you shouldn’t. Even if you don’t think the other person has any relevant knowledge that you don’t, the process by which you turn an intuitive sense of probability into a number is noisy, and a counterparty is more likely to accept the bets that are more favorable to them; so if you were forced to set odds at which you’d take bets on either side, even someone who knows nothing about the subject could exploit you on average. I think the possibility that adversaries can make ambiguity resolve against you disproportionately often is a good explanation for ambiguity aversion in general, since there are many situations, not just bets, where someone might have an opportunity to profit from your loss. Anyway, if the worst odds you’d be willing to bet at are bounds on how seriously you take the hypothesis that someone else knows something that should make you update by a particular amount, and you want to get an actual probability, then you should average over the probabilities you perhaps should end up at, weighted by how likely it is that you should end up at each of them. This is an arithmetic mean of probabilities, not a geometric mean of odds.
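A minimal sketch of that last step, with made-up weights over the probabilities you think you might rationally end up at:

```python
# Hypothetical: you think there's a 30% chance you should end up at the 1% view,
# a 50% chance you should stay near 4%, and a 20% chance you should end up at 10%.
candidates = [0.01, 0.04, 0.10]
weights = [0.30, 0.50, 0.20]

# Expected probability = weighted arithmetic mean of the candidate probabilities.
p = sum(w * c for w, c in zip(weights, candidates))
print(p)  # 0.043
```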