There simply is no right answer to your question in general. The problem is that most of the time you simply have no way of knowing whether the experts’ opinions are independent or not. The thing to realise is that even if the experts don’t talk to each other and have entirely separate evidence, the underlying reality can still create a dependence. Vaniver’s comment actually says it pretty well already, but just to hammer it home let me give you a specific example.
Imagine the underlying process is this: A coin is flipped 6 times, and each time either a 1 or a 0 is written on a side of a 6-sided die. Then the die is rolled, and you’re interested in whether it rolled a 1. Obviously your prior is 0.5. Now imagine there are 3 experts who all give a 2⁄3 chance that a 1 was rolled.
Situation 1: Each expert has made a noisy observation of the actual die roll. Maybe they took a photo, but the photos are blurry and noisy. In this case, the evidence from each of the three separate photos is independent, and the odds combine like DanielLC describes to give an 8⁄9 chance that a 1 was rolled. With more experts saying the same thing, the probability converges to 1 here. Of course if they all had seen copies of the same photo it would be a different story...
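To make that combination rule concrete, here’s a minimal Python sketch of it (the function name and structure are mine, not DanielLC’s): multiply each expert’s likelihood ratio into the prior odds.

```python
def combine_independent(prior, expert_probs):
    """Combine expert probabilities under the assumption that their
    evidence is independent given the outcome: multiply each expert's
    likelihood ratio (posterior odds / prior odds) into the prior odds."""
    prior_odds = prior / (1 - prior)
    combined_odds = prior_odds
    for p in expert_probs:
        combined_odds *= (p / (1 - p)) / prior_odds
    return combined_odds / (1 + combined_odds)

# Situation 1: prior 0.5, three experts each at 2/3.
print(combine_independent(0.5, [2/3, 2/3, 2/3]))  # 0.888... = 8/9
```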
Situation 2: No-one has seen the roll of interest itself, but each of the experts has seen the results of many other rolls of the same die (different rolls for each expert). In this case, it’s clear that all you have is strong evidence that there are four 1s and two 0s on the die, and the probability stays at 2⁄3. Note that the experts haven’t spoken to each other, nor have they seen the same evidence; they’re correlated by an underlying property of the system.
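Note that if you naively ran the independence rule from the sketch above on this situation, it would again output 8⁄9, which is wrong here; the experts’ evidence overlaps through the die’s composition.

```python
# Situation 2: the independence assumption is violated, so the rule
# over-counts the shared evidence about the die's composition.
print(combine_independent(0.5, [2/3, 2/3, 2/3]))  # 0.888..., but the
# correct combined probability in this situation is just 2/3.
```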
Situation 3: This one is a little strained, I’ll admit, but it’s important to illustrate that the combined probability can actually be less than your prior even though the experts are individually giving chances that are higher than it. Imagine it’s common knowledge among the experts that there are five 1s on the die (maybe they’ve all seen hundreds of other rolls, though still different rolls from each other). However, each of them also has a photo of the actual roll, and again the photos are not completely clear, but in each case it sure does look a little more like a 0 than a 1. Relative to the shared prior of 5⁄6, each expert’s 2⁄3 is actually evidence against a 1, so the probability from their combined knowledge is actually 8/33! Ironically, I used DanielLC’s method again for this calculation.
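The same sketch from above reproduces that figure once you plug in the shared prior of 5⁄6 instead of 0.5:

```python
# Situation 3: the common knowledge of five 1s makes the shared prior
# 5/6, so each expert's 2/3 carries a likelihood ratio of 2/5 (< 1).
print(combine_independent(5/6, [2/3, 2/3, 2/3]))  # 0.2424... = 8/33
```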
The point of this, then, is that the answer to what their combined information would tell you could be literally anything at all. You just don’t know, and the fact that the experts don’t talk to each other and see separate evidence is not enough to assume they’re uncorrelated. Of course there has to be a correct answer to what probability to assign given ignorance about the level of correlation between the experts, and I’m actually not sure exactly what it is. Whatever it is, though, there’s a good chance of it still turning out to be consistently under- or over-confident across multiple similar trials (assuming you’re doing such a thing). If this is a machine learning situation, for instance (which it sounds like), I would strongly advise you to simply make some observations of exactly how the probabilities of the experts correlate. I can give you some more detailed advice on how to go about doing that correctly if you wish.
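As a rough sketch of what that measurement could look like (the function and its details are my own illustration, not a prescription): with held-out predictions from each expert, you can check the correlation of their log-odds, ideally within each true class, since it’s independence conditional on the outcome that licenses adding evidence.

```python
import numpy as np

def expert_logodds_corr(probs, labels=None, eps=1e-9):
    """probs: (n_samples, n_experts) array of held-out predicted
    probabilities. Returns the correlation matrix of the experts'
    log-odds; if true labels are given, correlations are computed
    within each class and averaged, since conditional independence
    given the outcome is what justifies adding evidence."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    lo = np.log(p / (1 - p))
    if labels is None:
        return np.corrcoef(lo, rowvar=False)
    labels = np.asarray(labels)
    mats = [np.corrcoef(lo[labels == c], rowvar=False)
            for c in np.unique(labels)]
    return np.mean(mats, axis=0)
```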
Personally, I would by default go with averaging the experts as my best guess. Averaged in log-odds space (= log(p/(1-p))), though, of course, not averaging the 0-to-1 probabilities directly. DanielLC’s advice is theoretically well founded, but the assumption of statistically independent evidence is, as I say, usually unwarranted. I would expect his method to generally give overconfident probabilities in practice.
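For concreteness, here’s a minimal sketch of that default, with the sum shown alongside for contrast (the names are mine); with a uniform 0.5 prior, summing log-odds is exactly DanielLC’s method, while averaging is the more conservative choice:

```python
import math

def logodds(p):
    return math.log(p / (1 - p))

def from_logodds(l):
    return 1 / (1 + math.exp(-l))

probs = [2/3, 2/3, 2/3]

# Averaging log-odds: a conservative default when correlation is unknown.
print(from_logodds(sum(map(logodds, probs)) / len(probs)))  # 0.666... = 2/3

# Summing log-odds instead (with a uniform prior this is DanielLC's
# method): 3 * log(2) of evidence -> odds of 8 -> 8/9.
print(from_logodds(sum(map(logodds, probs))))  # 0.888... = 8/9
```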
I’m assuming their opinions are independent, usually because they’re trained on different features that have low correlations with each other. I was thinking of adding in log-odds space, as a way of adding up bits of information, and this turns out to be the same as using DanielLC’s method. Averaging instead seems reasonable if correlations are high.
Yes, but the key point I was trying to make is that using different features with low correlations does not at all ensure that adding the evidence is correct. What matters is not correlations between the features, but correlations between the experts. Correlated features will of course mean correlated experts, but the converse is not true: the features don’t have to be correlated for the experts to make mistakes on the same inputs. It’s often the case that they do, simply because some inputs are fundamentally more difficult than others, in ways that affect all of the features.
If you’ve observed that there are low correlations between the experts, then you’ve effectively already followed my main suggestion: “I would strongly advise you to simply make some observations of exactly how the probabilities of the experts correlate.” If you’ve only observed low correlations between features, then I’d say it’s quite likely you’re going to generate overconfident results.
PS Much as I don’t like “appeal to authority”, I do think it’s worth pointing out that I deal with exactly this problem at work, so I’m not just talking out of my behind here. Obviously it’s hard to know how well experience in my field correlates with yours without knowing what your field is, but I’d expect these issues to be general.