I’m assuming their opinions are independent, usually because they’re trained on different features that have low correlations with each other. I was thinking of adding in log-odds space, as a way of adding up bits of information, and this turns out to be the same as using DanielLC’s method. Averaging instead seems reasonable if correlations are high.
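For concreteness, here is a minimal sketch of the two options mentioned above: adding evidence in log-odds space versus averaging it. The function names and the shared `prior` parameter are assumptions made for illustration; they are not taken from the thread, and this is not claimed to be DanielLC's exact formulation.

```python
import math

def logit(p):
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))

def sigmoid(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def combine_independent(probs, prior=0.5):
    """Add each expert's evidence to the prior, in log-odds space.

    Assumes the experts are conditionally independent and share `prior`
    (both are assumptions of this sketch).
    """
    prior_lo = logit(prior)
    evidence = sum(logit(p) - prior_lo for p in probs)
    return sigmoid(prior_lo + evidence)

def combine_by_averaging(probs):
    """Average the log-odds: appropriate if the experts are highly redundant."""
    return sigmoid(sum(logit(p) for p in probs) / len(probs))

if __name__ == "__main__":
    experts = [0.8, 0.8]
    print(combine_independent(experts))   # more confident than either expert
    print(combine_by_averaging(experts))  # same confidence as each expert
```

With two experts at 0.8 and a prior of 0.5, adding gives odds of 4 × 4 = 16, so about 0.94, while averaging stays at 0.8. The gap between those two answers is exactly what the independence assumption buys, which is why it matters whether that assumption holds.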
Yes, but the key point I was trying to make is that using different features with low correlations does not at all ensure that adding the evidence is correct. What matters is not correlations between the features, but correlations between the experts. Correlated features will of course mean correlated experts, but the converse is not true. The features don’t have to be correlated for the experts to make mistakes on the same inputs. The experts often do make mistakes on the same inputs, simply because some inputs are fundamentally more difficult than others, in ways that affect all of the features.
If you’ve observed that there are low correlations between the experts, then you’ve effectively already followed my main suggestion: “I would strongly advise you simply make some observations of exactly how the probabilities of the experts correlate”. If you’ve only observed low correlations between features then I’d say it’s quite likely you’re going to generate overconfident results.
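One way to make that observation, as a rough sketch: take a held-out set with known labels and correlate the experts' errors rather than their raw probabilities. Raw outputs of two decent experts will correlate simply because both track the truth, so looking at the residuals is an assumption of this sketch about what is most informative. The names `pearson` and `error_correlation` and the toy numbers are hypothetical, not data from any real system.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def error_correlation(probs_a, probs_b, labels):
    """Correlate the two experts' residuals (predicted probability minus label)."""
    err_a = [p - y for p, y in zip(probs_a, labels)]
    err_b = [p - y for p, y in zip(probs_b, labels)]
    return pearson(err_a, err_b)

if __name__ == "__main__":
    # Hypothetical held-out data: both experts stumble on the 3rd and 6th inputs.
    labels  = [1,   0,   1,   0,   1,   0]
    probs_a = [0.9, 0.2, 0.4, 0.1, 0.8, 0.6]
    probs_b = [0.8, 0.1, 0.3, 0.2, 0.9, 0.7]
    print(error_correlation(probs_a, probs_b, labels))
```

A value near zero is consistent with adding the evidence; a value well above zero, as in this toy data, suggests the experts fail on the same hard inputs and that adding their log-odds would overstate the confidence.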
PS Much as I don’t like “appeal to authority”, I do think it’s worth pointing out that I deal with exactly this problem at work, so I’m not just talking out of my behind here. Obviously it’s hard to know how well experience in my field correlates with yours without knowing what your field is, but I’d expect these issues to be general.