If you have a lot of experts and a lot of objects, I might try a generative model where each object had unseen values in an n-dimensional feature space, and where experts decided what features to notice using weightings from a dual n-dimensional space, with the weight covectors generated in some clustered way to represent the experts’ structured non-independence. Each expert’s probability estimate would be something like a logistic function of the pairing of the object’s feature vector with the expert’s weight covector (plus noise), and your output summary probability would be the posterior mean of an estimate based on a special “best” expert weighting, derived using the assumption that the experts’ estimates are well-calibrated.
I’m not sure what an appropriate generative model of clustered expert feature weightings would be.
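To make that concrete, here is a minimal NumPy sketch of that generative story. A Gaussian mixture stands in, purely as a placeholder, for whatever the right model of clustered weight covectors turns out to be, and all of the dimensions, noise scales, and the choice of the mean covector as the “best” weighting are illustrative assumptions rather than anything principled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only.
n_features, n_experts, n_objects, n_clusters = 5, 10, 200, 3

# Latent object features (unobserved by us).
X = rng.normal(size=(n_objects, n_features))

# Clustered expert weight covectors; a Gaussian mixture is just a placeholder
# for the experts' structured non-independence.
centers = rng.normal(scale=2.0, size=(n_clusters, n_features))
W = centers[rng.integers(n_clusters, size=n_experts)] \
    + rng.normal(scale=0.5, size=(n_experts, n_features))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Each expert's stated probability: logistic of the feature/weight pairing, plus noise.
expert_probs = sigmoid(X @ W.T + rng.normal(scale=0.3, size=(n_objects, n_experts)))

# A hypothetical "best" weighting (here, crudely, the mean covector) generates the
# truths; in practice the calibration assumption is what would pin this down.
w_star = W.mean(axis=0)
truth = rng.random(n_objects) < sigmoid(X @ w_star)
```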
Actually, I guess the output of this procedure would just end up being a log-linear model of the truth given the experts’ confidences. (Some of the coefficients might be negative, to cancel confounding factors.) So maybe a much easier way to do this is to sample from the space of such log-linear models directly, using sampled hypothetical imputed truths, while enforcing some constraint that the experts’ opinions be reasonably well-calibrated.
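A rough sketch of what that might look like, treated as a crude stochastic-EM loop rather than a proper sampler: impute truths from the current log-linear model over the experts’ log-odds, refit it, and keep only draws that pass a (very crude, marginal) calibration check. The function name, the number of sweeps, the 0.25 tolerance, and the use of scikit-learn’s LogisticRegression are all my own illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def logit(p, eps=1e-6):
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

def sample_log_linear_models(expert_probs, n_sweeps=50, tol=0.25):
    """Alternate between imputing truths from the current log-linear model of
    the truth given the experts' log-odds and refitting its coefficients,
    keeping only draws that pass a crude marginal calibration check."""
    expert_probs = np.asarray(expert_probs, dtype=float)
    Z = logit(expert_probs)                 # experts' log-odds as features
    p_truth = expert_probs.mean(axis=1)     # start from the naive average
    kept = []
    for _ in range(n_sweeps):
        y = (rng.random(len(p_truth)) < p_truth).astype(int)  # imputed truths
        if y.min() == y.max():              # degenerate imputation; nothing to fit
            continue
        model = LogisticRegression(max_iter=1000).fit(Z, y)
        p_truth = model.predict_proba(Z)[:, 1]
        # Crude calibration constraint: each expert's average stated probability
        # should be close to the imputed base rate.
        if np.abs(expert_probs.mean(axis=0) - y.mean()).max() < tol:
            kept.append((model.intercept_.copy(), model.coef_.copy()))
    return kept, p_truth

# e.g., reusing expert_probs from the sketch above:
# models, p_truth = sample_log_linear_models(expert_probs)
```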
“You have 2 independent measurements”
I ignored this because I wasn’t sure what you could have meant by “independent”. If you meant that the experts’ outputs are fully independent, conditional on the truth, then the problem is straightforward. But this seems unlikely in practice. You probably just meant the informal English connotation “not completely dependent”.
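For completeness, the straightforward case: if the experts really are independent conditional on the truth, are well-calibrated, and share your prior, their probabilities pool by adding log-odds updates. A minimal sketch (the function name and the shared-prior assumption are mine):

```python
import numpy as np

def logit(p):
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return np.log(p / (1 - p))

def combine_independent(probs, prior=0.5):
    """Pool well-calibrated expert probabilities that are conditionally independent
    given the truth: add each expert's log-odds update to the prior's log-odds."""
    probs = np.asarray(probs, dtype=float)
    post_logit = logit(prior) + np.sum(logit(probs) - logit(prior))
    return float(1.0 / (1.0 + np.exp(-post_logit)))

# Two experts each at 70% with a uniform prior: the pooled estimate is ~0.84,
# more confident than either expert alone.
print(combine_independent([0.7, 0.7]))
```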