I have no additional information. This is the general case that I need to solve. This is the information that I have, and I need to make a decision.
(The real-world problem is that I have a zillion classifiers that give probability estimates for dozens of different things, and I have to combine their outputs for each of those things. I don’t have time to look inside any of them and ask for more details. I need a function that takes as arguments one prior and N estimates, assumes the estimates are independent, and produces an output. I usually can’t find their correlations, because the training data isn’t available or for other reasons, and anyway I don’t have time to write the code to do that, and the correlations are probably small.)
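For concreteness, here is a minimal sketch of the kind of combining function I mean, assuming the estimates are conditionally independent given the outcome (the function name and the clamping constant are just illustrative):

```python
import math

def combine_estimates(prior, estimates, eps=1e-9):
    """Combine N probability estimates of the same event Q with a prior,
    treating each estimate as independent evidence.

    Each estimate's contribution is its log-odds shift relative to the
    prior; under conditional independence these shifts simply add."""
    def logit(p):
        p = min(max(p, eps), 1.0 - eps)  # clamp away from 0 and 1
        return math.log(p / (1.0 - p))

    total = logit(prior) + sum(logit(p) - logit(prior) for p in estimates)
    return 1.0 / (1.0 + math.exp(-total))

# Two classifiers each saying 30% against a 10% prior push the answer to ~0.62.
print(combine_estimates(0.10, [0.30, 0.30]))
```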
Are you dealing with things where the estimates are likely to be independent? If you’re looking at studies, they probably will be. If you’re looking at experts, they probably won’t.
There are unsupervised methods, if you have unlabeled data, which I suspect you do. I don’t know about standard methods, but here are a few simple ideas off the top of my head:
First, you can check whether A is consistent with the prior by checking that the average probability it predicts over your data equals your prior for Q. If not, there are a lot of possible failure modes, such as your new data being different from the data used to set your prior, or A being wrong or miscalibrated. If I trusted the prior a lot and wanted to fix the problem, I would scale the evidence (the odds ratio of A from the prior) by a constant.
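One way to read "scale the evidence by a constant" is to multiply A’s odds ratio against the prior by a single constant c, chosen so that the rescaled predictions average back out to the prior. A rough sketch of both the frequency check and that fit (the helper names, the bisection, and its bounds are my own, not a standard recipe):

```python
import math

def logit(p, eps=1e-9):
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def consistency_gap(predictions, prior):
    """Frequency check: a calibrated A should, on average, predict the base
    rate, so a large gap points at the prior, the data, or A itself."""
    return sum(predictions) / len(predictions) - prior

def rescale(p, prior, log_c):
    """Multiply A's odds ratio from the prior by c = exp(log_c); in log-odds
    terms this just shifts A's output by log_c."""
    return sigmoid(logit(p) + log_c)

def fit_scale(predictions, prior, lo=-10.0, hi=10.0, iters=60):
    """Bisection for log_c such that the rescaled predictions average to the
    prior.  Works because the average is monotone in log_c; assumes the
    answer lies in [lo, hi], which covers any realistic miscalibration."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        avg = sum(rescale(p, prior, mid) for p in predictions) / len(predictions)
        if avg > prior:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0
```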
You can apply the same test to the joint prediction. If A and B each produce the right frequency but their joint prediction does not, then they are correlated. It is probably worth doing this as a check on your assumption of independence. You might try to correct for the correlation by scaling the joint evidence, the same way I suggested scaling a single test. (Note that if A=B, scaling the joint evidence down by half, in log-odds terms, gives exactly the right answer.)
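A sketch of that pairwise check, with the scaling exposed as an exponent k on the joint evidence so that k = 1 is full independence and k = 1/2 is exactly right in the A = B extreme (the names and the use of an exponent are my reading, not a standard method):

```python
import math

def logit(p, eps=1e-9):
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def combine_pair(pa, pb, prior, k=1.0):
    """Add A's and B's log-odds shifts from the prior, then scale the joint
    evidence by k.  k=1 treats them as independent; k=0.5 treats them as
    the same test reported twice."""
    shift = (logit(pa) - logit(prior)) + (logit(pb) - logit(prior))
    return sigmoid(logit(prior) + k * shift)

def joint_consistency_gap(preds_a, preds_b, prior):
    """If A and B each pass the single-test frequency check but their naive
    joint prediction drifts away from the prior on average, that is a hint
    they are correlated and independence is double-counting evidence."""
    combined = [combine_pair(a, b, prior) for a, b in zip(preds_a, preds_b)]
    return sum(combined) / len(combined) - prior
```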
But if you have many tests and you correct each pair, it is no longer clear how to combine all of them. One simple answer is to drop one test from each highly correlated pair and assume everything else is independent. To salvage some information rather than dropping tests, you might instead cluster the tests into correlated groups, use scaling to correct within each cluster, and assume the clusters are independent.
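To make the clustering idea concrete, here is one possible sketch; the correlation-of-log-odds-shifts measure, the single-linkage grouping, the 0.5 threshold, and the within-cluster averaging (the A = B extreme of the scaling correction) are all my own choices rather than a standard recipe:

```python
import math
from itertools import combinations

def logit(p, eps=1e-9):
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

def cluster_tests(shifts, threshold=0.5):
    """shifts[i] holds test i's log-odds shifts over the unlabeled data.
    Union-find single linkage: any pair correlating above the threshold
    lands in the same cluster."""
    n = len(shifts)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(n), 2):
        if pearson(shifts[i], shifts[j]) > threshold:
            parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def combine_with_clusters(prior, estimates, clusters):
    """Average the log-odds shifts within each cluster (the fully redundant
    extreme of the scaling correction), then add the cluster contributions
    as if the clusters were independent."""
    L0 = logit(prior)
    total = L0
    for members in clusters:
        cluster_shifts = [logit(estimates[i]) - L0 for i in members]
        total += sum(cluster_shifts) / len(cluster_shifts)
    return sigmoid(total)
```

Averaging within a cluster is the conservative end; a fitted per-cluster scale somewhere between 1/m and 1 (for a cluster of m tests) would keep more of the information.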
In mixture-of-experts problems, the experts are not independent; that’s the whole problem. They are all trying to correlate with some underlying reality, and are thereby correlated with each other.
But you also say “dozens of different things”. Are they trying to estimate the same things, different things, or different things that should all correlate to the same thing?
See my longer comment above for more details, but it sounds like you don’t want to evaluate over the whole data set; you just want to make some assumption about the statistics of your classifiers and combine them via maximum entropy and those statistics.