A follow-up probability question: Data samples with different priors
(Rewritten entirely after seeing pragmatist’s answer.)
In this post, helpful people including DanielLC gave me the multiply-odds-ratios method for combining probability estimates given by independent experts with a constant prior, with many comments about what to do when they aren’t independent. (DanielLC’s method turns out to be identical to summing up the bits of information for and against the hypothesis, which is what I’d expected to be correct.)
I ran into problems applying this, because sometimes the prior isn’t constant across samples. Right now I’m combining different sources of information to choose the correct transcription start site for a gene. These bacterial genes typically have from 1 to 20 possible start sites. The prior is 1 / (number of possible sites).
Suppose I want to figure out the correct likelihood multiplier for the information that a start site overlaps the stop of the previous gene, which I will call property O. Assume this multiplier, lm, is constant, regardless of the prior. This is reasonable, since we always factor out the prior. Some function of the prior gives me the posterior probability that a site s is the correct start (Q(s) is true), given that O(s). That’s P(Q(s) | prior=1/numStarts, O(s)).
Suppose I look just at those cases where numStarts = 4, and find that P(Q(s) | numStarts=4, O(s)) = .9. Dividing the posterior odds by the prior odds gives the multiplier:
9:1 / 1:3 = 27:1
Or I can look at the cases where numStarts=2, and find that in these cases, P(Q(s) | numStarts=2, O(s)) = .95:
19:1 / 1:1 = 19:1
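To make the arithmetic in these two examples explicit, here is a minimal Python sketch. The function name likelihood_multiplier is made up for illustration, and it assumes numStarts is at least 2 (with a single candidate site the prior odds are infinite and the ratio is undefined).

```python
from fractions import Fraction

def likelihood_multiplier(posterior, num_starts):
    """Posterior odds divided by prior odds, with prior = 1 / num_starts.

    Hypothetical helper for illustration; assumes num_starts >= 2
    and 0 < posterior < 1.
    """
    posterior = Fraction(posterior)
    prior = Fraction(1, num_starts)
    posterior_odds = posterior / (1 - posterior)  # e.g. .9 -> 9:1
    prior_odds = prior / (1 - prior)              # e.g. 1/4 -> 1:3
    return posterior_odds / prior_odds

print(likelihood_multiplier(Fraction(9, 10), 4))   # 27
print(likelihood_multiplier(Fraction(19, 20), 2))  # 19
```

Exact fractions are used so the results come out as 27 and 19 rather than floating-point approximations.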
I want to take one pass through the data and come up with a single likelihood multiplier, rather than binning all the data into different groups by numStarts. I think I can just compute it as
(sum of numerator : sum of denominator) over all cases s_i where O(s_i) is true, where
numerator = (numStarts_i-1) * Q(s_i)
denominator = (1-Q(s_i))
Is this correct?
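For concreteness, here is a minimal sketch of the one-pass computation described above, assuming Q(s_i) is recorded as 1 when s_i is the correct start and 0 otherwise. The function name and the toy data are made up for illustration; the counts are chosen only to match the .9 and .95 figures in the examples above.

```python
from fractions import Fraction

def pooled_likelihood_multiplier(cases):
    """One pass over (num_starts, is_correct) pairs, one pair per site s_i
    for which O(s_i) is true. is_correct plays the role of Q(s_i): 1 or 0.

    Hypothetical helper; returns sum(numerator) / sum(denominator).
    """
    numerator = 0
    denominator = 0
    for num_starts, is_correct in cases:
        numerator += (num_starts - 1) * is_correct
        denominator += 1 - is_correct
    if denominator == 0:
        raise ValueError("no incorrect sites; the ratio is undefined")
    return Fraction(numerator, denominator)

# Made-up toy data: 10 overlapping sites with numStarts=4, 9 of them correct.
only_fours = [(4, 1)] * 9 + [(4, 0)]
# Made-up toy data: 20 overlapping sites with numStarts=2, 19 of them correct.
only_twos = [(2, 1)] * 19 + [(2, 0)]

print(pooled_likelihood_multiplier(only_fours))              # 27
print(pooled_likelihood_multiplier(only_twos))               # 19
print(pooled_likelihood_multiplier(only_fours + only_twos))  # 23
```

Restricted to a single numStarts value, the formula reproduces the 27:1 and 19:1 figures from the binned calculation. On the mixed data it gives a value in between, which by itself doesn't settle whether pooling is the right thing to do when the per-bin multipliers differ.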