Expect to know better when you know more
A seemingly trivial result that I haven't seen posted anywhere in this form, as far as I could find. It simply shows that, in expectation, evidence increases the posterior probability of the true hypothesis.
Let H be the true hypothesis/model/environment/distribution, and ~H its negation. Let e be evidence we receive, taking values e1, e2, …, en. Let pi = P(e=ei|H) and qi = P(e=ei|~H).
Since H is true, the evidence e=ei arrives with probability pi, so all expectations below are taken with respect to the pi. The expected log-weighting of H, E(log P(e|H)), is Σ pi·log pi, while the expected log-weighting of ~H, E(log P(e|~H)), is Σ pi·log qi. Since the pi and qi both sum to 1 and the logarithm is concave, Jensen's inequality gives Σ pi·log(qi/pi) ≤ log(Σ pi·(qi/pi)) = log(Σ qi) = 0 (Gibbs' inequality), and hence
E(log P(e|H)) ≥ E(log P(e|~H)).
Thus, in expectation, the log-probability of the evidence given the true hypothesis is higher than or equal to the log-probability of the evidence given its negation. (The logarithm matters here: the plain expectation E(P(e|~H)) = Σ qi·pi can exceed E(P(e|H)) = Σ pi·pi when ~H piles its probability mass onto the outcomes that are most likely under H.)
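To make the setup concrete, here is a minimal numerical sketch (not part of the original argument; the specific p and q below are arbitrary illustrative choices). It evaluates both expected log-weightings, and also shows why the logarithm is needed: a ~H that concentrates its mass on H's most likely outcome can win on plain expected probability.

```python
import math

# Illustrative distributions over three pieces of evidence (assumed for this sketch).
p = [0.6, 0.3, 0.1]      # p_i = P(e = e_i | H), with H the true hypothesis
q = [0.98, 0.01, 0.01]   # q_i = P(e = e_i | ~H), concentrated on e_1

# Expectations are taken under the true distribution, i.e. weighted by the p_i.
exp_log_true  = sum(pi * math.log(pi) for pi in p)               # E[log P(e|H)]  ~ -0.898
exp_log_false = sum(pi * math.log(qi) for pi, qi in zip(p, q))   # E[log P(e|~H)] ~ -1.854
print(exp_log_true >= exp_log_false)    # True; this is Gibbs' inequality

# Without the logarithm the order can flip: a wrong hypothesis that piles its
# mass on H's most likely outcome gets the higher expected (plain) probability.
exp_prob_true  = sum(pi * pi for pi in p)                        # E[P(e|H)]  = 0.46
exp_prob_false = sum(pi * qi for pi, qi in zip(p, q))            # E[P(e|~H)] = 0.592
print(exp_prob_true >= exp_prob_false)  # False for this particular q
```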
This does not immediately give us that the Bayes factor, P(e|H)/P(e|~H), has expectation of at least one, since the expectation of a ratio is not the same as the ratio, or the exponential, of expectations. It does hold, though. The Bayes factor given e=ei is pi/qi, so the expected Bayes factor is Σ (pi/qi)·pi. The negative logarithm is a convex function; hence, by Jensen's inequality, -log[E(P(e|H)/P(e|~H))] ≤ -E[log(P(e|H)/P(e|~H))]. Here E[log(P(e|H)/P(e|~H))] = Σ pi·log(pi/qi), which is the Kullback–Leibler divergence of P(e|~H) from P(e|H) (the same quantity as the log-weighting gap above), and hence non-negative. Thus log[E(P(e|H)/P(e|~H))] ≥ 0, and hence
E(P(e|H)/P(e|~H)) ≥ 1.
Thus, in expectation, the Bayes factor for the true hypothesis versus its negation is greater than or equal to one.
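As a quick numerical check, using the same illustrative distributions assumed above: the sketch computes the expected Bayes factor Σ(pi/qi)pi and the Kullback–Leibler divergence, and verifies the chain E(Bayes factor) ≥ exp(KL) ≥ 1 that the Jensen step gives.

```python
import math

# Same illustrative distributions as in the previous sketch (assumed, not from the post).
p = [0.6, 0.3, 0.1]      # P(e = e_i | H)
q = [0.98, 0.01, 0.01]   # P(e = e_i | ~H)

# Expected Bayes factor under the true distribution: sum_i p_i * (p_i / q_i).
expected_bf = sum(pi * (pi / qi) for pi, qi in zip(p, q))

# Kullback-Leibler divergence of P(e|~H) from P(e|H): sum_i p_i * log(p_i / q_i).
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# The Jensen step above amounts to: expected_bf >= exp(kl) >= 1.
print(expected_bf, math.exp(kl))        # roughly 10.37 and 2.60
assert expected_bf >= math.exp(kl) >= 1.0
```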
Note that this is not true for the inverse ratio. Indeed E(P(e|~H)/P(e|H)) = Σ (qi/pi)·pi = Σ qi = 1.
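And the corresponding tiny check for the inverse ratio, with the same assumed distributions: the pi cancel and the expectation collapses to Σqi = 1.

```python
# Same illustrative distributions (assumed).
p = [0.6, 0.3, 0.1]      # P(e = e_i | H)
q = [0.98, 0.01, 0.01]   # P(e = e_i | ~H)

# E[P(e|~H)/P(e|H)] under the true distribution: the p_i cancel, leaving sum(q_i) = 1.
inverse_expectation = sum(pi * (qi / pi) for pi, qi in zip(p, q))
print(inverse_expectation)   # 1.0, up to floating-point rounding
```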
In the preceding proofs, ~H played no specific role, and hence
For all K, E(log P(e|H)) ≥ E(log P(e|K)) and E(P(e|H)/P(e|K)) ≥ 1 (and E(P(e|K)/P(e|H)) = 1).
Thus, in expectation, the evidence favours the true hypothesis over any alternative hypothesis K, both in log-probability and in Bayes factor.
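Here is a sketch of the "for all K" version: one illustrative true distribution is fixed, many random alternatives K are sampled, and all three relations are checked in every trial (the random-distribution helper, seed, and trial count are arbitrary choices for this test).

```python
import math
import random

def random_dist(n):
    """A random, strictly positive probability distribution over n outcomes."""
    weights = [random.random() + 1e-9 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

random.seed(0)
p = [0.6, 0.3, 0.1]   # illustrative true distribution P(e = e_i | H)

for _ in range(10_000):
    k = random_dist(len(p))   # an arbitrary alternative hypothesis K
    log_gap     = sum(pi * math.log(pi / ki) for pi, ki in zip(p, k))  # E[log P(e|H)] - E[log P(e|K)]
    expected_bf = sum(pi * (pi / ki) for pi, ki in zip(p, k))          # E[P(e|H)/P(e|K)]
    inverse     = sum(pi * (ki / pi) for pi, ki in zip(p, k))          # E[P(e|K)/P(e|H)]
    assert log_gap >= -1e-12
    assert expected_bf >= 1.0 - 1e-12
    assert abs(inverse - 1.0) < 1e-9
print("All three relations held for every sampled K.")
```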
Now we can turn to the posterior probability P(H|e). For e=ei, this is P(H)·P(e=ei|H)/P(e=ei), where P(e=ei) = P(H)·pi + P(~H)·qi. The expectation of P(e|H)/P(e) can be bounded exactly as above, using the non-negative Kullback–Leibler divergence of P(e) from P(e|H), showing that E(P(e|H)/P(e)) ≥ 1. Since P(H) is a constant, E(P(H|e)) = P(H)·E(P(e|H)/P(e)), and hence:
E(P(H|e)) ≥ P(H).
Thus, in expectation, the posterior probability of the true hypothesis is greater than or equal to its prior probability.
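Finally, a sketch of the headline claim itself, sweeping a few priors P(H) with the same assumed likelihoods: the expected posterior of the true hypothesis never falls below its prior.

```python
# Illustrative likelihoods (assumed): p_i = P(e = e_i | H), q_i = P(e = e_i | ~H).
p = [0.6, 0.3, 0.1]
q = [0.98, 0.01, 0.01]

for prior in [0.01, 0.1, 0.5, 0.9, 0.99]:
    # P(e = e_i) = P(H) p_i + P(~H) q_i  and  P(H | e_i) = P(H) p_i / P(e = e_i).
    # The expectation is over the true distribution of the evidence, i.e. the p_i.
    expected_posterior = sum(
        pi * (prior * pi) / (prior * pi + (1 - prior) * qi)
        for pi, qi in zip(p, q)
    )
    print(prior, round(expected_posterior, 4))
    assert expected_posterior >= prior - 1e-12
```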
...is equal to or higher than the probability
is equal to or greater than one
Thus, in expectation, the posterior probability of the true hypothesis is equal to or greater than its prior probability.
That matters.
I tend to go for "greater" being ≥ and "strictly greater" being >.
That’s not how English works and that’s not how people will understand your words.
Thinking about it, you are correct; I will put off my efforts to reform mathematical terminology to another time.
Is it the proof or the result which is supposed to be new? I would be really surprised if there were no proofs that Bayesian estimators are consistent and that the posterior concentrates on the true hypothesis.
The proof is so trivial that it must have been proved before, but I spent two hours searching for the result and couldn’t find it (it’s very plausible I just lack the correct search terms). The closest I could find were things like the Bernstein–von Mises theorem, but that’s not exactly it.