Expect to know better when you know more

A seemingly trivial result that I haven’t seen posted anywhere in this form, as far as I could find. It simply shows that, in expectation, evidence increases the posterior probability of the true hypothesis.

Let H be the true hypothesis/​model/​environment/​distribution, and ~H its negation. Let e be evidence we receive, taking values e1, e2, … en. Let pi=P(e=ei|H) and qi=P(e=ei|~H); each of these sets of likelihoods sums to 1. Since H is true, e=ei occurs with probability pi, and all expectations below are taken over this distribution.

The expected likelihood of the evidence under H, E(P(e|H)), is Σpipi = Σpi², while the expected likelihood under ~H, E(P(e|~H)), is Σqipi. These two need not be ordered: for p = (0.6, 0.4) and q = (1, 0), Σpi² = 0.52 while Σqipi = 0.6. In log terms, however, the ordering always holds: E(log P(e|H)) − E(log P(e|~H)) = Σpi log(pi/​qi), which is the Kullback–Leibler divergence of P(e|~H) from P(e|H), and hence non-negative (this is Gibbs’ inequality). Hence

  • E(log P(e|H)) ≥ E(log P(e|~H)).

Thus, in expectation, the log-probability of the evidence given the true hypothesis is higher than or equal to the log-probability of the evidence given its negation.
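This can be checked numerically. A minimal sketch, verifying Gibbs’ inequality E(log P(e|H)) ≥ E(log P(e|~H)) on randomly drawn distributions (the dimensions, seed, and tolerance are arbitrary illustrative choices):

```python
import math
import random

def gibbs_gap(p, q):
    # E(log P(e|H)) - E(log P(e|~H)) under the true distribution p;
    # this equals the Kullback-Leibler divergence KL(p || q), so it
    # should be non-negative for every pair of distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def random_dist(n, rng):
    # a random probability distribution over n outcomes (all entries positive)
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(0)  # arbitrary seed for reproducibility
gaps = [gibbs_gap(random_dist(5, rng), random_dist(5, rng)) for _ in range(1000)]
min_gap = min(gaps)  # mathematically >= 0; tiny float error tolerated
```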

This, however, doesn’t immediately mean that the Bayes factor, P(e|H)/​P(e|~H), must have expectation greater than one, since a ratio of expectations is not the same as an expectation of ratios. The Bayes factor given e=ei is pi/​qi, so the expected Bayes factor is Σ(pi/​qi)pi. The negative logarithm is a convex function; hence by Jensen’s inequality, -log[E(P(e|H)/​P(e|~H))] ≤ E[-log(P(e|H)/​P(e|~H))] = -Σ(log(pi/​qi))pi. That last sum, Σ(log(pi/​qi))pi, is the Kullback–Leibler divergence of P(e|~H) from P(e|H), and hence non-negative. Thus log[E(P(e|H)/​P(e|~H))] ≥ 0, and hence

  • E(P(e|H)/​P(e|~H)) ≥ 1.

Thus, in expectation, the Bayes factor for the true hypothesis versus its negation is greater than or equal to one.
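A minimal numerical sketch of this bound, checking Σ(pi/​qi)pi ≥ 1 on randomly drawn distributions (the dimensions and seed are arbitrary illustrative choices):

```python
import random

def expected_bayes_factor(p, q):
    # E(P(e|H)/P(e|~H)) under the true distribution p: sum of (pi/qi)*pi
    return sum((pi / qi) * pi for pi, qi in zip(p, q))

def random_dist(n, rng):
    # a random probability distribution over n outcomes (all entries positive)
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(1)  # arbitrary seed
factors = [expected_bayes_factor(random_dist(4, rng), random_dist(4, rng))
           for _ in range(1000)]
min_factor = min(factors)  # mathematically >= 1, with equality only when p = q
```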

Note that this is not true for the inverse ratio. Indeed, E(P(e|~H)/​P(e|H)) = Σ(qi/​pi)pi = Σqi = 1 (assuming every pi > 0, so that the ratio is defined).
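The telescoping in that last computation can be seen directly in a small sketch (the two distributions below are arbitrary illustrative choices with all pi > 0):

```python
def expected_inverse_factor(p, q):
    # E(P(e|~H)/P(e|H)) under the true distribution p: sum of (qi/pi)*pi,
    # which telescopes to sum(q) = 1 whenever every pi > 0
    return sum((qi / pi) * pi for pi, qi in zip(p, q))

# arbitrary illustrative distributions
p = [0.2, 0.5, 0.3]
q = [0.6, 0.1, 0.3]
inv = expected_inverse_factor(p, q)  # equals sum(q) = 1, up to float rounding
```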

In the preceding proofs, ~H played no specific role beyond being a probability distribution over the ei, and hence

  • For any hypothesis K, E(log P(e|H)) ≥ E(log P(e|K)) and E(P(e|H)/​P(e|K)) ≥ 1 (and E(P(e|K)/​P(e|H)) = 1).

Thus, in expectation, the true hypothesis assigns at least as much log-probability to the evidence as any alternative, and the Bayes factor in its favour is at least one.

Now we can turn to the posterior probability P(H|e). For e=ei, this is P(H)*P(e=ei|H)/​P(e=ei), so E(P(H|e)) = P(H)*E(P(e|H)/​P(e)). Since P(e) is itself a probability distribution over the ei, the argument above applies with P(e) in place of P(e|~H): the Kullback–Leibler divergence of P(e) from P(e|H) is non-negative, so by Jensen’s inequality, E(P(e|H)/​P(e)) ≥ 1. Hence:

  • E(P(H|e)) ≥ P(H).

Thus, in expectation, the posterior probability of the true hypothesis is greater than or equal to its prior probability.
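A minimal numerical sketch of this final result, checking E(P(H|e)) ≥ P(H) for randomly drawn likelihoods (the prior of 0.3, dimensions, and seed are arbitrary illustrative choices):

```python
import random

def expected_posterior(prior_h, p, q):
    # E(P(H|e)) when H is true: evidence e=ei occurs with probability pi,
    # and Bayes' rule gives P(H|ei) = prior*pi / (prior*pi + (1-prior)*qi)
    total = 0.0
    for pi, qi in zip(p, q):
        marginal = prior_h * pi + (1 - prior_h) * qi
        total += pi * (prior_h * pi / marginal)
    return total

def random_dist(n, rng):
    # a random probability distribution over n outcomes (all entries positive)
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(2)  # arbitrary seed
prior = 0.3  # arbitrary illustrative prior
posteriors = [expected_posterior(prior, random_dist(4, rng), random_dist(4, rng))
              for _ in range(1000)]
ok = all(post >= prior - 1e-9 for post in posteriors)  # expected posterior never below prior
```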