Expect to know better when you know more

A seemingly trivial result that I haven’t seen posted anywhere in this form, as far as I could find. It simply shows that, in expectation, evidence increases the posterior probability of the true hypothesis.

Let H be the true hypothesis/​model/​environment/​distribution, and ~H its negation. Let e be evidence we receive, taking values e1, e2, … en. Let pi=P(e=ei|H) and qi=P(e=ei|~H); each of these sets of likelihoods sums to 1. Since H is true, e=ei occurs with probability pi, and all expectations below are taken over this distribution.

The expected likelihood of the evidence under H, E(P(e|H)), is Σpipi = Σpi², while the expected likelihood under ~H, E(P(e|~H)), is Σqipi. These two need not be ordered: for p = (0.6, 0.4) and q = (1, 0), Σpi² = 0.52 while Σqipi = 0.6. In log terms, however, the ordering always holds: E(log P(e|H)) − E(log P(e|~H)) = Σpi log(pi/​qi), which is the Kullback–Leibler divergence of P(e|~H) from P(e|H), and hence non-negative (this is Gibbs’ inequality). Hence

  • E(log P(e|H)) ≥ E(log P(e|~H)).

Thus, in expectation, the log-probability of the evidence given the true hypothesis is higher than or equal to the log-probability of the evidence given its negation.
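This can be checked numerically. A minimal sketch, verifying Gibbs’ inequality E(log P(e|H)) ≥ E(log P(e|~H)) on randomly drawn distributions (the dimensions, seed, and tolerance are arbitrary illustrative choices):

```python
import math
import random

def gibbs_gap(p, q):
    # E(log P(e|H)) - E(log P(e|~H)) under the true distribution p;
    # this equals the Kullback-Leibler divergence KL(p || q), so it
    # should be non-negative for every pair of distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def random_dist(n, rng):
    # a random probability distribution over n outcomes (all entries positive)
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(0)  # arbitrary seed for reproducibility
gaps = [gibbs_gap(random_dist(5, rng), random_dist(5, rng)) for _ in range(1000)]
min_gap = min(gaps)  # mathematically >= 0; tiny float error tolerated
```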

This, however, doesn’t immediately mean that the Bayes factor, P(e|H)/​P(e|~H), must have expectation greater than one, since a ratio of expectations is not the same as an expectation of ratios. The Bayes factor given e=ei is pi/​qi, so the expected Bayes factor is Σ(pi/​qi)pi. The negative logarithm is a convex function; hence by Jensen’s inequality, -log[E(P(e|H)/​P(e|~H))] ≤ E[-log(P(e|H)/​P(e|~H))] = -Σ(log(pi/​qi))pi. That last sum, Σ(log(pi/​qi))pi, is the Kullback–Leibler divergence of P(e|~H) from P(e|H), and hence non-negative. Thus log[E(P(e|H)/​P(e|~H))] ≥ 0, and hence

  • E(P(e|H)/​P(e|~H)) ≥ 1.

Thus, in expectation, the Bayes factor for the true hypothesis versus its negation is greater than or equal to one.
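A minimal numerical sketch of this bound, checking Σ(pi/​qi)pi ≥ 1 on randomly drawn distributions (the dimensions and seed are arbitrary illustrative choices):

```python
import random

def expected_bayes_factor(p, q):
    # E(P(e|H)/P(e|~H)) under the true distribution p: sum of (pi/qi)*pi
    return sum((pi / qi) * pi for pi, qi in zip(p, q))

def random_dist(n, rng):
    # a random probability distribution over n outcomes (all entries positive)
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(1)  # arbitrary seed
factors = [expected_bayes_factor(random_dist(4, rng), random_dist(4, rng))
           for _ in range(1000)]
min_factor = min(factors)  # mathematically >= 1, with equality only when p = q
```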

Note that this is not true for the inverse ratio. Indeed, E(P(e|~H)/​P(e|H)) = Σ(qi/​pi)pi = Σqi = 1 (assuming every pi > 0, so that the ratio is defined).
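The telescoping in that last computation can be seen directly in a small sketch (the two distributions below are arbitrary illustrative choices with all pi > 0):

```python
def expected_inverse_factor(p, q):
    # E(P(e|~H)/P(e|H)) under the true distribution p: sum of (qi/pi)*pi,
    # which telescopes to sum(q) = 1 whenever every pi > 0
    return sum((qi / pi) * pi for pi, qi in zip(p, q))

# arbitrary illustrative distributions
p = [0.2, 0.5, 0.3]
q = [0.6, 0.1, 0.3]
inv = expected_inverse_factor(p, q)  # equals sum(q) = 1, up to float rounding
```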

In the preceding proofs, ~H played no specific role beyond being a probability distribution over the ei, and hence

  • For any hypothesis K, E(log P(e|H)) ≥ E(log P(e|K)) and E(P(e|H)/​P(e|K)) ≥ 1 (and E(P(e|K)/​P(e|H)) = 1).

Thus, in expectation, the true hypothesis assigns at least as much log-probability to the evidence as any alternative, and the Bayes factor in its favour is at least one.

Now we can turn to the posterior probability P(H|e). For e=ei, this is P(H)*P(e=ei|H)/​P(e=ei), so E(P(H|e)) = P(H)*E(P(e|H)/​P(e)). Since P(e) is itself a probability distribution over the ei, the argument above applies with P(e) in place of P(e|~H): the Kullback–Leibler divergence of P(e) from P(e|H) is non-negative, so by Jensen’s inequality, E(P(e|H)/​P(e)) ≥ 1. Hence:

  • E(P(H|e)) ≥ P(H).

Thus, in expectation, the posterior probability of the true hypothesis is greater than or equal to its prior probability.
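A minimal numerical sketch of this final result, checking E(P(H|e)) ≥ P(H) for randomly drawn likelihoods (the prior of 0.3, dimensions, and seed are arbitrary illustrative choices):

```python
import random

def expected_posterior(prior_h, p, q):
    # E(P(H|e)) when H is true: evidence e=ei occurs with probability pi,
    # and Bayes' rule gives P(H|ei) = prior*pi / (prior*pi + (1-prior)*qi)
    total = 0.0
    for pi, qi in zip(p, q):
        marginal = prior_h * pi + (1 - prior_h) * qi
        total += pi * (prior_h * pi / marginal)
    return total

def random_dist(n, rng):
    # a random probability distribution over n outcomes (all entries positive)
    w = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(2)  # arbitrary seed
prior = 0.3  # arbitrary illustrative prior
posteriors = [expected_posterior(prior, random_dist(4, rng), random_dist(4, rng))
              for _ in range(1000)]
ok = all(post >= prior - 1e-9 for post in posteriors)  # expected posterior never below prior
```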