Suppose that our data are coin flips, and consider three hypotheses: H0 = always heads, H1 = fair coin, H2 = heads with probability 25%. Now suppose that the two hypotheses we actually want to test between are H0 and H’ = 0.5(H1+H2). After seeing a single heads, the likelihood of H0 is 1 and the likelihood of H’ is 0.5(0.5+0.25). After seeing two heads, the likelihood of H0 is 1 and the likelihood of H’ is 0.5(0.5^2+0.25^2). In general, the likelihood of H’ after n heads is 0.5(0.5^n+0.25^n), i.e. a mixture of two geometric functions. More generally, if H’ is a mixture of many hypotheses, its likelihood will be a mixture of many geometric functions, and can therefore be more or less arbitrary.
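(A minimal sketch of the arithmetic above, with illustrative names of my own: the likelihood of H’ after n heads is a sum of two geometric terms, so its log is not linear in n the way the log-likelihood of a single atomic hypothesis would be.)

```python
# Sketch of the coin-flip example above; function names are illustrative, not from the thread.
# H0: P(heads) = 1.0, H1: P(heads) = 0.5, H2: P(heads) = 0.25, and H' = 0.5*H1 + 0.5*H2.

def likelihood_H0(n):
    """Likelihood of observing n heads in a row under H0 (always heads)."""
    return 1.0 ** n

def likelihood_Hmix(n):
    """Likelihood of observing n heads in a row under the mixture H' = 0.5*H1 + 0.5*H2."""
    return 0.5 * (0.5 ** n + 0.25 ** n)

for n in range(1, 5):
    print(n, likelihood_H0(n), likelihood_Hmix(n))
# n=1: 0.375, n=2: 0.15625, n=3: 0.0703125, ...
# a sum of two geometric decays, not a single geometric one.
```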
That’s why I specified single possible worlds / hypotheses with no internal parameters that are being learned.
Oops, missed that; but that specification doesn’t hold in the situation we care about, since rejecting the null hypothesis typically requires us to consider the result of marginalizing over a space of alternative hypotheses (well, assuming we’re being Bayesians, but I know you prefer that anyways =P).
Well, right, assuming we’re Bayesians, but when we’re just “rejecting the null hypothesis” we should mostly be concerned about likelihood from the null hypothesis which has no moving parts, which is why I used the log approximation I did. But at this point we’re mixing frequentism and Bayes to the point where I shan’t defend the point further—it’s certainly true that once a Bayesian considers more than exactly two atomic hypotheses, the update on two independent pieces of evidence doesn’t go as the square of one update (even though the likelihood ratios still go as the square, etc.).
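(To illustrate that last point, here is a small numerical check using the coin-flip hypotheses above; the equal priors are my own assumption, not from the discussion. The per-hypothesis likelihood ratios after two heads are the squares of the single-head ratios, but the Bayes factor for H0 against the marginalized alternative is not.)

```python
# Numerical check of the closing claim; the equal priors are an assumption for illustration.
# Three atomic hypotheses for P(heads): H0 = 1.0, H1 = 0.5, H2 = 0.25.

priors = {"H0": 1 / 3, "H1": 1 / 3, "H2": 1 / 3}
p_heads = {"H0": 1.0, "H1": 0.5, "H2": 0.25}

def bayes_factor_H0_vs_rest(n):
    """Bayes factor for H0 against the mixture of H1 and H2 after n heads in a row."""
    post = {h: priors[h] * p_heads[h] ** n for h in priors}
    prior_odds = priors["H0"] / (priors["H1"] + priors["H2"])
    post_odds = post["H0"] / (post["H1"] + post["H2"])
    return post_odds / prior_odds

b1 = bayes_factor_H0_vs_rest(1)  # 1 / 0.375   ~= 2.667
b2 = bayes_factor_H0_vs_rest(2)  # 1 / 0.15625  = 6.4
print(b1, b2, b1 ** 2)           # b2 != b1**2 (~7.111): the overall update does not square,
                                 # even though each atomic likelihood ratio does.
```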