They don’t even have to be fat-tailed; in very simple examples you can know that, on the next observation, your posterior estimate will either be greater or lesser than it is now, but not the same.
Here’s an example: flipping a coin of unknown bias, with the bias modeled by a beta distribution starting from a uniform prior, and trying to infer the bias/frequency. Obviously, when I flip the coin, I will get either heads or tails, so I know that after my first flip my posterior will favor either heads or tails, but it will not remain unchanged! There is no landing-on-its-edge intermediate 0.5 coin. Indeed, I know in advance that I will be able to rule out 1 of 2 hypotheses: 100% heads or 100% tails.
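A minimal sketch of that first step (my own illustration, assuming the standard conjugate beta-binomial update from a uniform Beta(1, 1) prior):

```python
from scipy.stats import beta

# Uniform prior over the coin's heads-probability p: Beta(1, 1).
prior = beta(1, 1)
print(prior.pdf(0.0), prior.pdf(1.0))  # 1.0 1.0 -> no hypothesis ruled out yet

# Conjugate update: the posterior after one flip is Beta(1 + heads, 1 + tails).
post_after_heads = beta(2, 1)
post_after_tails = beta(1, 2)

# Either outcome drives the density of one extreme hypothesis to zero:
print(post_after_heads.pdf(0.0))  # 0.0 -> "always tails" (p = 0) is ruled out
print(post_after_tails.pdf(1.0))  # 0.0 -> "always heads" (p = 1) is ruled out
```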
But this isn’t just true of the first observation. Suppose I flip twice and get heads then tails; then the single most likely frequency is 1⁄2, since that’s what I have observed to date. But now we’re back to the same situation as at the beginning: we’ve accumulated evidence against the most extreme biases like 99% heads, so we have learned something from the 2 flips, but we again expect the posterior to differ from the prior in 2 specific directions while being unable to update the prior: after the next flip my observed frequency will be either 2⁄3 or 1⁄3 heads. Hence, I can tell you—even before flipping—that 1⁄2 must be dethroned in favor of 1⁄3 or 2⁄3!
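Continuing the sketch (same assumed uniform prior): after heads-then-tails the single most likely frequency is 1⁄2, and whichever way the third flip lands, that mode must move to 2⁄3 or 1⁄3.

```python
def beta_mode(a, b):
    """Mode of Beta(a, b) for a, b > 1: (a - 1) / (a + b - 2)."""
    return (a - 1) / (a + b - 2)

# Starting from Beta(1, 1), heads then tails gives the posterior Beta(2, 2):
print(beta_mode(2, 2))  # 0.5 -> 1/2 is currently the single most likely frequency

# Whatever the third flip turns out to be, 1/2 gets dethroned:
print(beta_mode(3, 2))  # ~0.667 if the third flip is heads
print(beta_mode(2, 3))  # ~0.333 if the third flip is tails
```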
And yet if you add those two posterior distributions, weighted by your current probability of ending up with each, you get your prior back. Magic!
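A numerical check of that "magic" (conservation of expected evidence), under the same assumptions: take the current Beta(2, 2) belief, weight the two possible next posteriors by the predictive probability of heads (the mean, 1⁄2), and the mixture reproduces the current density exactly.

```python
import numpy as np
from scipy.stats import beta

a, b = 2, 2                      # current belief after heads-then-tails
p_heads = a / (a + b)            # posterior predictive probability of heads = 0.5

x = np.linspace(0.01, 0.99, 99)
current = beta(a, b).pdf(x)

# Mixture of the two possible posteriors, weighted by how likely each outcome is:
mixture = p_heads * beta(a + 1, b).pdf(x) + (1 - p_heads) * beta(a, b + 1).pdf(x)

print(np.allclose(current, mixture))  # True: the expected posterior is the prior
```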
(Witch burners don’t get their prior back when they do this because they expect to update in the direction of “she’s a witch” in either case, so when they sum over probable posteriors, they get back their real prior which says “I already know that she’s a witch”, the implication being “the trial has low value of information, let’s just burn her now”.)
Yup, sure does. Which is a step toward the right idea Kindly was gesturing at.
For coin-bias estimation, as for most other things, the self-consistent updating procedure follows maximum likelihood.
Max likelihood tells you which hypothesis is most likely, which is mostly meaningless without further assumptions. For example, if you wanted to bet on what the next flip would be, a maximum-likelihood method won’t give you the right probability.
Yes.
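To make the difference concrete (my own example, assuming a uniform Beta(1, 1) prior): after 3 heads in 3 flips, maximum likelihood would have you bet as if heads were certain, while the posterior predictive (Laplace's rule of succession, (h + 1)/(n + 2)) gives the probability you should actually bet at.

```python
def mle_next_heads(heads, flips):
    # Maximum-likelihood estimate of P(next flip is heads): the raw observed frequency.
    return heads / flips

def predictive_next_heads(heads, flips):
    # Posterior predictive under a uniform Beta(1, 1) prior: Laplace's rule of succession.
    return (heads + 1) / (flips + 2)

print(mle_next_heads(3, 3))         # 1.0 -> bets as if tails were impossible
print(predictive_next_heads(3, 3))  # 0.8 -> the probability a Bayesian would bet at
```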
OTOH, the expected value of the beta distribution with parameters a and b happens to equal the mode of the beta distribution with parameters a + 1 and b + 1, so maximum likelihood does give the right answer (i.e. the expected value of the posterior) if you start from the improper prior B(0, 0).
(IIRC, the same thing happens with other types of distributions, if you pick the ‘right’ improper prior (i.e. the one Jaynes argues for in conditions of total ignorance for totally unrelated reasons) for each. I wonder if this has some particular relevance.)
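A small self-check of both claims (my sketch): the mean of Beta(a, b) equals the mode of Beta(a + 1, b + 1), and under the improper Haldane prior Beta(0, 0) the posterior mean after h heads and t tails is h/(h + t), i.e. the maximum-likelihood estimate.

```python
from fractions import Fraction

def beta_mean(a, b):
    return Fraction(a, a + b)

def beta_mode(a, b):
    # Mode of Beta(a, b), valid for a, b > 1.
    return Fraction(a - 1, a + b - 2)

# Mean of Beta(a, b) equals the mode of Beta(a + 1, b + 1):
for a, b in [(1, 1), (2, 3), (5, 7)]:
    assert beta_mean(a, b) == beta_mode(a + 1, b + 1)

# Improper prior Beta(0, 0) plus h heads and t tails -> posterior Beta(h, t),
# whose mean h / (h + t) is exactly the maximum-likelihood estimate.
h, t = 3, 1
assert beta_mean(h, t) == Fraction(h, h + t)
print("checks pass")
```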