I find this intellectually stimulating, but it does not look useful in practice: with repeated i.i.d. data, the information in the data quickly swamps the prior, so long as the prior is diffuse/universal/ignorance.
You raise a good point, but I think the choice of prior matters quite often:
In the limit of large i.i.d. data (N > 1000, say), Laplace's Rule and my prior give essentially the same answer. But so does the simple frequentist estimate n/N: Laplace's estimate (n+1)/(N+2) differs from n/N by O(1/N). The original motivation for Laplace's Rule was the small-N regime, where the frequentist estimate is clearly absurd; a quick numeric check follows.
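To make that concrete (the values n = 700, N = 1000 are arbitrary illustrative numbers, not from any real dataset):

```python
# Laplace's Rule (n+1)/(N+2) vs. the frequentist estimate n/N at large N.
# n = 700, N = 1000 are arbitrary illustrative values.
n, N = 700, 1000
print((n + 1) / (N + 2))  # 0.6996... (Laplace's Rule)
print(n / N)              # 0.7      (frequentist estimate)
# The two differ by O(1/N). At N = 0 the frequentist estimate is undefined,
# and after a single failure it asserts P(success) = 0 outright.
```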
In the small-data regime (N < 15, say), the prior matters. Consider observing 12 successes in a row. Laplace's Rule gives P(next success) = 13/14 ≈ 92.9%, while my proposed prior (with point masses at 0 and 1) gives P(next success) ≈ 98%, which better matches my intuition about potentially deterministic processes. A sketch of the calculation follows.
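Here is a minimal sketch of that calculation. The weight of 0.15 on each of p = 0 and p = 1 is my illustrative choice for this example (it reproduces the ≈98% figure); the exact endpoint weights are an assumption, not part of the rule itself:

```python
from fractions import Fraction

def predictive(n, w_end=Fraction(3, 20)):
    """P(next success | n successes in a row) under a mixture prior:
    weight w_end at p = 0, w_end at p = 1, the rest uniform on (0, 1).
    The 0.15 endpoint weight is an illustrative assumption."""
    # Marginal likelihood of n straight successes under each component:
    #   point mass at p = 1 -> 1;  at p = 0 -> 0;  uniform -> 1/(n+1)
    post_one = w_end * 1
    post_unif = (1 - 2 * w_end) * Fraction(1, n + 1)  # posterior: Beta(n+1, 1)
    z = post_one + post_unif
    # Predictive: 1 from the p = 1 mass, (n+1)/(n+2) from Beta(n+1, 1)
    return (post_one + post_unif * Fraction(n + 1, n + 2)) / z

print(float(predictive(12)))  # ~0.981, i.e. ~98%
print(13 / 14)                # ~0.929, Laplace's Rule for comparison
```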
When making predictions far beyond the observed data, the prior weight on extreme underlying probabilities matters a lot. For example, after seeing 12/12 successes, how confident should we be of seeing a quadrillion more successes? Laplace's uniform prior assigns this a vanishingly small probability (its Beta(13, 1) posterior gives 13/(13 + 10^15) ≈ 10^-14), while my prior gives it significant weight, because the point mass at p = 1 survives the update.
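The same machinery extends to long runs: under a Beta(n+1, 1) posterior, P(M more successes) telescopes to (n+1)/(n+1+M), while under the mixture the surviving point mass at p = 1 dominates. A sketch, reusing the assumed 0.15 endpoint weights from above:

```python
from fractions import Fraction

def p_run(M, n=12, w_end=Fraction(3, 20)):
    """P(M further successes | n successes in a row) under the same
    assumed mixture prior (0.15 at each of p = 0 and p = 1, rest uniform)."""
    post_one = w_end                                  # p = 1 fits the data perfectly
    post_unif = (1 - 2 * w_end) * Fraction(1, n + 1)  # uniform -> Beta(n+1, 1)
    z = post_one + post_unif
    beta_term = Fraction(n + 1, n + 1 + M)  # P(M more) under Beta(n+1, 1)
    return (post_one + post_unif * beta_term) / z

M = 10**15
print(float(Fraction(13, 13 + M)))  # Laplace: ~1.3e-14
print(float(p_run(M)))              # mixture prior: ~0.74
```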