You raise a good point. But I think the choice of prior matters quite often:
In the limit of large i.i.d. data (N > 1000), Laplace's Rule and my prior converge to the same answer. But so does the simple frequentist estimate n/N. The original motivation of Laplace's Rule was the small-N regime, where the frequentist estimate is clearly absurd (after a single success, for example, it predicts the next trial succeeds with probability 1).
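A quick numerical check of the large-N claim (the 700/1000 split below is just an illustrative choice of data, not a figure from the discussion):

```python
n, N = 700, 1000               # 700 successes in 1000 trials (illustrative)
freq = n / N                   # frequentist estimate: 0.700
laplace = (n + 1) / (N + 2)    # rule of succession: 701/1002 ≈ 0.6996
print(freq, laplace)           # the two differ by well under 0.1%
```

Note also that on any mixed record of successes and failures, point masses at 0 and 1 receive zero likelihood, so a mixture prior of the kind described collapses onto its continuous component and tracks Laplace's Rule exactly.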
In the small-data regime (N < 15), the prior matters. Consider observing 12 successes in a row. Laplace's Rule gives P(next success) = 13/14 ≈ 92.9%, while my proposed prior (with point masses at 0 and 1) gives P(next success) ≈ 98%, which better matches my intuition about potentially deterministic processes.
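To make the small-N comparison concrete, here is a minimal sketch of both predictive rules. The mixture weights (0.15 each on the point masses at p = 0 and p = 1, 0.7 on a uniform component) are my own illustrative assumption, since the comment doesn't pin them down; they happen to reproduce the ≈98% figure:

```python
def laplace_next(s, n):
    """Rule of succession: P(next success) = (s + 1) / (n + 2)."""
    return (s + 1) / (n + 2)

def mixture_next(s, w_point=0.15, w_cont=0.70):
    """P(next success | s successes in s trials) under the mixture prior
    w_point*delta(0) + w_point*delta(1) + w_cont*Uniform(0, 1).

    Marginal likelihood of an all-success record of length s:
      p = 0 component: 0
      p = 1 component: 1
      uniform component: integral of p^s dp = 1 / (s + 1)
    """
    z = w_point * 1.0 + w_cont / (s + 1)   # p = 0 term drops out
    post_one = w_point / z                 # posterior mass on p = 1
    post_cont = 1.0 - post_one             # mass on the Beta(s+1, 1) posterior
    beta_mean = (s + 1) / (s + 2)          # E[p] under Beta(s+1, 1)
    return post_one + post_cont * beta_mean

print(laplace_next(12, 12))  # 13/14 ≈ 0.929
print(mixture_next(12))      # ≈ 0.981
```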
When making predictions far beyond our observed data, the prior weight placed on extreme underlying probabilities matters a lot. For example, after seeing 12/12 successes, how confident should we be in seeing a quadrillion more successes? Laplace's uniform prior assigns this very low probability, while my prior gives it significant weight.
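Both answers have closed forms here: under the uniform prior, the posterior after 12/12 successes is Beta(13, 1), so the probability of k further consecutive successes is E[p^k] = 13/(13 + k), about 1.3×10⁻¹⁴ for k = 10¹⁵. Under a mixture prior, the answer is instead dominated by the posterior mass on p = 1. A sketch, reusing the same illustrative weights as above:

```python
def uniform_run_prob(s, k):
    """P(k more successes | s/s successes, uniform prior) = (s+1)/(s+1+k)."""
    return (s + 1) / (s + 1 + k)

def mixture_run_prob(s, k, w_point=0.15, w_cont=0.70):
    """Same quantity under the point-mass-plus-uniform mixture prior."""
    z = w_point + w_cont / (s + 1)   # marginal likelihood of the s/s record
    post_one = w_point / z           # posterior mass on p = 1
    post_cont = 1.0 - post_one       # mass on the Beta(s+1, 1) posterior
    return post_one * 1.0 + post_cont * (s + 1) / (s + 1 + k)

k = 10**15  # a quadrillion further trials
print(uniform_run_prob(12, k))   # ≈ 1.3e-14: essentially impossible
print(mixture_run_prob(12, k))   # ≈ 0.74: the p = 1 hypothesis carries it
```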