There are many things to say about this result by N. Taleb. To start with, a minor detail: I would have written $\hat{p} = I^{-1}_{1/2}(m+1, n-m)$, which is much more consistent with the fact that he is inverting the CDF.
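For reference, this is the standard meaning of that notation: $I_x(a,b)$ is the regularized incomplete beta function, which is exactly the CDF of a Beta(a, b) distribution, so inverting it in $x$ at level $1/2$ returns the median.

$$
I_x(a,b) = \frac{1}{B(a,b)} \int_0^x t^{a-1}(1-t)^{b-1}\,dt,
\qquad
\hat{p} = I^{-1}_{1/2}(m+1, n-m) \iff I_{\hat{p}}(m+1, n-m) = \tfrac{1}{2}.
$$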
He is inverting the CDF of a Beta distribution with parameters (m+1, n-m), which is the posterior in the Beta-Binomial model under a Beta(1, 0) prior (!!!), with no explanation at all! It would have made slightly more sense to use a Beta(1, 1) prior instead.
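The conjugate update behind this remark, written out for a generic prior Beta(α, β) and m events in n trials (standard Beta-Binomial algebra, not taken from Taleb's text):

$$
\pi(p \mid m) \propto \underbrace{p^{\alpha-1}(1-p)^{\beta-1}}_{\text{prior}}\;\underbrace{p^{m}(1-p)^{n-m}}_{\text{likelihood}} = p^{\alpha+m-1}(1-p)^{\beta+n-m-1},
$$

so the posterior is Beta(α + m, β + n - m). Setting (α, β) = (1, 0) gives Beta(m+1, n-m), while (α, β) = (1, 1) gives Beta(m+1, n-m+1). Note that the Beta(1, 0) "density" $(1-p)^{-1}$ is not integrable on $[0, 1]$, so it is an improper prior.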
Note that by selecting $q = 1/2$, all he does is choose as his “optimal estimate” the median of the Beta(m+1, n-m) distribution, i.e., the median of the posterior distribution.
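To make the posterior median concrete, here is a minimal pure-Python sketch that computes $I^{-1}_{1/2}(a, b)$ by bisection. It assumes integer parameters (true for both posteriors discussed here as long as m < n), so the Beta CDF can be evaluated exactly through the identity $I_x(a,b) = P(\mathrm{Binomial}(a+b-1, x) \ge a)$. The function names are illustrative, not from any library.

```python
import math


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at x, valid only for positive integers a, b.

    Uses I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def beta_median_int(a: int, b: int, tol: float = 1e-12) -> float:
    """Median of Beta(a, b), i.e. I^{-1}_{1/2}(a, b), found by bisection on [0, 1]."""
    if a < 1 or b < 1:
        raise ValueError("this sketch needs integer parameters a, b >= 1")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < 0.5:
            lo = mid  # median lies to the right of mid
        else:
            hi = mid  # median lies at or to the left of mid
    return (lo + hi) / 2


def posterior_median(m: int, n: int, prior: tuple[int, int] = (1, 0)) -> float:
    """Median of the Beta posterior after m events in n trials.

    The default prior (1, 0) reproduces the Beta(m+1, n-m) posterior;
    prior=(1, 1) gives the uniform-prior posterior Beta(m+1, n-m+1).
    """
    a0, b0 = prior
    return beta_median_int(a0 + m, b0 + n - m)
```

For example, `posterior_median(0, 10)` is the median of Beta(1, 10), which equals $1 - 2^{-1/10} \approx 0.067$, while `posterior_median(0, 10, prior=(1, 1))` uses the Beta(1, 11) posterior instead.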
Note also that he completely ignores the base rate of 5%. Can he not make use of it at all? So, even better than a Beta(1, 1), I’d have chosen the maximum-entropy distribution among the betas with mean 0.05, i.e., one with a large variance. In fact, Taleb himself complains that the Bayesian approach gives funny results with highly informative beta priors.
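One way to find that maximum-entropy beta, sketched below under stated assumptions: fix the mean $\mu = a/(a+b)$, write $a = \mu s$ and $b = (1-\mu)s$, and maximize the differential entropy $H = \ln B(a,b) - (a-1)\psi(a) - (b-1)\psi(b) + (a+b-2)\psi(a+b)$ over the concentration $s$. The search range for $s$, the grid-plus-golden-section optimizer, and the hand-rolled digamma are illustrative choices (the standard library has no digamma), and the refinement step assumes the entropy has a single peak in the bracket found by the grid.

```python
import math


def digamma(x: float) -> float:
    """psi(x) for x > 0: upward recurrence, then the asymptotic series."""
    acc = 0.0
    # psi(x) = psi(x + 1) - 1/x; shift x up until the series below is accurate.
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def beta_entropy(a: float, b: float) -> float:
    """Differential entropy of Beta(a, b) in nats."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (log_beta - (a - 1) * digamma(a) - (b - 1) * digamma(b)
            + (a + b - 2) * digamma(a + b))


def max_entropy_beta(mu: float, s_lo: float = 1e-2, s_hi: float = 1e4,
                     grid: int = 400, iters: int = 100) -> tuple[float, float]:
    """Beta(a, b) with mean mu that maximizes entropy over s = a + b in [s_lo, s_hi]."""
    def h(log_s: float) -> float:
        s = math.exp(log_s)
        return beta_entropy(mu * s, (1 - mu) * s)

    lo, hi = math.log(s_lo), math.log(s_hi)
    step = (hi - lo) / grid
    # Coarse log-spaced grid search to bracket the maximum.
    best = max(range(grid + 1), key=lambda i: h(lo + i * step))
    left = lo + max(best - 1, 0) * step
    right = lo + min(best + 1, grid) * step
    # Golden-section search for the maximum inside the bracket.
    g = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = right - g * (right - left)
        d = left + g * (right - left)
        if h(c) > h(d):
            right = d
        else:
            left = c
    s = math.exp((left + right) / 2)
    return mu * s, (1 - mu) * s
```

Calling `max_entropy_beta(0.05)` returns a prior with mean 0.05 that can then be updated with the observed m and n exactly as in the Beta-Binomial model above.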
If I had been facing the problem, I would have inquired about the distribution of the historical records whose aggregate is the 5% average, and used it as a prior to model this new doctor.
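A minimal empirical-Bayes sketch of that idea, assuming the historical records are available as a list of per-doctor rates: fit a Beta prior by the method of moments (match the sample mean and variance) and update it with the new doctor's m events in n trials. The function names are hypothetical. One caveat: the spread of observed rates includes binomial sampling noise, so this simple fit tends to overstate the prior's variance.

```python
import statistics


def fit_beta_moments(rates: list[float]) -> tuple[float, float]:
    """Method-of-moments Beta(a, b) fit to historical rates in (0, 1)."""
    mean = statistics.mean(rates)
    var = statistics.variance(rates)  # sample variance; needs at least two rates
    if not 0 < var < mean * (1 - mean):
        raise ValueError("need 0 < variance < mean * (1 - mean) for a valid Beta fit")
    concentration = mean * (1 - mean) / var - 1  # this is a + b
    return mean * concentration, (1 - mean) * concentration


def posterior_mean(a: float, b: float, m: int, n: int) -> float:
    """Posterior mean of the rate after m events in n trials under a Beta(a, b) prior."""
    return (a + m) / (a + b + n)
```

With purely hypothetical historical rates such as `[0.03, 0.05, 0.07]`, the fitted prior has mean 0.05, and `posterior_mean(a, b, m, n)` lands between that base rate and the new doctor's raw rate m/n, moving toward m/n as n grows.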
All in all, I do not think Taleb wrote his best page that day. But he has many other great ones to learn from!