I was wondering this too. I haven’t looked at this A_p distribution yet (nor have I read all the comments here), but having distributions over distributions is, like, the core of Bayesian methods in machine learning. You don’t just keep a single estimate of the probability; you keep a distribution over possible probabilities, exactly like David is saying. I don’t even know how updating your probability distribution in light of new evidence (aka a “Bayesian update”) would work without this.
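To make that concrete, here is a minimal sketch of what I mean. It is my own toy example, not anything from David's post, and I haven't checked how it maps onto the A_p distribution. It assumes a coin with unknown bias and a Beta distribution as the belief over that bias; the `BetaBelief` class and the 7-heads/3-tails data are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Hypothetical helper: a Beta(alpha, beta) belief over an unknown probability p."""
    alpha: float = 1.0  # Beta(1, 1) is the uniform prior over p in [0, 1]
    beta: float = 1.0

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate Bayesian update for Bernoulli data:
        # add observed successes to alpha and observed failures to beta.
        return BetaBelief(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        # The single "best guess" probability you would report.
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        # How uncertain the belief about p still is.
        n = self.alpha + self.beta
        return self.alpha * self.beta / (n * n * (n + 1))


prior = BetaBelief()                                # no data yet
posterior = prior.update(successes=7, failures=3)   # assumed data: 7 heads, 3 tails

print(prior.mean, prior.variance)
print(posterior.mean, posterior.variance)
```

The point estimate moves from the prior mean toward the observed frequency, and the variance shrinks as data comes in. A single number for the probability would give you the first but not the second.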
Am I missing something about David’s post? I did go through it rather quickly.