So perhaps this is for the next post, but are these ‘metaprobabilities’ just regular hyperparameters?
I was wondering this too. I haven’t looked at this A_p distribution yet (nor have I read all the comments here), but keeping distributions over distributions is the core of Bayesian methods in machine learning. You don’t keep just a single estimate of the probability; you keep a distribution over possible probabilities, exactly as David is saying. I don’t even know how updating your probability distribution in light of new evidence (aka a “Bayesian update”) would work without this.
Am I missing something about David’s post? I did go through it rather quickly.
I’m sure you know more about this than I do! Based on a quick Wiki check, I suspect that formally the A_p are one type of hyperprior, but not all hyperpriors are A_p (a/k/a metaprobabilities).
Hyperparameters are used in Bayesian sensitivity analysis, a/k/a “Robust Bayesian Analysis”, which I recently accidentally reinvented here. I might write more about that later in this sequence.
When you use an underscore in a name, make sure to escape it with a backslash first, like so: `A\_p`.
(This is necessary because underscores are yet another way to make things italic. It only applies to comments, since posts use a different formatting system.)
Thanks! Fixed.
Yeah, from what I’ve seen, distributions mathematically equivalent to A_p distributions are commonly used, but that’s not what they’re called.
Like, I think you might call the case in this problem “a Bernoulli random variable with an unknown parameter”. (The Bernoulli random variable is 1 if it gives you $2 and 0 if it gives you $0.) And then the hyperprior would be the probability distribution of that parameter, I guess? I haven’t really heard that word before.
E. T. Jaynes, of course, would never talk like this, because to him the idea of a random quantity existing in the real world is a mind projection fallacy. Hence, no “random variables”. So he uses the A_p distribution as a way of thinking about the same math without the idea of randomness. Jaynes’s A_p in this case corresponds exactly to the more traditional proposition “the parameter of the Bernoulli random variable is p”.
(BTW, I have a purely mathematical question about the A_p distribution chapter, which I posted to the open thread: http://lesswrong.com/lw/ii6/open_thread_september_28_2013/9pbn. If you know the answer, I’d really appreciate it if you told me.)