The A_p distribution seems really, really important, I don’t really feel like I completely understand it, and Jaynes is the only source I’ve heard even talk about it. Do you happen to know if it’s discussed in the wider literature under a different name or something?
To my knowledge, it’s not discussed explicitly in the wider literature. I’m not a statistician by training though, so my knowledge of the literature is not brilliant.
On the other hand, asking working Bayesian statisticians “what do you do if you don’t know what the model should be” seems to reliably return answers of the broad form “throw that uncertainty into a two-level model, run the update, and let the data tell you which model is correct”. That is a less formal version of what Jaynes is doing here.
This seems to be a reasonable discussion of the same basic material, though in a setting of finitely many models rather than Jaynes’s continuum of models indexed by p.
I’m afraid I haven’t heard of any specific names for it, but it’s a probability distribution on probabilities. Let me step through my understanding of it in a bit more detail (which may not be what Jaynes was talking about, and if so this’ll hopefully make that obvious):
Instead of flipping biased coins, imagine having a randomly chosen double between 0 and 1 stored in memory, and labeled p. You then have a function flip, which generates a double chosen uniformly at random between 0 and 1, compares it to p, and returns 1 if it’s less than or equal to p and 0 if it’s greater than p.
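Here is a minimal Python sketch of that setup, in case it helps make it concrete. The names `p` and `flip` come from the description above; `make_flip`, the seeded `rng`, and the choice to draw `p` uniformly are my own assumptions for illustration.

```python
import random

def make_flip(p, rng):
    """Return a flip() function that keeps p hidden inside a closure.

    make_flip is a hypothetical helper, not something from Jaynes.
    """
    def flip():
        # Draw a double uniformly from [0, 1) and compare it to the hidden p.
        u = rng.random()
        return 1 if u <= p else 0
    return flip

rng = random.Random(0)      # seeded only so the run is repeatable
p = rng.random()            # assumption: p itself is drawn uniformly
flip = make_flip(p, rng)

# An observer only ever sees the outputs of flip, never p itself.
results = [flip() for _ in range(10_000)]
frequency = sum(results) / len(results)
```

With this many flips the observed `frequency` should land close to the hidden `p`, which is what makes inferring `p` from the outputs possible at all.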
Since p is hidden, you have to infer it from watching the results of the flip function. There are two things you could be uncertain about: what p is, and what flip will return next. The A_p distribution is your distribution over p, and the A distribution is your distribution over the results of flip; A_p is a function over doubles between 0 and 1, and A is a function over the integers 0 and 1.
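To make the two objects concrete, here is a rough sketch that stores A_p as weights on a grid of candidate values of p and derives A from it. The grid approximation, the uniform starting weights, and the function names (`normalize`, `update`, `predictive`) are all my assumptions; Jaynes works with a continuous density rather than a grid.

```python
N_GRID = 1001
# Candidate values of p: 0.000, 0.001, ..., 1.000
grid = [i / (N_GRID - 1) for i in range(N_GRID)]

def normalize(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def update(a_p, outcome):
    """Bayes update of the A_p weights after observing one flip."""
    # The likelihood of the outcome under a candidate p is p for a 1, 1 - p for a 0.
    likelihood = [p if outcome == 1 else 1.0 - p for p in grid]
    return normalize([w * l for w, l in zip(a_p, likelihood)])

def predictive(a_p):
    """The A distribution: a function over the integers 0 and 1."""
    prob_one = sum(p * w for p, w in zip(grid, a_p))
    return {1: prob_one, 0: 1.0 - prob_one}

a_p = normalize([1.0] * N_GRID)   # assumption: start from a uniform A_p
a = predictive(a_p)               # before any data, A is 50/50

for outcome in [1, 1, 0, 1]:      # watch a few flips
    a_p = update(a_p, outcome)
a_after = predictive(a_p)
```

Here `a_p` is a list with one weight per candidate p, while `a` is just two numbers. After the four flips in the example, `a_after[1]` comes out near 2/3, which matches the exact answer for a uniform starting distribution.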
And so once you see 10,000 flips, of which 5,000 returned 1, your A_p distribution is very narrow, but your distribution on A, which is obtained by averaging p over A_p, gives a 50% chance of 1, which is the least you could know about it. Even though you know very little about A, you know a lot about p; seeing another flip barely shifts A_p (and thus A). Once you’ve calculated A_p from the evidence you’ve seen so far, you can forget that evidence: it doesn’t tell you anything you haven’t already incorporated into your model.
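The numbers can be checked in closed form if you assume a uniform starting distribution over p, in which case A_p after h ones and t zeros is a Beta(h + 1, t + 1) distribution. That prior is my assumption, not something fixed by the description above, and `beta_summary` is a hypothetical helper.

```python
import math

def beta_summary(ones, zeros):
    """Mean and standard deviation of A_p, assuming a uniform prior on p.

    With that prior, A_p after the observed flips is Beta(ones + 1, zeros + 1).
    Only the two counts are needed, not the full sequence of flips.
    """
    a, b = ones + 1, zeros + 1
    mean = a / (a + b)   # this is also P(next flip = 1), i.e. A
    std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, std

# After 10,000 flips with 5,000 ones:
mean, std = beta_summary(5000, 5000)

# After one more flip that returns 1:
mean_next, std_next = beta_summary(5001, 5000)
shift = mean_next - mean
```

Under this assumption `mean` is 0.5 and `std` is about 0.005, so A_p is concentrated in a thin band around one half, and one extra flip moves the mean by only about 0.00005. The function takes just two counts, which is the sense in which the rest of the evidence can be forgotten.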