I’m afraid I haven’t heard of any specific names for it, but it’s a probability distribution on probabilities. Let me step through my understanding of it in a bit more detail (which may not be what Jaynes was talking about, and if so this’ll hopefully make that obvious):
Instead of flipping biased coins, imagine having a randomly chosen double between 0 and 1 stored in memory, and labeled p. You then have a function flip, which generates a double chosen uniformly at random between 0 and 1, compares it to p, and returns 1 if it’s less than or equal to p and 0 if it’s greater than p.
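The setup above can be sketched directly (a minimal illustration; the names `p` and `flip` follow the description, everything else is incidental):

```python
import random

# The hidden bias: a double chosen uniformly at random from [0, 1].
p = random.random()

def flip():
    """Draw a uniform double and compare it to the hidden p:
    returns 1 if the draw is <= p, else 0.
    So flip() returns 1 with probability exactly p."""
    return 1 if random.random() <= p else 0
```

An observer who can call `flip()` but can't read `p` is in exactly the inference situation described next.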
Since p is hidden, you have to infer it from watching the results of the flip function. There are two things you could be uncertain about: what p is, and what flip will return next. The A_p distribution is your distribution over p, and the A distribution is your distribution over the results of flip; A_p is a function over doubles between 0 and 1, and A is a function over the integers 0 and 1.
And so once you see 10,000 flips, and 5,000 of them returned 1, you have a very narrow distribution on A_p, but your distribution on A, which is determined by summing over A_p, assigns a 50% chance to 1, which is the least you could know about it. Even though you know very little about A, you know a lot about A_p; seeing another flip barely shifts your estimate for A_p (and thus A). Once you've calculated A_p from the evidence you've seen so far, you can forget that evidence: it doesn't tell you anything you haven't already incorporated into your model.
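This update can be made concrete with the conjugate Beta posterior, which is the standard result for this setup under a uniform prior on p (a sketch; the function name is mine):

```python
import math

def posterior_stats(ones, flips):
    """With a uniform prior on p, the posterior (the A_p distribution) after
    seeing `ones` 1s in `flips` trials is Beta(ones + 1, flips - ones + 1).
    Returns its mean (the A distribution's probability of the next flip
    being 1) and its standard deviation (how narrow A_p is)."""
    a, b = ones + 1, flips - ones + 1
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

# 5,000 ones in 10,000 flips: A says 50/50 (maximal uncertainty about the
# next flip), yet A_p is extremely narrow (std about 0.005), and one more
# flip barely moves it.
mean, std = posterior_stats(5000, 10000)
```

The pair (mean, std) captures the asymmetry in the text: the mean sits at exactly 0.5 while the spread of A_p is tiny, and the two counts (5,000 and 10,000) are all you need to retain from the evidence.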