Great post on a very important topic.
One suggestion: pictures would help a lot here. Norvig’s AIMA has some very nice illustrations in ch. 16 (I think).
Not sure how you worked this out. Not clear what X is.
Figure 16.8. (I happened to have the book right next to me.)
I like the idea of having pictures but I do not like the idea of procuring pictures. I’ll make it a higher priority for future posts, though, and if someone wants to send me pictures (which I can legally use) for this post I’ll be happy to edit them in.
I replaced the “x”s with “p”s; hopefully that’ll make it a bit clearer.
We start off with a prior P(p)=1. That is, I think every p is equally likely, and when I integrate over the domain of p (from 0 to 1) I get 1, like I should.
Then I update on seeing heads. For each value of p, the chance I saw heads was p, and so I expect my function to have the functional form P(p)=p. Notice that after seeing heads the mode is at p=1 (a coin that always lands on heads) and that a coin that always lands on tails (p=0) is now impossible; both are what I expect. When I integrate P(p)=p over p from 0 to 1, though, I get 1/2. I need to multiply by 2 to normalize it, and so we have P(p)=2p.
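Spelled out as a single Bayes’-rule step (using q as a dummy integration variable; this is just the calculation sketched above):

\[
P(p \mid \text{heads}) \;=\; \frac{P(\text{heads} \mid p)\,P(p)}{\int_0^1 P(\text{heads} \mid q)\,P(q)\,dq} \;=\; \frac{p \cdot 1}{\int_0^1 q\,dq} \;=\; \frac{p}{1/2} \;=\; 2p.
\]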
This might look odd at first because it sounds like the probability of the coin always landing on heads is 2, which suggests an ill-formed probability. That’s the probability density, though. Right now my distribution still puts 0 probability on the coin always landing on heads, because that’s an integral with 0 width.
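To make the density/probability distinction concrete: probabilities come from integrating the density over an interval, so a single point gets probability 0 even though the density there is 2, while a range near 1 gets a sensible number:

\[
P(p = 1) \;=\; \int_1^1 2q\,dq \;=\; 0, \qquad P(0.9 \le p \le 1) \;=\; \int_{0.9}^{1} 2q\,dq \;=\; 1 - 0.81 \;=\; 0.19.
\]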
The 2-2x (updating on tails instead of heads) comes from the same argument, but the unnormalized form is now 1-x.
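In the p notation used above, that’s the same Bayes’-rule step applied to the uniform prior, with tails instead of heads:

\[
P(p \mid \text{tails}) \;=\; \frac{(1-p)\cdot 1}{\int_0^1 (1-q)\,dq} \;=\; \frac{1-p}{1/2} \;=\; 2 - 2p.
\]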
I’m not sure it isn’t clearer with ‘x’s, given that you have two different kinds of probabilities to confuse.
It may just be that there’s a fair bit of inferential distance to clear, though, in presenting this notation at all.
I have a strong (if rusty) math background, but I had to reason through exactly what you could possibly mean down a couple of different trees (one of which had a whole comment partially written asking you to explain certain things about your notation and meaning) before it finally clicked for me, on a second reading of your comment here, after trying to explain my confusion in formal mathematical terms.
I think a footnote about what probability density functions look like and what the values actually represent (densities, rather than probabilities), plus a bit of work with them, would be helpful. Or perhaps there’s enough inferential work there to be worth a whole post.
I definitely think that should be a post of its own.
Thanks for the feedback! It’s helpful when planning out a sequence to know where I should focus extra attention.
Just to be clear, I was not suggesting ripping off their illustration, but it is a very good one worthy of legal emulation :).
The reason I put that is that I find at least half of the ugh in finding pictures is checking legality.