I think this post could have been more formally worded. It draws a distinction between two types of probability assignment, but the only practical difference given is that you’d be surprised if you’re wrong in one case but not the other. My initial thought was just that surprise is an irrational thing that should be disregarded ― there’s no term for “how surprised I was” in Bayes’ Theorem.
But let’s rephrase the problem a bit. You’ve made your probability assignments based on Omega’s question: say 1⁄12 for each color. Now consider another situation where you’d give an identical probability assignment. Say I’m going to roll a demonstrated-fair twelve-sided die, and ask you the probability that it lands on one. Again, you assign 1⁄12 probability to each possibility.
(Actually, these assignments are spectacularly wrong, since they give a zero probability to all other colors/numbers. Nothing deserves a zero probability. But let’s assume you gave a negligible but nonzero probability to everything else, and 1⁄12 is just shorthand for “slightly less than 1⁄12, but not enough to bother specifying”.)
So far, your probability assignments for the two cases look identical. Now let’s say I offer you a bet: we’ll repeat each event (drawing a bead and putting it back, or rolling the die) a million times. If the frequency of red/one in that sample comes within 1% of your estimated probability, I give you $1000. Otherwise, you give me $1000.
In the case of the die, we would all take the bet in a heartbeat. We’re very sure that our figures are correct, since the die is demonstrated to be fair, and 1% is a lot of wiggle room for the law of large numbers. But you’d have to be crazy to take the same bet on the jar, despite having assigned a precisely identical chance of winning.
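(For what it’s worth, here’s a rough sanity check of the “1% is a lot of wiggle room” claim ― just a sketch in Python, assuming “within 1%” means within 1% of 1⁄12 and using a normal approximation to the binomial:

```python
import math

# Fair twelve-sided die, a million rolls; how likely is the observed
# frequency of "one" to land within 1% (relative) of 1/12?
n, p = 1_000_000, 1 / 12
sigma = math.sqrt(p * (1 - p) / n)   # std. dev. of the observed frequency
margin = 0.01 * p                    # "within 1%" read as 1% of 1/12

z = margin / sigma
p_within = math.erf(z / math.sqrt(2))   # P(|freq - p| <= margin), normal approx
print(f"z = {z:.2f} sigma, P(within 1% | fair die) ~= {p_within:.4f}")
```

That comes out to roughly three standard deviations of slack, so the fair-die bet is very close to a sure thing.)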
So what’s the difference? Isn’t all the information you care about supposed to be encapsulated in your probability distribution? What is the mathematical distinction between these two cases that causes such a clear difference in whether a given bet is rational? Are we supposed to not only assign probabilities to which events will occur, but also to our probabilities themselves, ad infinitum?
there’s no term for “how surprised I was” in Bayes’ Theorem.
Not quite. The intuitive notion of “how surprised you were” maps closely to Bayesian likelihood ratios.
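To make that concrete (a toy sketch with made-up numbers, not anything from the post): an outcome is surprising under a model roughly to the extent that the likelihood ratio pushes your odds away from that model.

```python
# Toy example: rolling a "one" under a fair die vs. a hypothetical die
# loaded toward one. The likelihood ratio measures how much the outcome
# favors "loaded" over "fair", and Bayes' rule updates the odds by it.
p_one_given_fair = 1 / 12
p_one_given_loaded = 1 / 2

likelihood_ratio = p_one_given_loaded / p_one_given_fair   # = 6, evidence for "loaded"
prior_odds_fair = 0.99 / 0.01                              # assumed prior odds, fair : loaded
posterior_odds_fair = prior_odds_fair / likelihood_ratio   # odds shrink by the ratio
print(likelihood_ratio, posterior_odds_fair)
```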
Regarding your die/beads scenarios:
In your die scenario, you have one highly favored model that assigns equal probability to each possible number. In the beads scenario you have many possible models, all with low probability; averaging their predictions gives equal probability to each possible color.
To simplify things, let’s say our only models are M, which predicts the outcomes are random and equally likely (i.e. a fair die, or a jar filled with an even ratio of 12 colors of beads), and not-M (e.g. a weighted die, or a jar filled with beads that are all the same color). In the beads scenario we might guess that P(M) = 0.1; in the die scenario, P(M) = 0.99. In both cases, our probability of red/one is 1⁄12, because neither of our models tells us which particular color/number to expect. But our probability of winning the bet is different: we only win if M is correct.
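Here’s a minimal sketch of that calculation in Python (the model probabilities are the guesses above; 0.997 is an assumed rough figure for the chance of landing within 1% of 1⁄12 over a million trials given M; under not-M I assume the single color is equally likely to be any of the 12):

```python
def marginal_p_red(p_m):
    p_red_given_m = 1 / 12       # M: even mixture of 12 colors/numbers
    p_red_given_not_m = 1 / 12   # not-M: all one color, equally likely to be any of the 12
    return p_m * p_red_given_m + (1 - p_m) * p_red_given_not_m

def p_win_bet(p_m, p_win_given_m=0.997):
    # Under not-M the long-run frequency of red/one is 0 or 1, nowhere near 1/12,
    # so to a good approximation you only win the bet if M is correct.
    return p_m * p_win_given_m

for label, p_m in [("beads", 0.1), ("die", 0.99)]:
    print(f"{label}: P(red/one) = {marginal_p_red(p_m):.4f}, P(win bet) ~= {p_win_bet(p_m):.3f}")
```

The marginal probability of red/one is identical in the two scenarios, but the bet rides almost entirely on P(M).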
That clears things up a lot. I hadn’t really thought about the multiple-models take on it (despite having read the “prior probabilities as mathematical objects” post). Thanks.
Isn’t all the information you care about supposed to be encapsulated in your probability distribution?
No. As another simple counterexample (yours is one too): if I flip a fair coin 100 times, you expect around 50 heads; but if I choose either a double-head or a double-tail coin and flip that 100 times, you expect either 100 heads or 100 tails. And yet the probability of heads on the first flip is 50⁄50 in both cases.
A distribution over models solves this problem. IIRC you don’t have to regress further, but I don’t remember where (or even if) I saw that result.
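A quick sketch of that counterexample with the distribution over models made explicit (the “mixture” model is the double-head-or-double-tail coin, chosen 50/50):

```python
from math import comb

n = 100

def dist_heads(model):
    """P(k total heads in n flips) for k = 0..n, under each model."""
    if model == "fair":
        return [comb(n, k) * 0.5 ** n for k in range(n + 1)]
    # "mixture": a double-head or double-tail coin, chosen 50/50
    return [0.5 if k in (0, n) else 0.0 for k in range(n + 1)]

for model in ("fair", "mixture"):
    dist = dist_heads(model)
    p_first_heads = sum(k / n * p for k, p in enumerate(dist))  # equals P(heads on flip 1) by symmetry
    print(f"{model}: P(first flip heads) = {p_first_heads:.2f}, "
          f"P(45-55 heads) = {sum(dist[45:56]):.3f}, P(100 heads) = {dist[n]:.3f}")
```

The per-flip marginal is 0.5 under both models, but the distributions over total heads could hardly look more different.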
but if I either choose a double-head or double-tail coin and flip that 100 times,
To clarify: if you know Guy chose either a double-head or double-tail coin, but you have no idea which, then you should assign 50% to heads on the first flip, and then either 0% or 100% to heads afterward, since you’ll then know which one it was.
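As a minimal Bayes-update sketch of that (one observed flip pins down which coin it is):

```python
priors = {"double-head": 0.5, "double-tail": 0.5}
p_heads = {"double-head": 1.0, "double-tail": 0.0}   # likelihood of heads under each coin

first_flip_heads = True   # suppose the first flip comes up heads
unnorm = {c: priors[c] * (p_heads[c] if first_flip_heads else 1 - p_heads[c]) for c in priors}
total = sum(unnorm.values())
posterior = {c: p / total for c, p in unnorm.items()}

print(posterior)                                          # {'double-head': 1.0, 'double-tail': 0.0}
print(sum(posterior[c] * p_heads[c] for c in posterior))  # P(heads on any later flip) = 1.0
```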
It’s been linked too often already in this thread, but the example in Priors as Mathematical Objects neatly demonstrates how a prior is more than just a probability distribution, and how Simetrical’s question doesn’t lead to paradox.
(Actually, these assignments are spectacularly wrong, since they give a zero probability to all other colors/numbers. Nothing deserves a zero probability. But let’s assume you gave a negligible but nonzero probability to everything else, and 1⁄12 is just shorthand for “slightly less than 1⁄12, but not enough to bother specifying”.)
The justification given in the original post was spectacularly wrong. The assignments themselves may not be. One could just as easily be using the shorthand for “slightly more than 1⁄12, because I now know that red is a color Omega considers ‘color-worthy’, he can see that I’ve got red-receptive cones in my eyes, and this influences my probability a little more than the possibility that he has obscure color beads. And screw it. Lilac is freaking purple anyway. And he asked for my probability, not that of some pedantic ponce!”