Or am I missing some key factor here? Did I misinterpret the lesson?
The key factor is that the 60,20 box is not in isolation—it is the top box, and so not only do you expect it to have more “signal” (gold) than average, you also expect it to have more noise than average.
You can think of the numbers on the boxes as drawn from a probability distribution. If there were zero noise, this distribution would just be how the gold in the boxes is distributed. But adding noise is like adding two random variables together. If you’re not familiar with what happens, look it up on Wikipedia, but the upshot is that the combined distribution is more spread out than the original. This combined distribution isn’t just noise or just signal; it’s the probability of a given number being written on the outside of the box.
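To make the "more spread out" claim concrete, here is a minimal simulation sketch. The specific numbers (mean 40, signal spread 10, noise spread 20) and the choice of normal distributions are assumptions picked for illustration, not values from the original post.

```python
import random
import statistics

def simulate_labels(signal_sd, noise_sd, n=100_000, mean=40.0, seed=0):
    """Draw true contents (signal), add independent zero-mean noise,
    and return the spread of the contents and of the resulting labels."""
    rng = random.Random(seed)
    signal = [rng.gauss(mean, signal_sd) for _ in range(n)]
    # The label on the outside of a box is content plus noise.
    labels = [s + rng.gauss(0.0, noise_sd) for s in signal]
    return statistics.pstdev(signal), statistics.pstdev(labels)

# Assumed illustrative spreads: signal 10, noise 20.
signal_spread, label_spread = simulate_labels(signal_sd=10.0, noise_sd=20.0)
print(f"spread of contents: {signal_spread:.2f}")
print(f"spread of labels:   {label_spread:.2f}")
```

For independent signal and noise the variances add, so the label spread should come out close to the square root of 10² + 20², which is wider than either component alone.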
And so if something is the top, very highest box, where should it be located on the combined distribution?
Now, if you have something that’s high on the combined distribution, how much of that is due to signal, and how much of it is due to noise? This is a tougher question, but the essential insight is that the noise shouldn’t be more improbable than the signal, or vice versa—that is, they should both be about the same number of standard deviations from their means.
This means that if the standard deviation of the noise is bigger, then the probable contribution of the noise is greater.
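As a rough sketch of how the split works out in one tractable case, assume the signal and the noise are independent and both normal, with the noise centered on zero. Under that assumption the expected share of a deviation comes out proportional to the variances rather than the standard deviations, so it leans toward the larger-spread term even more strongly than the equal-standard-deviations rule of thumb suggests. The function name and the numbers are hypothetical.

```python
def expected_split(deviation, signal_sd, noise_sd):
    """Given how far a label sits above the overall mean, return the expected
    (signal part, noise part) of that deviation.

    Assumes independent normal signal and zero-mean normal noise; each part
    is then proportional to its own variance.
    """
    signal_var = signal_sd ** 2
    noise_var = noise_sd ** 2
    total_var = signal_var + noise_var
    signal_part = deviation * signal_var / total_var
    noise_part = deviation * noise_var / total_var
    return signal_part, noise_part

# Assumed example: a label 20 above the mean, signal spread 10, noise spread 20.
signal_part, noise_part = expected_split(deviation=20.0, signal_sd=10.0, noise_sd=20.0)
print(f"expected from signal: {signal_part:.1f}")
print(f"expected from noise:  {noise_part:.1f}")
```

With the noise spread twice the signal spread, most of the 20-point excess is attributed to noise, which is the qualitative point: the noisier the estimate, the less of a high label you should credit to real gold.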
Me saying the same thing a different way can be found here.
Oh, I understand now. Even if we don’t know how it’s distributed, if it’s the top among 9 choices with the same variance, that puts it in the 80th percentile for specialness, and signal and noise contribute to that equally. So it’s likely to be in the 80th percentile of noise.
It might have been clearer if you’d instead made the boxes actually contain coins normally distributed about 40 with variance 15 and B=30, and made an alternative of 50/1. You’d have been holding yourself to a more proper, unbiased generation of the numbers, and would still, in all likelihood, have come up with a highest-labeled box that contained less than the sure thing. You basically have to divide your distance from the norm by the ratio of specialness you expect to get from signal and noise. The “all 45” thing just makes it feel like a trick.
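The proposed setup is easy to try out. The sketch below is one possible reading of it, and several details are assumptions: "variance 15" is taken literally (so a standard deviation of about 3.9), B=30 is modeled as label noise drawn uniformly from within 30 coins either side of the true contents, there are 9 boxes, and the sure thing is treated as exactly 50 coins.

```python
import random

def top_box_trial(rng, n_boxes=9, mean=40.0, content_var=15.0, noise_range=30.0):
    """Generate one set of boxes and return (label, coins) for the
    box with the highest label."""
    content_sd = content_var ** 0.5
    boxes = []
    for _ in range(n_boxes):
        coins = rng.gauss(mean, content_sd)
        # Assumed noise model: uniform within +/- noise_range of the contents.
        label = coins + rng.uniform(-noise_range, noise_range)
        boxes.append((label, coins))
    # Tuples compare on the first element, so this picks the highest label.
    return max(boxes)

def run(trials=20_000, sure_thing=50.0, seed=0):
    rng = random.Random(seed)
    results = [top_box_trial(rng) for _ in range(trials)]
    below = sum(1 for _, coins in results if coins < sure_thing)
    mean_label = sum(label for label, _ in results) / trials
    mean_coins = sum(coins for _, coins in results) / trials
    return below / trials, mean_label, mean_coins

fraction_below, mean_label, mean_coins = run()
print(f"top box holds less than the sure thing in {fraction_below:.1%} of trials")
print(f"average label on the top box: {mean_label:.1f}")
print(f"average coins in the top box: {mean_coins:.1f}")
```

Under these assumptions the numbers are generated without any trick, yet the top-labeled box usually holds fewer coins than the sure thing, because most of what pushed its label to the top was noise.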
I think there’s some value in the observation that “the all 45 thing makes it feel like a trick”. I believe that’s a big part of why this feels like a paradox.
If you have a box with the numbers “60” and “20” as described above, then I can see two main ways that you could interpret the numbers:
A: The number of coins in this box was drawn from a probability distribution with a mean of 60, and a range of 20.
B: The number of coins in this box was drawn from an unknown probability distribution. Our best estimate of the number of coins in this box is 60, based on certain information that we have available. We are certain that the actual value is within 20 gold coins of this.
With regard to understanding the example, and how to apply the kind of Bayesian reasoning that the article recommends, it’s important to understand that the example was based on B. And in real life, B describes situations that we’re far more likely to encounter.
With regard to understanding human psychology, human biases, and why this feels like a paradox, it’s important to understand that we instinctively tend towards “A”. I don’t know if all humans would tend to think in terms of A rather than B, but I suspect the bias applies widely among people who’ve studied any kind of formal probability. “A” is much closer to the kind of question that would be set as an exercise in a probability class.
That’s true—when I wrote the post you replied to I still didn’t really understand the solution—though it did make a good example for JGWeissman’s question. By the time I wrote the post I linked to, I had figured it out and didn’t have to cheat.