I’m trying to figure out why, from the rules you gave at the start, we can assume that box 60 has more noise than the other boxes with variance of 20. You didn’t, at the outset of the problem, say anything about what the values in the boxes actually were. I would not, taking this experiment, have been surprised to see a box labeled “200”, with a variance of 20, because the rules didn’t say anything about values being close to 50, just close to A. Well, I would’ve been surprised with you as a test-giver, but it wouldn’t have violated what I understood the rules to be and I wouldn’t have any reason to doubt that box was the right choice.
The box with 60 stands out among the boxes with high variance, but you did not say that those boxes were generated with the same algorithm and thus have the same actual value. In fact, you implied the opposite. You just told me that 60 was an estimate of its expected value, and 37 was an estimate of one of the other boxes’ expected values. So I would assign a very high probability to it being worth more than the box labeled 37. I understand that the variance is effectively being applied twice to get from the number on the box to the real number of coins. (A real number of 45 could produce an estimate anywhere from 25 to 65, but if it hit 25 I’d be assigning the real number a lower bound of 5, and if it hit 65 I’d be assigning the real number an upper bound of 85, which is twice that range.) (Actually, for that reason I’m not sure your algorithm really means there’s a variance of 20 around what you state the expected value to be, but I don’t feel like doing all the math to verify that, since it’s tangential to the message I’m hearing from you and to what I’m saying.) But that doesn’t change the average. The range of values that my box labeled 60 could really contain is still higher than the range the box labeled 37 could really contain, to the best of my knowledge, and both are most likely to fall within a couple of coins of the center of that range, with the highest probability concentrated on the exact number.
If the boxes really did contain different numbers of coins, or we just didn’t have reason to assume that they don’t contain different numbers, the box labeled 60 is likely to contain more coins than the 50⁄1 box does. It is also capable of undershooting 50 by ten times as much if unlucky, so if for some reason I absolutely cannot afford to find fewer than 50 coins in my box, the 50⁄1 box is the safer choice. But if I bet on the 60⁄20 box 100 times and you bet on the 50⁄1 box 100 times, given the rules you set out in the beginning, I would walk away with 20% more money.
Or am I missing some key factor here? Did I misinterpret the lesson?
The key factor is that the 60,20 box is not in isolation—it is the top box, and so not only do you expect it to have more “signal” (gold) than average, you also expect it to have more noise than average.
You can think of the numbers on the boxes as drawn from a probability distribution. If there were zero noise, this probability distribution would just be how the gold in the boxes was distributed. But if you add noise, it’s like adding two probability distributions together (technically, a convolution). If you’re not familiar with what happens, go look it up on Wikipedia, but the upshot is that the combined distribution is more spread out than the original. This combined distribution isn’t just noise or just signal; it’s the probability of having some number be written on the outside of the box.
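A quick way to see the "more spread out" claim is to simulate it. The sketch below is only an illustration under assumptions not given in the thread: gold and noise are both taken to be normal, the "20" is read as a standard deviation, and the gold spread of 5 is made up (in the original puzzle every box held exactly 45 coins). The function name `label_spread` is hypothetical.

```python
import random
import statistics


def label_spread(n_boxes=100_000, gold_mean=45.0, gold_sd=5.0,
                 noise_sd=20.0, seed=0):
    """Return (variance of true gold, variance of box labels).

    Assumed model: label = gold + independent zero-mean noise.
    All parameter values are illustrative, not taken from the puzzle.
    """
    rng = random.Random(seed)
    gold = [rng.gauss(gold_mean, gold_sd) for _ in range(n_boxes)]
    # Each label is the true amount plus noise drawn independently of it.
    labels = [g + rng.gauss(0.0, noise_sd) for g in gold]
    return statistics.pvariance(gold), statistics.pvariance(labels)


if __name__ == "__main__":
    gold_var, label_var = label_spread()
    print(f"variance of gold:   {gold_var:.1f}")
    print(f"variance of labels: {label_var:.1f}")
```

For independent signal and noise, the label variance should come out close to the sum of the two variances, which is why the numbers written on the boxes are more widely scattered than the gold inside them.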
And so if something is the top, very highest box, where should it be located on the combined distribution?
Now, if you have something that’s high on the combined distribution, how much of that is due to signal, and how much of it is due to noise? This is a tougher question, but the essential insight is that the noise shouldn’t be more improbable than the signal, or vice versa—that is, they should both be about the same number of standard deviations from their means.
This means that if the standard deviation of the noise is bigger, then the probable contribution of the noise is greater.
Me saying the same thing a different way can be found here.
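For readers who prefer symbols, here is a sketch of how a high label splits between signal and noise. It rests on an assumption the thread does not state: the gold s and the noise n are independent and normally distributed, with the label x being their sum. The symbols μ, σ_s and σ_n are introduced here for illustration only.

```latex
\begin{aligned}
x &= s + n, \qquad s \sim \mathcal{N}(\mu, \sigma_s^2), \quad n \sim \mathcal{N}(0, \sigma_n^2) \\
\operatorname{Var}(x) &= \sigma_s^2 + \sigma_n^2 \\
\mathbb{E}[s \mid x] &= \mu + \frac{\sigma_s^2}{\sigma_s^2 + \sigma_n^2}\,(x - \mu) \\
\mathbb{E}[n \mid x] &= \frac{\sigma_n^2}{\sigma_s^2 + \sigma_n^2}\,(x - \mu)
\end{aligned}
```

Under this model the excess of the label over the mean is shared out in proportion to the two variances, so a larger noise spread means a larger share of a high label is expected to be noise. In the limiting case where every box holds the same amount (σ_s = 0, as with the boxes that all contained 45), the label says nothing about the contents and the best estimate stays at μ.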
Oh, I understand now. Even if we don’t know how it’s distributed, if it’s the top among 9 choices with the same variance, that puts it in the 80th percentile for specialness, and signal and noise contribute to that equally. So it’s likely to be in the 80th percentile of noise.
It might have been clearer if you’d instead made the boxes actually contain coins normally distributed about 40 with variance 15 and B=30, and made an alternative of 50⁄1, since you’d have been holding yourself to more proper unbiased generation of the numbers and still, in all likelihood, come up with a highest-labeled box that contained less than the sure thing. You have to basically divide your distance from the norm by the ratio of specialness you expect to get from signal and noise. The “all 45” thing just makes it feel like a trick.
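The alternative setup suggested above can be tried out directly. This is a rough sketch, not the original author's procedure: it assumes 9 noisy boxes (the count mentioned earlier in the thread), reads "variance 15" and "B=30" as standard deviations, uses normal distributions for both gold and noise, and does not round or truncate coin counts at zero. The function names are hypothetical.

```python
import random


def top_box_trial(rng, n_boxes=9, gold_mean=40.0, gold_sd=15.0, noise_sd=30.0):
    """Generate one set of boxes; return (label, true gold) of the top-labeled box."""
    gold = [rng.gauss(gold_mean, gold_sd) for _ in range(n_boxes)]
    labels = [g + rng.gauss(0.0, noise_sd) for g in gold]
    # Pick the box a naive chooser would take: the one with the highest label.
    top = max(range(n_boxes), key=labels.__getitem__)
    return labels[top], gold[top]


def average_top_box(n_trials=20_000, seed=0, **kwargs):
    """Average the top label and the gold actually inside it over many trials."""
    rng = random.Random(seed)
    total_label = 0.0
    total_gold = 0.0
    for _ in range(n_trials):
        label, gold = top_box_trial(rng, **kwargs)
        total_label += label
        total_gold += gold
    return total_label / n_trials, total_gold / n_trials


if __name__ == "__main__":
    mean_label, mean_gold = average_top_box()
    print(f"average top label:        {mean_label:.1f}")
    print(f"average gold in that box: {mean_gold:.1f}")
```

With these assumed numbers the top label should overstate its contents by a wide margin on average. Whether the top box actually beats the 50⁄1 box depends on the spreads and the number of boxes, so it is worth varying `gold_sd`, `noise_sd` and `n_boxes` rather than trusting a single setting.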
I think there’s some value in the observation that “the all 45 thing makes it feel like a trick”. I believe that’s a big part of why this feels like a paradox.
If you have a box with the numbers “60” and “20” as described above, then I can see two main ways that you could interpret the numbers:
A: The number of coins in this box was drawn from a probability distribution with a mean of 60, and a range of 20.
B: The number of coins in this box was drawn from an unknown probability distribution. Our best estimate of the number of coins in this box is 60, based on certain information that we have available. We are certain that the actual value is within 20 gold coins of this.
With regard to understanding the example, and understanding how to apply the kind of Bayesian reasoning that the article recommends, it’s important to understand that the example was based on B. And in real life, B describes situations that we’re far more likely to encounter.
With regard to understanding human psychology, human biases, and why this feels like a paradox, it’s important to understand that we instinctively tend towards “A”. I don’t know if all humans would tend to think in terms of A rather than B, but I suspect the bias applies widely amongst people who’ve studied any kind of formal probability. “A” is much closer to the kind of question that would be set as an exercise in a probability class.
That’s true—when I wrote the post you replied to I still didn’t really understand the solution—though it did make a good example for JGWeissman’s question. By the time I wrote the post I linked to, I had figured it out and didn’t have to cheat.