The question is whether, without Bayesian reasoning and using only statistical analysis, we can still get an unbiased estimate. The answer is of course yes. Using a fair sample to estimate a population is as standard as it gets. The main argument is, of course, over what counts as the fair sample. Depending on the answer, we get an estimate of R=21 or R=27 respectively.
SIA states that we should treat Beauty’s own room as randomly selected from all rooms. Applying this idea in a Bayesian analysis is how we get thirdism. To oversimplify: we should reason as if some selector randomly chose a day and found Beauty awake, which in itself is a coincidence. However, there is no reason for SIA to apply only to Bayesian analysis and not to statistical analysis. If we use SIA reasoning in statistical analysis, treating her own room as randomly selected from all 81 rooms, then all 9 rooms are part of a simple random sample, which by definition is unbiased. There is no Bayes’ rule or conditioning involved, because here we are not treating it as a probability problem. Beauty’s own red room is just a coincidence, as in the Bayesian analysis; it suggests a larger number of reds in the same way the other 2 red rooms do.
If one wants to argue that those 9 rooms are biased, why not use the same logic in a Bayesian analysis? Borrowing cousin_it’s example: suppose there are 3 rooms, with the number of red rooms uniformly distributed between 1 and 3. If Beauty wakes up, opens another door, and sees another red room, what should her credence in R=3 be? If I’m not mistaken, thirders will say 3⁄4, because when 2 rooms are randomly selected out of 3 and both are red, there are 3 ways for that to happen if R=3 and 1 way if R=2. Here thirders are treating her own room the same way as the second room, and the two rooms are taken to be randomly selected, i.e. unbiased. If one argues that the 2 rooms are biased towards red because her own room is red, then the calculation above is no longer valid.
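To make the counting in the 3-room example concrete, here is a small sketch. The function names are mine, and the second function is only my illustration of what the calculation might look like if her own room is taken as given rather than sampled; it is not something anyone in this thread has endorsed.

```python
from fractions import Fraction
from math import comb

def credence_both_rooms_sampled(total_rooms=3, reds_seen=2):
    # Uniform prior over R = 1..total_rooms, as in the example.
    # SIA-style assumption: her own room and the opened room together
    # count as a random draw of 2 rooms from all rooms.
    weights = {r: comb(r, reds_seen) for r in range(1, total_rooms + 1)}
    total = sum(weights.values())
    return {r: Fraction(w, total) for r, w in weights.items()}

def credence_own_room_given(total_rooms=3):
    # Hypothetical alternative: her own red room is taken as given,
    # and only the one other opened room is a random draw from the rest.
    weights = {r: Fraction(r - 1, total_rooms - 1)
               for r in range(1, total_rooms + 1)}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}

sampled = credence_both_rooms_sampled()
given = credence_own_room_given()
print(sampled[3])  # 3/4
print(given[3])    # 2/3
```

The two treatments disagree (3⁄4 versus 2⁄3), which is the point: the 3⁄4 figure depends on counting her own room as part of the random sample.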
Even if one takes the unlikely position that SIA is applicable only in Bayesian and not in statistical analysis, there are still strange consequences. I might be mistaken, but in problems of simple sampling, in general and ignoring round-off errors, the statistical estimate would also be the case with the highest probability in a Bayesian analysis with a uniform prior. By using SIA in a Bayesian analysis, we get R=27 as the most likely case. However, statistics gives an estimate of R=21. This difference cannot be easily explained.
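A rough sketch of how the two figures can arise, on my reading of the setup: 81 rooms, Beauty's own room red, and 8 other rooms opened of which 2 are red. The helper name `map_estimate` and the uniform prior are assumptions for illustration. With a uniform prior the posterior is proportional to the hypergeometric likelihood, so the most probable value is just its maximiser.

```python
from fractions import Fraction
from math import comb

def map_estimate(population, sample_size, reds_in_sample):
    # Uniform prior over the number of reds k in the population, so the
    # posterior is proportional to the hypergeometric likelihood.
    def likelihood(k):
        return comb(k, reds_in_sample) * comb(population - k,
                                              sample_size - reds_in_sample)
    return max(range(population + 1), key=likelihood)

# Treatment 1 (SIA-style): all 9 rooms are a random sample from 81, 3 red.
sia_point = Fraction(3, 9) * 81      # sample proportion scaled up
sia_map = map_estimate(81, 9, 3)

# Treatment 2: own room set aside; 8 rooms sampled from the other 80, 2 red.
alt_point = 1 + Fraction(2, 8) * 80  # add her own red room back
alt_map = 1 + map_estimate(80, 8, 2)

print(sia_point, sia_map)  # 27 27
print(alt_point, alt_map)  # 21 21
```

Within each treatment the sample-proportion estimate and the most probable value under a uniform prior agree; the 21 versus 27 gap comes from which rooms are counted as the sample.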
To answer the last part of your statement: if Beauty randomly opens 8 doors and finds them all red, then she has a purely red sample. By simple statistics she should give R=81 as the estimate. Halfers and thirders would both agree on that. If they do a Bayesian analysis, R=81 would also be the case with the highest probability. I’m not sure where 75 comes from; I’m assuming it is the sum of each R multiplied by its probability in the Bayesian analysis (i.e. the expected value)? But that value does not correspond to the estimate in statistics. Imagine you randomly draw 20 beans from a bag and they are all red: using statistics, you are obviously not going to estimate that the bag contains 90% red beans.
The 8 rooms are definitely the unbiased sample (of your rooms with one red room subtracted).
I think you are making two mistakes:
First, I think you’re too focused on the nice properties of an unbiased sample. You can take an unbiased sample all you want, but if we know information in addition to the sample, our best estimate might not be the average of the sample! Suppose we have two urns, urn A has 10 red balls and 10 blue balls, while urn B has 5 red balls and 15 blue balls. We choose an urn by rolling a die, such that we have a 5⁄6 chance of choosing urn A and a 1⁄6 chance of choosing urn B. Then we take a fair, unbiased sample of 4 balls from whatever urn we chose. Suppose we draw out 1 red ball and 3 blue balls. Since this is an unbiased sample, does the process that you are calling “statistical analysis” have to estimate that we were drawing from urn B?
Second, you are trying too hard to make everything about the rooms. It’s like someone was doing the problem with two urns from the previous paragraph, but tried to mathematically arrive at the answer only as a function of the number of red balls drawn, without making any reference to the process that causes them to draw from urn A vs. urn B. And they come up with several different ideas about what the function could be, and they call those functions “the Two-Thirds-B-er method” and “the Four-Tenths-B-er method.” When really, both methods are incomplete because they fail to take into account what we know about how we picked the urn to draw from.
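For anyone who wants to check the two-urn numbers, here is a quick sketch. It assumes the 4 balls are drawn without replacement; the function name is mine.

```python
from fractions import Fraction
from math import comb

def likelihood(red_in_urn, blue_in_urn, red_drawn=1, blue_drawn=3):
    # Probability of the observed draw, sampling without replacement.
    total = red_in_urn + blue_in_urn
    drawn = red_drawn + blue_drawn
    ways = comb(red_in_urn, red_drawn) * comb(blue_in_urn, blue_drawn)
    return Fraction(ways, comb(total, drawn))

like_a = likelihood(10, 10)  # urn A: 10 red, 10 blue
like_b = likelihood(5, 15)   # urn B: 5 red, 15 blue

# Prior from the die roll: 5/6 for urn A, 1/6 for urn B.
prior_a, prior_b = Fraction(5, 6), Fraction(1, 6)
evidence = prior_a * like_a + prior_b * like_b
post_a = prior_a * like_a / evidence
post_b = prior_b * like_b / evidence

print(like_a, like_b)  # 80/323 455/969
print(post_a, post_b)  # 240/331 91/331
```

The sample on its own fits urn B better (the likelihood is higher for B), but once the die roll is taken into account urn A is still the more probable source, at roughly 73%.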
To answer the last part of your statement: if Beauty randomly opens 8 doors and finds them all red, then she has a purely red sample. By simple statistics she should give R=81 as the estimate. Halfers and thirders would both agree on that. If they do a Bayesian analysis, R=81 would also be the case with the highest probability. I’m not sure where 75 comes from; I’m assuming it is the sum of each R multiplied by its probability in the Bayesian analysis (i.e. the expected value)? But that value does not correspond to the estimate in statistics. Imagine you randomly draw 20 beans from a bag and they are all red: using statistics, you are obviously not going to estimate that the bag contains 90% red beans.
Think of it like this: if Beauty opens 8 doors and they’re all red, and then she goes to open a ninth door, how likely should she think it is to be red? 100%, or something smaller than 100%? For predictions, we use the average of a probability distribution, not just its highest point.
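Here is a sketch of the difference between the highest point and the average in the 8-red-doors case. The assumptions are mine: 81 rooms, a uniform prior over R from 1 to 81, Beauty's own room taken as red, and the 8 opened doors treated as a random draw from the other 80 rooms. Other priors or sampling assumptions will shift the exact numbers.

```python
from fractions import Fraction
from math import comb

N_ROOMS = 81
OPENED = 8  # doors opened, all red

# Posterior weight for each R: number of ways to pick 8 red rooms
# from the R - 1 red rooms other than her own (uniform prior cancels).
weights = {r: comb(r - 1, OPENED) for r in range(1, N_ROOMS + 1)}
total = sum(weights.values())
posterior = {r: Fraction(w, total) for r, w in weights.items()}

# Highest point of the distribution versus its average.
map_r = max(posterior, key=posterior.get)
mean_r = sum(r * p for r, p in posterior.items())

# Chance that a ninth door is red: average over R of the fraction of
# red rooms among the 72 rooms that are still unopened.
p_ninth_red = sum(p * Fraction(r - 1 - OPENED, N_ROOMS - 1 - OPENED)
                  for r, p in posterior.items())

print(map_r)        # 81
print(mean_r)       # 369/5
print(p_ninth_red)  # 9/10
```

Under these assumptions the single most probable value is R=81, the average is 73.8, and the chance that the next door is red is 90%, not 100%. The prediction follows the average, not the peak.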
No problem, always good to have a discussion with someone serious about the subject matter.
First of all, you are right: a statistical estimate and the expected value in a Bayesian analysis are different. But that is not what I’m saying. What I’m saying is that in a Bayesian analysis with an uninformative (uniform) prior, the case with the highest probability should be the unbiased statistical estimate (not always exactly, because of round-off etc.).
In the two-urn example, I think what you meant is that, using the sample of 4 balls, a fair estimate would be 5 reds and 15 blues, as in urn B, but a Bayesian analysis would give A as more likely? However, this disagreement is due to the use of an informed prior: you already know from the beginning that we are more likely to draw from A. Without knowing this, a Bayesian analysis would give B as the most likely case, the same as the statistical estimate.
Think of it like this: if Beauty opens 8 doors and they’re all red, and then she goes to open a ninth door, how likely should she think it is to be red? 100%, or something smaller than 100%? For predictions, we use the average of a probability distribution, not just its highest point.
Definitely something smaller than 100%. Just because Beauty thinks R=81 is the most likely case doesn’t mean she thinks it is the only case. But that is not what the estimate is about. Maybe this question would be more relevant: if, after opening 8 doors and finding them all red, Beauty has to guess R, what number should she guess (to be most likely correct)?
Very clear argument, thank you for the reply.
Sorry for the slow reply.