Let’s say you wake up in room 1 which is red, then open room 2 which is blue, and room 3 stays unopened. Are you using {1,2} as a random sample that predicts the frequency of red in {2,3}? How on Earth is that reasonable?
Maybe it is my English. In this case, you wake up in a red room, then open another room and find it to be blue. As SIA states, you should treat both rooms as if they were randomly selected from all rooms. So of the 2 randomly selected rooms, 1 is red and 1 is blue. Hence 50%.
It seems like you’re changing the definition of “fraction in the hand” to also include the room you woke up in, but keep the definition of “fraction in the bag” without that room. So now the “hand” contains a bean that didn’t come from the “bag”. That ain’t gonna work.
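Maybe let’s stick to your old definitions:
The “beans in the hand” would be the random other room you open. The “beans in the bag” would be the two other rooms.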
You said there was a difference between “fraction in the hand” and “fraction in the bag”, which was predictable before you grab. But to a thirder, before you grab, the expected values of both fractions are 2⁄3. Can you explain what difference you saw?
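For anyone who wants to check the 2⁄3 figure, here is a minimal sketch of the three-room case. It assumes a uniform prior on the number of red rooms and conditions on your own room (labelled room 0 here) being red, which is how a thirder using SIA would update. The prior and the room labelling are assumptions made for illustration, not something fixed by the thread.

```python
from fractions import Fraction
from itertools import product
from math import comb

N = 3  # rooms; room 0 is the one you wake up in (labelling is an assumption)

def coloring_prior(coloring):
    """Prior weight of one coloring (1 = red), assuming the number of red
    rooms R is uniform on 0..N and all colorings with the same R are equally likely."""
    r = sum(coloring)
    return Fraction(1, N + 1) / comb(N, r)

# SIA-style update: treat your own room as a random room and condition on it being red.
weights = {c: coloring_prior(c) for c in product((0, 1), repeat=N) if c[0] == 1}
total = sum(weights.values())
posterior = {c: w / total for c, w in weights.items()}

# "Bag": expected fraction of red among the two other rooms.
bag = sum(p * Fraction(sum(c[1:]), N - 1) for c, p in posterior.items())

# "Hand": chance that one other room, opened uniformly at random, is red.
hand = sum(p * Fraction(c[i], N - 1) for c, p in posterior.items() for i in range(1, N))

print(bag, hand)
```

Under these assumptions both expectations come out equal, which is the point of the question above: before the grab, the Bayesian thirder sees no predictable gap between hand and bag.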
Ok, let’s slow down. First of all, there are two types of analysis going on. One is Bayesian analysis, which you are focusing on. The other is simple statistics, which I am saying thirders and SIA are having trouble with.
Suppose there are 100 rooms, each either red or blue. You randomly open 10 of them and see 8 red and 2 blue. Here you can start a Bayesian analysis (with a uniform prior, obviously) and construct the posterior distribution. I’m going to skip the calculation and just point out that R=80 would have the highest probability. Now, instead of going the Bayesian way, you can also just use simple statistics. You have a simple random sample of size 10 with 8 reds, so the unbiased estimate should be 8/10 × 100 = 80. You have applied two different ways of reasoning but got the same result, which is unsurprising since you used a uniform prior in the Bayesian analysis. So far I hope everything is clear.
Now let’s consider SIA. It tells you how to interpret the fact that your own room is red. It says you should treat your own room as randomly selected from all rooms, and it happens to be red, which is new information. Now if you open another room, then both rooms are randomly selected from all rooms. Thirders’ Bayesian reasoning is consistent with this idea, as shown by the calculation in my last reply.
Now apply SIA to statistics. Because it treats both rooms as randomly selected, together they are a simple random sample, which is an unbiased sample. I am not supporting that; all I’m saying is that’s what SIA suggests. The population for this sample is all the rooms (including your own). Using statistics you can give an estimate of R the same way we gave the estimate of 80 before. Let’s call it E1. If thirders think SIA is valid, they should stand by this estimate.
But you know you randomly selected a room (from the other 2 rooms), which is a simple random sample of the other 2 rooms. If it helps, the room(s) you randomly selected are “the beans in hand”, all other rooms are “beans in bags”. Surely you should expect their fractions of red to be about equal, right? Well, as I calculated in my last reply, if you stand by the above estimate E1, then you would always conclude the rest of the rooms have a higher fraction of red, unless all the rooms you randomly opened are red of course. Basically you are already concluding the sample is biased towards blue before the selection is made. Or if you prefer, before you grab you already know you are going to say the hand has a lower fraction of red than the bag does.
In essence you cannot take an unbiased sample and divide it into two parts, claiming one part is biased towards red while the other part is unbiased. The other part must be biased in the opposite direction, aka blue.
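The skipped calculation can be sketched in a few lines. This assumes a uniform prior on R over 0..100 and sampling without replacement, so the likelihood is hypergeometric. The variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

N, n, k = 100, 10, 8  # rooms in total, rooms opened, red rooms seen

# Hypergeometric likelihood of the sample for each candidate R.
# The constant factor 1 / comb(N, n) is dropped because it cancels on normalizing.
likelihood = {r: comb(r, k) * comb(N - r, n - k) for r in range(N + 1)}

# With a uniform prior, the posterior is just the normalized likelihood.
total = sum(likelihood.values())
posterior = {r: Fraction(w, total) for r, w in likelihood.items()}

most_probable_r = max(posterior, key=posterior.get)  # Bayesian "highest probability" value
unbiased_estimate = Fraction(k, n) * N               # sample fraction scaled to the population

print(most_probable_r, unbiased_estimate)
```

Both numbers agree here, as the comment says. Note that this compares the posterior mode with the unbiased estimate; the posterior mean is a different quantity and need not coincide with either.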
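To make the arithmetic behind this claim concrete, here is a rough sketch using the 81-room setup with 8 opened rooms that comes up later in the thread (the R=27 case). It treats E1 as the plain sample fraction scaled to all rooms, which is my reading of the argument rather than a quoted formula.

```python
from fractions import Fraction

TOTAL_ROOMS = 81  # assumed from the R=27 example in this thread
OPENED = 8        # rooms opened besides your own

def compare(k):
    """For k red rooms among the opened ones, return (E1, hand fraction, rest fraction)."""
    # E1: own red room plus the opened rooms, treated as one simple random sample.
    e1 = Fraction(k + 1, OPENED + 1) * TOTAL_ROOMS
    # "Hand": fraction of red among the rooms you opened.
    hand = Fraction(k, OPENED)
    # Rest: red rooms implied by E1 among the rooms that stay unopened.
    unopened_rooms = TOTAL_ROOMS - 1 - OPENED
    rest = (e1 - 1 - k) / unopened_rooms
    return e1, hand, rest

rows = {k: compare(k) for k in range(OPENED + 1)}
for k, (e1, hand, rest) in rows.items():
    print(f"k={k}: E1={e1}, hand={hand}, rest={rest}")
```

In this sketch the implied fraction for the unopened rooms works out to (k+1)/9 against k/8 in the hand, so the hand looks bluer in every case except k=8. That is the predictable gap the comment is pointing at.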
I hope you now see that the probability of 2⁄3 you calculated is not relevant. It is a probability calculated using bayesian analysis. Not a sample’s property or the sample’s “fraction” used in statistics. For what it is worth, yes I agree with your calculation. It is the correct number a thirder would give.
If Bayes + SIA gives a consistent answer, while “simple statistics” + SIA gives a contradiction, it looks like “simple statistics” is at fault, not SIA.
Both claims are very bold, both unsubstantiated.
First of all, SIA in Bayesian analysis is up for debate. That’s the whole point of the halfer/thirder disagreement. A “consistent” reasoning is not necessarily correct. Halfers are also consistent.
Second of all, the statistics involved is as basic as it gets. You are saying that with a simple random sample of 9 rooms with 3 reds, it is wrong to estimate that one third of the population is red. Yet no argument is given.
Also please take no offence, but I am not going to continue this discussion we are having. All I have been doing is explaining the same points again and again. While the replies I got are short and effortless. I feel this is no longer productive.
My replies to you are short, but they weren’t simple to write. Each of them took at least 30 minutes of work, condensing the issues in the most clear way. Apologies if that didn’t come across. Maybe a longer explanation would help? Here goes:
In the latest reply I tried to hint that many people use “simple statistics” in a way that disagrees with Bayes, and usually they turn out to be wrong in the end. One example is the boy or girl puzzle, which Eliezer mentioned here. Monty Hall variations are another well known example, they lead to many plausible-sounding frequentist intuitions, which are wrong while Bayes is reliably right. After you’ve faced enough such puzzles, you learn how to respond. Someone tells me, hey, look at this frequentist argument, it gives a weird result! And I reply, sorry, but if you can’t capture the weirdness in a Bayesian way, then no sale. If your ad hoc tools are correct, they should translate to the Bayes language easily. If translating is harder than you thought, you should get worried, not confident.
To put it another way, you’ve been talking about supernatural predictive power. But if it looks supernatural only to non-Bayesians, while Bayesians see nothing wrong, it must be very supernatural indeed! The best way to make sure it’s not an illusion is to try explaining the supernaturalness to a Bayesian. That’s what I’ve been asking you to do.
In both the boy or girl puzzle and the Monty Hall problem, the main point is “how” the new information is obtained. Is the mathematician randomly picking a child and mentioning its gender, or is he purposely checking for a boy among his children? Does the host know what’s behind the doors and always reveal a goat, or does he simply open a door at random and it turns out to be a goat? Or in statistical terms: how is the sample drawn? Once that is clear, Bayesian analysis and statistics give the same result. Of course, if one starts from a wrong assumption about the sampling process, the conclusion will be wrong. No argument there.
But SIA itself is a statement about how the sample is drawn. Why must we check its merit only with Bayesian analysis and not with statistics? And if you are certain the statistical reasoning is wrong, then instead of pointing to different probability puzzles, why not point out the mistake?
With all these posts you haven’t even mentioned whether you believe the thirder should estimate R=27 or not. While I have been explicit about my positions and have dissected my arguments step by step, I feel you are being very vague about yours. This puts me in a harder and more laborious position to counter-argue. That’s why I feel this discussion is no longer about the Sleeping Beauty problem but more about who’s right and who’s better at arguing. That’s not productive, and I am leaving it.
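The Monty Hall half of this point can be checked by exact enumeration. The sketch below assumes the standard three-door setup with the contestant always picking door 0. The function name and the flag are made up for illustration.

```python
from fractions import Fraction

DOORS = (0, 1, 2)
PICK = 0  # contestant's first choice; by symmetry any door would do

def switch_win_probability(host_knows):
    """P(switching wins | the host opened another door and it showed a goat)."""
    goat_shown = Fraction(0)
    switch_wins = Fraction(0)
    for car in DOORS:
        others = [d for d in DOORS if d != PICK]
        # A knowing host never opens the car door; a random host may open either.
        openable = [d for d in others if d != car] if host_knows else others
        for opened in openable:
            p = Fraction(1, len(DOORS)) * Fraction(1, len(openable))
            if opened == car:
                continue  # car revealed: excluded by conditioning on seeing a goat
            goat_shown += p
            if car != PICK:
                switch_wins += p  # the remaining closed door hides the car
    return switch_wins / goat_shown

print(switch_win_probability(host_knows=True))
print(switch_win_probability(host_knows=False))
```

The same observation (a goat behind the opened door) supports different conclusions depending on how the door was chosen, which is the sampling-process point made above.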
If by “estimate” you mean “highest credence”, the short answer is that Bayesians usually don’t use such tools (maximum likelihood, unbiased estimates, etc.) They use plain old expected values instead.
After waking up in a red room and then opening 2 red and 6 blue rooms, a Bayesian thirder will believe the expected value of R to be 321⁄11, which is a bit over 29. I calculated it directly and then checked with a numerical simulation.
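A short exact version of that direct calculation, for readers who want to reproduce it. It assumes 81 rooms, a uniform prior on R, and that the thirder treats their own red room as one more randomly sampled room, giving 9 rooms seen with 3 red. This is a sketch under those assumptions, not the code used in the comment.

```python
from fractions import Fraction
from math import comb

def posterior_over_r(total_rooms, seen, seen_red):
    """Posterior over R given a uniform prior and a hypergeometric sample."""
    weights = {r: comb(r, seen_red) * comb(total_rooms - r, seen - seen_red)
               for r in range(total_rooms + 1)}
    norm = sum(weights.values())
    return {r: Fraction(w, norm) for r, w in weights.items()}

# Own red room plus 2 red and 6 blue opened rooms: 9 seen, 3 red.
posterior = posterior_over_r(total_rooms=81, seen=9, seen_red=3)

expected_r = sum(r * p for r, p in posterior.items())  # posterior mean
most_probable_r = max(posterior, key=posterior.get)    # posterior mode

print(expected_r, float(expected_r), most_probable_r)
```

Under these assumptions the mean and the mode differ, so “highest credence” and “expected value” are not interchangeable readings of “estimate”.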
It’s easy to explain why the expected value isn’t 27 (proportional to the fraction of red in the sample). Consider the case where all 9 rooms seen are red. Should a Bayesian then believe that the expected value of R is 81? No way! That would imply believing R=81 with probability 100%, because any nonzero credence for R<81 would lead to lower expected value. That’s way overconfident after seeing only 9 rooms, so the right expected value must be lower. You can try calculating it, it’s a nice exercise.
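One possible way to work the exercise, under the same assumptions as before (81 rooms, uniform prior on R, own room treated as a random sample so that all 9 rooms seen are red):

```python
from fractions import Fraction
from math import comb

TOTAL_ROOMS, SEEN = 81, 9

# All 9 sampled rooms are red, so the likelihood of R = r is proportional to comb(r, 9).
weights = {r: comb(r, SEEN) for r in range(TOTAL_ROOMS + 1)}
norm = sum(weights.values())
posterior = {r: Fraction(w, norm) for r, w in weights.items()}

expected_r = sum(r * p for r, p in posterior.items())
prob_all_red = posterior[TOTAL_ROOMS]  # credence that every room is red

print(expected_r, float(expected_r), prob_all_red)
```

The expected value lands noticeably below 81, and the credence in R=81 stays well short of certainty, in line with the argument above.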
Appreciate the effort, especially the calculation part. I am no expert on coding, but from my limited knowledge of Python the calculation looks correct to me. I want to point out that a direct calculation formulated like this: (sum (r × (r choose 3) × ((81-r) choose 6)), r=3 to 75) / (sum ((r choose 3) × ((81-r) choose 6)), r=3 to 75) gives the same answer. I would say it reflects SIA reasoning more and resembles your code better as well. Basically it shows that under SIA Beauty should treat her own room the same way as the other 8 rooms.
The part explaining the relationship between expected value and unbiased estimation (maximum likelihood) is obviously correct. Though I wouldn’t say it is relevant to the argument.
You claim Bayesians don’t usually use maximum likelihood or unbiased estimates. I would say that is a mistake. They are important in decision making. However, “usually” is a subjective term, and arguing about how often counts as “usual” is pointless. The bottom line is that they are valid questions to ask and Bayesians should have an answer. How should thirders answer them? That is the question.
Mathematically, maximum likelihood and unbiased estimates are well defined, but Bayesians don’t expect them to always agree with intuition.
For example, imagine you have a coin whose parameter is known to be between 1⁄3 and 2⁄3. After seeing one tails, an unbiased estimate of the coin’s parameter is 0 (lower than all possible parameter values) and the maximum likelihood estimate is 1⁄3 (jumping to extremes after seeing a tiny bit of information). Bayesian expected values don’t have such problems.
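The three numbers in this example can be put side by side. The sketch assumes the coin’s parameter p is the chance of heads, that the single toss is recorded as 0 for tails, and, for the Bayesian part, a uniform prior on [1⁄3, 2⁄3]. The prior is my assumption; the comment does not specify one.

```python
from fractions import Fraction

LO, HI = Fraction(1, 3), Fraction(2, 3)  # known bounds on p = P(heads)
x = 0                                    # one toss observed: tails

# Unbiased estimate: the sample mean of the single toss.
unbiased_estimate = Fraction(x)

# Maximum likelihood within [LO, HI]: P(tails) = 1 - p is largest at the lower bound.
mle = LO if x == 0 else HI

def integrate_poly(coeffs, a, b):
    """Exact integral over [a, b] of sum(coeffs[i] * p**i)."""
    def antiderivative(t):
        return sum(Fraction(c) * t ** (i + 1) / (i + 1) for i, c in enumerate(coeffs))
    return antiderivative(b) - antiderivative(a)

# Posterior density is proportional to (1 - p) on [LO, HI] under the assumed uniform prior.
norm = integrate_poly([1, -1], LO, HI)             # integral of (1 - p)
first_moment = integrate_poly([0, 1, -1], LO, HI)  # integral of p * (1 - p)
posterior_mean = first_moment / norm

print(unbiased_estimate, mle, posterior_mean)
```

Only the unbiased estimate falls outside the allowed range; the maximum likelihood estimate sits on the boundary, and the posterior mean stays inside it and moves only slightly below 1⁄2.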
You can stop kicking the sand castle of frequentism+SIA, it never had strong defenders anyway. Bayes+SIA is the strong inconvenient position you should engage with.
That’s an unfair comparison, since you assume a good prior. Screw up the prior and Bayes can be made to look as silly as you like.
Doing frequentist estimation on the basis of one data point is stupid, of course.
Maximum likelihood is indeed 0 or Tails, assuming we start from a uniform prior. 1⁄3 is the expected value. Ask yourself this, after seeing a tail what should you guess for the next toss result to have maximum likelihood of being correct?
If halfers reasoning applies to both Bayesian and Frequentist while SIA is only good in Bayesian isn’t it quite alarming to say the least?
The 0 isn’t a prediction of the next coin toss, it’s an unbiased estimate of the coin parameter which is guaranteed to lie between 1⁄3 and 2⁄3. That’s the problem! Depending on the randomness in the sample, an unbiased estimate of unknown parameter X could be smaller or larger than literally all possible values of X. Since in the post you use unbiased estimates and expect them to behave reasonably, I thought this example would be relevant.
Hopefully that makes it clearer why Bayesians wouldn’t agree that frequentism+halfism is coherent. They think frequentism is incoherent enough on its own :-)
OK, I misunderstood. I interpreted the coin is biased 1⁄3 to 2⁄3 but we don’t know which side it favours. If we start from uniform (1/2 to H and 1⁄2 to T), then the maximum likelihood is Tails.
Unless I misunderstood again, you mean there is a coin whose natural chance we want to guess (forgive me if I’m misusing terms here). We do know its chance is bounded between 1⁄3 and 2⁄3. In this case yes, the statistical estimate is 0 while the maximum likelihood is 1⁄3. However, that is obviously due to the use of an informed prior (that we know it is between 1⁄3 and 2⁄3). Hardly a surprise.
Also I want to point out that in your previous example you said SIA+frequentism never had any strong defenders. That is not true. In the literature so far, thirding is generally considered a better fit for frequentism than halving, because in the long run there are twice as many Tails awakenings as Heads awakenings. Such arguments are used by published academics including Elga. Therefore I would consider my attack from the frequentist angle to have some value.
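The long-run frequency claim is easy to illustrate with a simulation. This is only a sketch of the standard setup (one awakening on Heads, two on Tails, fair coin), with an arbitrary seed and trial count. It shows the frequency of awakenings and does not settle what credence Beauty should hold.

```python
import random

def tails_awakening_fraction(trials, seed=0):
    """Fraction of all awakenings that happen in Tails runs of the experiment."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads_awakenings = 0
    tails_awakenings = 0
    for _ in range(trials):
        if rng.random() < 0.5:
            heads_awakenings += 1  # Heads: Beauty is woken once
        else:
            tails_awakenings += 2  # Tails: Beauty is woken twice
    return tails_awakenings / (heads_awakenings + tails_awakenings)

fraction = tails_awakening_fraction(100_000)
print(round(fraction, 3))
```

With many repetitions the fraction settles near 2⁄3, which is the frequentist observation usually cited in favour of thirding.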
Interesting. I guess the right question is, if you insist on a frequentist argument, how simple can you make it? Like I said, I don’t expect things like unbiased estimates to behave intuitively. Can you make the argument about long run frequencies only? That would go a long way in convincing me that you found a genuine contradiction.
Yes, I have given a long run frequency argument for halving in part I. Sadly that part has not gotten any attention. My entire argument is about the importance of perspective disagreement in SBP. This counter argument is actually the less important part.
Sorry slightly confused here, bias (although an F concept, since it relies on “true parameter value”) is sort of orthogonal to B vs F.
Estimates based on either B or F techniques could be biased or unbiased.
Quoth famous Bayesian Andrew Gelman:
“I can’t keep track of what all those Bayesians are doing nowadays—unfortunately, all sorts of people are being seduced by the promises of automatic inference through the “magic of MCMC”—but I wish they would all just stop already and get back to doing statistics the way it should be done, back in the old days when a p-value stood for something, when a confidence interval meant what it said, and statistical bias was something to eliminate, not something to embrace.”
(http://www.stat.columbia.edu/~gelman/research/published/badbayesmain.pdf)
Heh. I’m not a strong advocate of Bayesianism, but when someone says their estimator is unbiased, that doesn’t fill me with trust. There are many problems where the unique unbiased estimator is ridiculous (e.g. negative with high probability when the true parameter is always positive, etc.)
Sure, unbiasedness is a weak property:
If you throw a dart either one foot to the left or one foot to the right of the bullseye, you are unbiased wrt the bullseye, but this is stupid.
Consistency is a better property.