We both know what question is being asked. We both know how many times she awakens and is interviewed. I know what subjective probability is (I assume you do too). I showed you my math. I also explained why your ratio of expected frequencies does not correspond to the subjective probability that you think it does.
Does it not concern you even a little that the Wikipedia article you linked to quite clearly says you are wrong and explains why?
I started by reading the Wikipedia page. At that point, the 1⁄3 solution made some sense to me, but I was bothered by the fact that you couldn’t derive it from the laws of probability. I then read articles by Bostrom and Radford, spent a lot of time working on the problem, and so on. Eventually, I figured out precisely why the 1⁄3 solution is wrong.
Is Wikipedia a stronger authority than me here? Probably. But I know where the argument there fails, so it’s not very convincing.
I think we are nearing the end here. Someone just wrote a whole post explaining why the correct answer is 1/3: http://lesswrong.com/lw/28u/conditioning_on_observers/
It’s fascinating to me that you won’t tell me which probability is wrong: P(H)=1/2 or P(Monday|H)=1.
It’s also interesting that you won’t defend your answer (other than saying I’m wrong). You are in a situation where the number of trials depends on outcome, but are using an estimator that is valid for independent trials. Show me that yours converges to a probability. Standard theory doesn’t hold here.
Probabilities are subjective. From Beauty’s POV, if she has just awakened to face an interview, then p(H)=1/3. If she has learned that it is Friday and the experiment is over (but she has not yet been told which side the coin came down), then she updates on that information, and p(H)=1/2. So, the value of p(H) depends on who is being asked, and on what information they have at the time.
It’s the first one—P(H)=1/2 is wrong. Before going any further, we should adopt Jaynes’ habit of always labelling the prior knowledge in our probabilities, because there are in fact two probabilities that we care about: P(H|the experiment ran), and P(H|Sleeping Beauty has just been woken). These are 1⁄2 and 1⁄3, respectively. The first of these probabilities is given in the problem statement, but the second is what is asked for, and what should be used for calculating expected value in any betting, because any bets made occur twice if the coin was tails.
How can these things be different, P(H|the experiment ran) and P(H|Sleeping Beauty has just been woken)?
Yes, a bet would occur twice if tails, if you set the problem up that way. But the question has to do with her credence at an awakening.
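To make the betting point concrete, here is a minimal sketch of the per-run expected value, under a hypothetical setup (not specified in the original problem) where Beauty bets $1 on heads at even odds at every interview:

```python
from fractions import Fraction

# Hypothetical setup: a $1 even-odds bet on heads at every interview.
# Heads (prob 1/2): one interview, so she wins once.
# Tails (prob 1/2): two interviews, so the bet is placed and lost twice.
stake = Fraction(1)
ev_per_run = Fraction(1, 2) * stake + Fraction(1, 2) * (-2 * stake)
print(ev_per_run)  # -1/2
```

In that setup even odds lose money, because tails doubles the bet; break-even odds are 2:1, matching a per-interview betting rate of 1/3, while 1/2 remains the per-run probability of heads. That is exactly the distinction at issue in this exchange.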
The 1⁄3 calculation is derived from treating the 3 counts as if they arose from independent draws of a multinomial distribution. They are not independent draws. There is 1 degree of freedom, not 2. Thus, the ratio that led to the 1⁄3 value is not the probability that people seem to think it is. It’s not clear that the ratio is a probability at all.
What’s this about a multinomial distribution and degrees of freedom? I calculated P(H|W) as E(occurrences of H&&W)/E(occurrences of W) = (1/2)/(3/2) = 1⁄3.
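A quick simulation (a sketch, assuming a fair coin and the standard waking schedule: one interview on heads, two on tails) reproduces that ratio of counts:

```python
import random

def waking_ratio(n_flips=100_000, seed=0):
    """Run the experiment n_flips times and return the fraction of
    all wakings at which the coin had landed heads."""
    rng = random.Random(seed)
    heads_wakings = 0   # occurrences of H&&W
    total_wakings = 0   # occurrences of W
    for _ in range(n_flips):
        if rng.random() < 0.5:   # heads: one waking (Monday)
            heads_wakings += 1
            total_wakings += 1
        else:                    # tails: two wakings (Monday, Tuesday)
            total_wakings += 2
    return heads_wakings / total_wakings

print(waking_ratio())  # close to (1/2)/(3/2) = 1/3
```

Both sides agree the long-run fraction comes out near 1/3; whether that fraction is the credence being asked for is exactly what is disputed in the replies that follow.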
Yes, exactly. That would be a valid probability if these were expected frequencies from independent draws of a multinomial distribution (it would have 2 degrees of freedom). Your ratio of expected values does not result in P(H|W).
It might become clear if you think about it this way. Your expected number of occurrences of W is greater than the largest possible value of occurrences of H&W. You don’t have a ratio of number of events to number of independent trials.
Picture a 3 by 1 contingency table, where we have counts in 3 cells: Monday&H, Monday&T, Tuesday&T. Typically, a 3 by 1 contingency table will have 2 degrees of freedom (the count in the 3rd cell is determined by the number of trials and the counts in the other cells). Standard statistical theory says you can estimate the probability for cell one by taking the cell-one count and dividing by the total. That’s not the situation with the Sleeping Beauty problem. There is just one degree of freedom: if we know the number of coin flips and the count in one of the cells, we know the counts in the other two. Standard statistical theory does not apply, and the ratio of the cell-one count to the total is not the probability for cell one.
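The one-degree-of-freedom point can be seen in a short sketch (a hypothetical tabulation; cell names follow the table above): once the number of heads in a batch of flips is known, all three cell counts are fixed.

```python
import random

def cell_counts(n_flips=20, seed=1):
    """Tabulate the three cells for n_flips runs. The only random
    quantity is the number of heads; all three counts follow from it."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    tails = n_flips - heads
    # A tails run forces both a Monday and a Tuesday waking, so the
    # last two cells are always equal: one degree of freedom, not two.
    return {"Monday&H": heads, "Monday&T": tails, "Tuesday&T": tails}

counts = cell_counts()
assert counts["Monday&T"] == counts["Tuesday&T"]  # holds by construction
```

Contrast this with a genuine multinomial sample of 20 draws into three cells, where two of the counts can vary freely.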
Occurrences of H&&W are a strict subset of occurrences of W. So, to use the terminology of events and trials, each waking is a trial, and each waking where the coin was heads is a positive result. That’s 1⁄3 of all trials, so a probability of 1⁄3.
If each waking is a trial, then you have a situation where the number of trials is outcome dependent. Your estimator would be valid if the number of trials was not outcome dependent. This is the heart of the matter. The ratio of cell counts here is just not a probability.
The number of trials being outcome dependent only matters if you are using the frequentist definition of probability, or if it causes you to collect fewer trials than you need to overcome noise. We’re computing with probabilities straight from the problem statement, so there’s no noise, and as a Bayesian, I don’t care about the frequentists’ broken definition.
This has nothing to do with Bayesian vs. Frequentist. We’re just calculating probabilities from the problem statement, like you said. From the problem, we know P(H)=1/2, P(Monday|H)=1, etc., which leads to P(H|Monday or Tuesday)=1/2. The 1⁄3 calculation is not from the problem statement, but rather from a misapplication of large-sample theory. The outcome-dependent sampling biases your estimator.
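The halfer calculation just described can be written out explicitly (a sketch using only the probabilities stated in the problem, conditioning on the event "the waking day is Monday or Tuesday", which is certain under either outcome):

```python
from fractions import Fraction

p_H = Fraction(1, 2)        # P(H), given in the problem statement
p_T = 1 - p_H
p_day_given_H = Fraction(1) # heads guarantees a Monday waking
p_day_given_T = Fraction(1) # tails guarantees Monday and Tuesday wakings

# Bayes' rule: conditioning on a certain event leaves the prior unchanged.
p_H_given_day = (p_day_given_H * p_H) / (
    p_day_given_H * p_H + p_day_given_T * p_T)
print(p_H_given_day)  # 1/2
```

The thirder disputes not this arithmetic but whether "Monday or Tuesday" is the right event to condition on at an awakening.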
And it’s strange that you don’t call your approach Frequentist, when you derived it from expected cell counts in repeated samples.
Don’t forget—around here ‘Bayesian’ is used normatively, and as part of some sort of group identification. “Bayesians” here will often use frequentist approaches in particular problems.
But that can be legitimate, as Bayesian calculations are a superset of frequentist calculations. Nothing bars a Bayesian from postulating that a limiting frequency exists in an unbounded number of trials in some hypothetical situation; but you won’t see one, e.g., accept R.A. Fisher’s argument for his use of p-values for statistical inference.
I adopted some frequentist terminology for purposes of this discussion because none of the other explanations I or others had posted seemed to be getting through, and I thought that might be the problem.
The reason I said that there’s a frequentist vs. Bayesian issue here is that the frequentist probability definition I’m most familiar with is P(f) = lim n->inf sum(f(i), i=1..n)/n, where f(i) is the i’th repetition of an independent repeatable experiment, and that definition is hard to reconcile with SB sometimes being asked twice. I assumed that issue, or a rule justified by that issue, was behind your objection.
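That definition can be sketched numerically for a case where it does apply cleanly, independent repetitions of a single fair coin flip (hypothetical helper names):

```python
import random

def limiting_frequency(indicator, n=200_000, seed=2):
    """Frequentist estimate of P(f): average an indicator function
    over n independent repetitions of an experiment."""
    rng = random.Random(seed)
    return sum(indicator(rng) for _ in range(n)) / n

# For one flip per trial the definition is unproblematic:
print(limiting_frequency(lambda rng: rng.random() < 0.5))  # near 1/2
# The difficulty raised above is that Sleeping Beauty's wakings are not
# independent repetitions: a tails waking always comes paired with another.
```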