That’s interesting. But then you have to either abandon Bayes’ Law, or else adopt very bizarre interpretations of p(D|H), p(H) and p(D) in order to make it come out. Both of these seem like very heavy prices to pay. I’d rather admit that my intuition was wrong.
Is the motivating intuition behind your comment the idea that your subjective probability should be the same as the odds you’d take in a (fair) bet?
Subjective probabilities are traditionally analyzed in terms of betting behavior. Bets used to elicit subjective probabilities are constructed using “scoring rules”. It’s a standard way of revealing such probabilities.
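For concreteness, here is a small sketch using the Brier score—one standard proper scoring rule, which I’m picking purely as an example (the comment above doesn’t name a specific rule). The point is that an agent maximizes its expected score by reporting its true credence:

```python
# Hedged illustration of the scoring-rule idea, using the Brier score as an example.
# An agent whose true credence is 1/3 does best by reporting exactly 1/3.
def expected_brier_score(report, true_credence):
    # Negative squared error, averaged over the event happening or not.
    return (true_credence * -(1 - report) ** 2
            + (1 - true_credence) * -(0 - report) ** 2)

true_credence = 1 / 3
best_report = max((r / 100 for r in range(101)),
                  key=lambda r: expected_brier_score(r, true_credence))
print(best_report)  # 0.33 -- the grid point closest to the true credence
```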
I am not sure what you mean by “abandoning Bayes’ Law”, or using “bizarre” interpretations of probability. In this case, the relevant data includes the design of the experiment—and that is not trivial to update on, so there is scope for making mistakes. Before questioning the integrity of your tools, is it possible that a mistake was made during their application?
Bayes’ Law says p(H|D) = p(D|H) p(H) / p(D), where H is the hypothesis of interest and D is the observed data. In the Sleeping Beauty problem H is “the coin lands heads” and D is “Sleeping Beauty is awake”. p(H) = ½, and p(D|H) = p(D) = 1. So if your intuition tells you that p(H|D) = ⅓, then you have to either abandon Bayes’ Law, or else change one or more of the values of p(D|H), p(H) and p(D) in order to make it come out.
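In code, just plugging in the values stated above (nothing more is being claimed here than the arithmetic):

```python
# Sketch: the stated values, fed straight into Bayes' Law.
# H = "the coin lands heads", D = "Sleeping Beauty is awake".
p_H = 0.5          # fair coin
p_D_given_H = 1.0  # on this reading, D is certain given heads
p_D = 1.0          # and certain unconditionally

p_H_given_D = p_D_given_H * p_H / p_D
print(p_H_given_D)  # 0.5 -- the "halfer" answer, given these inputs
```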
(We can come back to the intuition about bets once we’ve dealt with this point.)
Hold on—p(D|H) and p(D) are not point values but probability distributions, since there is yet another variable, namely what day it is.
The other variable has already been marginalized out.
So long as it is not Saturday. And the idea that p(H) = ½ comes from Saturday.
But marginalizing over the day doesn’t work out to p(D) = 1, since on some days Beauty is left asleep, depending on how the coin comes up.
Here is (for a three-day variant) the full joint probability distribution, showing values which are in accordance with Bayes’ Law but where p(D) and p(D|H) are not the above. We can’t “change the values” willy-nilly; they fall out of formalizing the problem.
Frustratingly, I can’t seem to get people to take much interest in that table, even though it seems to solve the freaking problem. It’s possible that I’ve made a mistake somewhere, in which case I’d love to see it pointed out.
I was just talking about the notation “p(D|H)” (and “p(D)”), given that D has been defined as the observed data. Then any extra variables have to have been marginalized out, or the expression would be p(D, day | H). I didn’t mean to assert anything about the correctness of the particular number ascribed to p(D|H).
I did look at the table, but I missed the other sheets, so I didn’t understand what you were arguing.
It seems to say that p(heads|woken) = 0.25. A whole new answer :-(
That’s in the three-day variant; it also has a sheet with the original.
It has three sheets. The respective conclusions are: p(heads|woken) = 0.25, p(heads|woken) = 0.33 and p(heads|woken) = 0.50. One wonders what you are trying to say.
That 1⁄3 is correct in the original, that 1⁄2 comes from allocating zero probability mass to “not woken up”, and the three-day version shows why that is wrong.
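For concreteness, here is a rough sketch of how such a joint table can be formalized for the original two-day problem. This is my reconstruction of the idea, not the spreadsheet itself:

```python
# Reconstruction (mine): joint distribution over (coin, day), each cell weighted 1/4,
# with "woken" determined by the experiment's rules.
cells = {
    ("heads", "Mon"): (0.25, True),   # woken
    ("heads", "Tue"): (0.25, False),  # left asleep
    ("tails", "Mon"): (0.25, True),   # woken
    ("tails", "Tue"): (0.25, True),   # woken
}

p_woken = sum(p for p, woken in cells.values() if woken)                 # 0.75
p_heads_and_woken = sum(p for (coin, _), (p, woken) in cells.items()
                        if coin == "heads" and woken)                    # 0.25
print(p_heads_and_woken / p_woken)  # 0.333... -- mass is kept on "not woken"

# If the "not woken" cell is instead given zero mass (its 0.25 folded back into
# the heads row), the same conditioning returns 0.5 -- the halfer figure.
```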
I don’t see how that analysis is useful. Beauty is awake at the start and the end of the experiment, and she updates accordingly, depending on whether she believes she is “inside” the experiment or not. So, having D mean “Sleeping Beauty is awake” does not seem very useful. Beauty’s “data” should also include her knowledge of the experimental setup, her knowledge of the identity of the subject, and whether she is facing an interviewer with amnesia. These things vary over time—and so they can’t usefully be treated as a single probability.
You should be careful if plugging values into Bayes’ theorem in an attempt to solve this problem. It contains an amnesia-inducing drug. When Beauty updates, you had better make sure to un-update her again afterwards in the correct manner.
D is the observation that Sleeping Beauty makes in the problem, something like “I’m awake, it’s during the experiment, I don’t know what day it is, and I can’t remember being awoken before”. p(D) is the prior probability of making this observation during the experiment. p(D|H) is the likelihood of making this observation if the coin lands heads.
As I said, if your intuition tells you that p(H|D) = ⅓, then something else has to change to make the calculation work. Either you abandon or modify Bayes’ Law (in this case, at least) or you need to disagree with me on one or more of p(D), p(D|H), and p(H).
As I said, be careful about using Bayes’ theorem in the case where the agent’s mind is being meddled with by amnesia-inducing drugs. If Beauty had not had her mind addled by drugs, your formula would work—and p(H|D) would be equal to 1⁄2 on her first awakening. As it is, Beauty has lost some information that pertains to the answer she gives to the problem—namely the knowledge of whether she has been woken up before already—or not. Her uncertainty about this matter is the cause of the problem with plugging numbers into Bayes’ theorem.
The theorem models her update on new information—but does not model the drug-induced deletion from her mind of information that pertains to the answer she gives to the problem.
If she knew it was Monday, p(H|D) would be about 1⁄2. If she knew it was Tuesday, p(H|D) would be about 0. Since she is uncertain, the value lies between these extremes.
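Spelled out, with the usual day-credences assumed (2⁄3 Monday, 1⁄3 Tuesday—these are not argued for in the comment above, just used for illustration):

```python
# Sketch of the averaging described above, with assumed day-credences.
p_H_given_Mon = 0.5
p_H_given_Tue = 0.0
p_Mon, p_Tue = 2 / 3, 1 / 3

print(p_H_given_Mon * p_Mon + p_H_given_Tue * p_Tue)  # 0.333...
```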
Is over-reliance on Bayes’ theorem—without considering its failure to model the problem’s drug-induced amnesia—a cause of people thinking the answer to the problem is 1⁄2, I wonder?
If I understand rightly, you’re happy with my values for p(H), p(D) and p(D|H), but you’re not happy with the result. So you’re claiming that a Bayesian reasoner has to abandon Bayes’ Law in order to get the right answer to this problem. (Which is what I pointed out above.)
Is your argument the same as the one made by Bradley Monton? In his paper Sleeping Beauty and the forgetful Bayesian, Monton argues convincingly that a Bayesian reasoner needs to update upon forgetting, but he doesn’t give a rule explaining how to do it.
Naively, I can imagine doing this by putting the reasoner back in the situation before they learned the information they forgot, and then updating forwards again, but omitting the forgotten information. (Monton gives an example on pp. 51–52 where this works.) But I can’t see how to make this work in the Sleeping Beauty case: how do I put Sleeping Beauty back in the state before she learned what day it is?
So I think the onus remains with you to explain the rules for Bayesian forgetting, and how they lead to the answer ⅓ in this case. (If you can do this convincingly, then we can explain the hardness of the Sleeping Beauty problem by pointing out how little-known the rules for Bayesian forgetting are.)
Well, there is not anything wrong with Bayes’ Law. It doesn’t model forgetting—but it doesn’t pretend to. I would not say you have to “abandon” Bayes’ Law to solve the problem. It is just that the problem includes a process (namely: forgetting) that Bayes’ Law makes no attempt to model in the first place. Bayes’ Law works just fine for elements of the problem involving updating based on evidence. What you have to do is not abuse Bayes’ Law—by using it in circumstances for which it was never intended and is not appropriate.
Your opinion that I am under some kind of obligation to provide a lecture on the little-known topic of Bayesian forgetting has been duly noted. Fortunately, people don’t need to know or understand the Bayesian rules of forgetting in order to successfully solve this problem—but it would certainly help if they avoid applying the Bayes update rule while completely ignoring the whole issue of the effect of drug-induced amnesia—much as Bradley Monton explains.
You’re not obliged to give a lecture. A reference would be ideal.
Appealing to “forgetting” only gives an argument that our reasoning methods are incomplete: it doesn’t argue against ½ or in favour of ⅓. We need to see the rules and the calculation to decide if it settles the matter.
To reiterate, people do not need to know or understand the Bayesian rules of forgetting in order to successfully solve this problem. Nobody used this approach to solving the problem—as far as I am aware—but the vast majority obtained the correct answer nonetheless. Correct reasoning is given on http://en.wikipedia.org/wiki/Sleeping_Beauty_problem—and in dozens of prior comments on the subject.
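For instance, a quick simulation of the betting/frequency reading (my own sketch, not taken from that page): the long-run fraction of awakenings at which the coin shows heads comes out at about 1⁄3.

```python
# Simulation sketch: fraction of awakenings with the coin showing heads,
# in the original two-day setup.
import random

heads_awakenings = 0
total_awakenings = 0
for _ in range(100_000):
    heads = random.random() < 0.5
    wakes = 1 if heads else 2        # heads: Monday only; tails: Monday and Tuesday
    total_awakenings += wakes
    if heads:
        heads_awakenings += wakes

print(heads_awakenings / total_awakenings)  # ~0.333
```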
The Wikipedia page explains how a frequentist can get the answer ⅓, but it doesn’t explain how a Bayesian can get that answer. That’s what’s missing.
I’m still hoping for a reference for “the Bayesian rules of forgetting”. If these rules exist, then we can check to see if they give the answer ⅓ in the Sleeping Beauty case. That would go a long way to convincing a naive Bayesian.
I do not think it is missing—since a Bayesian can ask themselves at what odds they would accept a bet on the coin coming up heads—just as easily as any other agent can.
What is missing is an account involving Bayesian forgetting. It’s missing because that is a way of solving the problem which makes little practical sense.
Now, it might be an interesting exercise to explore the rules of Bayesian forgetting—but I don’t think it can be claimed that that is needed to solve this problem—even from a Bayesian perspective. Bayesians have more tools available to them than just Bayes’ Law.
FWIW, Bayesian forgetting looks somewhat manageable. Bayes’ Law is a reversible calculation—so you can just un-apply it.
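Roughly, as a sketch of that reversibility claim (my illustration, assuming the likelihoods used in the original update are still known and nonzero):

```python
# "Un-applying" a Bayes update: divide out the likelihoods and renormalize.
def bayes_update(prior, likelihood):
    posterior = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(posterior.values())
    return {h: p / z for h, p in posterior.items()}

def bayes_unupdate(posterior, likelihood):
    # Only works when the likelihoods are remembered and nonzero.
    prior = {h: posterior[h] / likelihood[h] for h in posterior}
    z = sum(prior.values())
    return {h: p / z for h, p in prior.items()}

prior = {"heads": 0.5, "tails": 0.5}
likelihood = {"heads": 0.2, "tails": 0.8}  # arbitrary example evidence

posterior = bayes_update(prior, likelihood)
print(bayes_unupdate(posterior, likelihood))  # {'heads': 0.5, 'tails': 0.5}
```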