This is interesting, and I’d like to understand exactly how the updating goes at each step. I’m not totally sure myself, which is why I’m asking the question about what your approach implies.
Remember Beauty now has to update on two things: the bias of the coin (the fraction p of times it would fall Tails in many throws) and whether it actually fell Tails in the particular throw. So she has to maintain a subjective distribution over the pair (p, outcome), where the outcome is Heads or Tails.
Step 1: Assuming an “ignorant” prior (no information about p except that it is between 0 and 1), she has a distribution P[p = r & Tails] = r, P[p = r & Heads] = 1 - r for all values of r between 0 and 1. This gives P[Tails] = 1⁄2 by integration.
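To make that integration explicit (this just unpacks the claim above, treating P[p = r & Tails] as a joint density in r):

$$P[\text{Tails}] = \int_0^1 P[p = r \,\&\, \text{Tails}]\,dr = \int_0^1 r\,dr = \tfrac{1}{2}.$$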
Step 2: On awakening, does she update her distribution of p, or the probability of Tails given that p = r? Or does she do both?
It seems paradoxical that the mere fact of waking up would cause her to update either of these. But she has to update something to allow her to now set P[Tails] = 2⁄3. I’m not sure exactly how she should do it, so your views on that would be helpful.
One approach is to use relative frequency again. Assume the experiment is now run multiple times, but with different coins each time, and the coins are chosen from a huge pile of coins having all biases between zero and one in “equal numbers”. (I’m not sure this makes sense, partly because p is a continuous variable, and we’ll need to approximate it by a discrete variable to get the pile to have equal numbers; but mainly because the whole approach seems contrived. However, I will close my eyes and calculate!)
The number of awakenings per run with a coin of bias p is 1 + p on average, so the fraction of all awakenings attached to bias p becomes proportional to 1 + p. So after normalization, the distribution of p on awakening should shift to (2/3)(1 + p). Then, given that a coin with bias p is thrown, the fraction of awakenings after Tails is 2p / (1 + p), so the joint distribution after awakening is P[p = r & Tails] = (4/3)r, and P[p = r & Heads] = (2/3)(1 - r), which when integrated again gives P[Tails] = 2⁄3.
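As a sanity check, here is a minimal Monte Carlo sketch of those numbers (my own illustration, purely for checking; it assumes nothing beyond the rules already stated):

```python
import random

# Sketch: Monte Carlo check of the Step 2 figures.
# Bias p is uniform on (0, 1); Tails gives two awakenings (Monday, Tuesday),
# Heads gives one (Monday). Statistics are taken per awakening, not per run.
N = 1_000_000
tails_awakenings = 0
total_awakenings = 0
sum_p = 0.0  # accumulates p once per awakening, to estimate E[p | awake]

for _ in range(N):
    p = random.random()           # the coin's bias (chance of Tails)
    tails = random.random() < p   # the actual throw
    awakenings = 2 if tails else 1
    total_awakenings += awakenings
    sum_p += p * awakenings
    if tails:
        tails_awakenings += 2

print(tails_awakenings / total_awakenings)  # ~ 2/3 = P[Tails | awake]
print(sum_p / total_awakenings)             # ~ 5/9 = mean of (2/3)(1 + p)
```

The awakening-weighted mean of p lands near 5⁄9, which is exactly the mean of the shifted density (2/3)(1 + p).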
Step 3: When Beauty learns it is Monday, what happens then? Well, her evidence (call it “E”) is that “I have been told that it is Monday today” (or “This awakening of Beauty is on Monday” if you want to ignore the possible complication of untruthful reports). Notice the indexical terms.
Continuing with the relative frequency approach (shut up and calculate again!), Beauty should set P[E | p = r] = 1/(1 + r), since if a coin with bias r is thrown repeatedly, that is the fraction of all Beauty awakenings which will learn that “today is Monday”. So the evidence E should indeed shift Beauty’s distribution on p towards lower values of p (since lower values assign higher probability to the evidence E). However, all this shift does is reverse the previous upward shift at Step 2.
More formally, we have P[E & p = r] proportional to 1/(1 + r) × (1 + r); the factors cancel out, so that P[E & p = r] is constant in r. Hence P[p = r | E] is also constant in r, and we are back to the uniform distribution over p. Filling in the distribution in the other variable, we get P[Tails | E & p = r] = r. Again look at relative frequencies: if a coin with bias r is thrown repeatedly, then among the Monday-woken Beauties, a fraction r of them will have been woken after Tails. So we are back to the original joint distribution P[p = r & Tails] = r, P[p = r & Heads] = 1 - r, and again P[Tails] = 1⁄2 by integration.
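The whole Step 1 → 2 → 3 bookkeeping can also be checked numerically. Here is a short sketch (again mine, just illustrative) that multiplies and renormalizes the densities, confirming that the Step 3 factor exactly undoes the Step 2 factor:

```python
import numpy as np

# Sketch: numeric check that the Step 2 and Step 3 shifts cancel.
# Use midpoints of [0, 1] so that f.mean() approximates the integral of f.
n = 1_000_000
r = (np.arange(n) + 0.5) / n

prior = np.ones(n)              # Step 1: uniform density over p
awake = prior * (1 + r)         # Step 2: weight by awakenings per run, 1 + r
awake /= awake.mean()           # normalize -> (2/3)(1 + r)

monday = awake / (1 + r)        # Step 3: weight by P[E | p = r] = 1/(1 + r)
monday /= monday.mean()         # normalize -> uniform again: the shifts cancel

print((awake * 2 * r / (1 + r)).mean())  # ~ 2/3 = P[Tails | awake]
print((monday * r).mean())               # ~ 1/2 = P[Tails | E]
```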
After all that work, the effect of Step 2 is very like applying an SIA shift (bias towards Tails is deemed more likely, because that results in more Beautiful experiences) and the effect of Step 3 is then like applying an SSA shift (bias towards Heads is more likely, because that makes it more probable that a randomly-selected Beautiful experience is a Monday-experience). The results cancel out. Churning through the trillion-Beauty case will give the same effect, but with bigger shifts in each direction; however, they still cancel out.
The application to the Doomsday Argument is that (as is usual when SIA and SSA are applied together) there is no net shift towards “Doom” (low probability of expanding, colonizing the Galaxy with a trillion trillion people and so on). This is how I think it should go.
However, as I noted in my previous comments, there is still a “Presumptuous Philosopher” effect when Beauty wakes up, and it is really hard to justify this if the relative frequencies of different coin weights don’t actually exist. You could consider for instance that Beauty has different physical theories about p: one of those theories implies that p = 1⁄2 while another implies that p = 9⁄10. (This sounds pretty implausible for a coin, but if the coin-flip is replaced by some poorly-understood randomization source like a decaying Higgs Boson, then it seems more plausible.) Also, for the sake of argument, both theories imply infinite multiverses, so that there are just as many Beautiful awakenings—infinitely many—in each case.
How can Beauty justify believing the second theory more, simply because she has just woken up, when she didn’t believe it before going to sleep? That does sound really Presumptuous!
A final point is that SIA tends to cause problems when there is a possibility of an infinite multiverse, and—as I’ve posted elsewhere—it doesn’t actually counter SSA in those cases, so we are still left with the Doomsday Argument. It’s a bit like refusing to shift towards “Tails” at Step 2 (there will be infinitely many Beauty awakenings for any value of p, so why shift? SIA doesn’t tell us to), but then shifting to “Heads” after Step 3 (if there is a coin bias towards Heads then most of the Beauty-awakenings are on Monday, so SSA cares, and let’s shift). In the trillion-Beauty case, there’s a very big “Heads” shift but without the compensating “Tails” shift.
If your approach can recover the sorts of shift that happen under SIA+SSA, but without postulating either, that is a bonus, since it means we don’t have to worry about how to apply SIA in the infinite case.
So what does Bayes’ theorem tell us about the Sleeping Beauty case?
It says that P(B|AC) = P(B|C) * P(A|BC)/P(A|C). In this case C is Sleeping Beauty’s information before she wakes up, which conditions all the probabilities, of course. A is the “anthropic information” of waking up and learning that what used to be “AND” things are now mutually exclusive things. B is the coin landing Tails.
Bayes’ theorem actually appears to break down here, if we use the simple interpretation of P(A) as “the probability she wakes up.” Because Sleeping Beauty wakes up in all the worlds, this interpretation says P(A|C) = 1, and P(A|BC) = 1, and so learning A can’t change anything.
This is very odd, and is an interesting problem with anthropics (see Eliezer’s post “The Anthropic Trilemma”). The practical but difficult-to-justify way to fix it is to use frequencies, not probabilities—because she can have an average frequency of waking up of 2 (given Tails) or 3⁄2 (averaged over a fair coin), while probabilities can’t go above 1.
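A minimal sketch of that frequency fix, for a fair coin (my own illustration of the idea, assuming the standard one-or-two-awakenings setup; the variable names are mine):

```python
# Sketch: the "frequencies, not probabilities" fix for a fair coin.
# freq(A | .) is the expected number of awakenings, which may exceed 1,
# and we run the usual Bayes-style ratio update with those frequencies.
prior_tails = 0.5
freq_given_tails = 2.0   # Tails -> Monday and Tuesday awakenings
freq_given_heads = 1.0   # Heads -> Monday awakening only

freq_overall = (prior_tails * freq_given_tails
                + (1 - prior_tails) * freq_given_heads)  # = 3/2

posterior_tails = prior_tails * freq_given_tails / freq_overall
print(posterior_tails)   # 2/3: the thirder answer drops out of the ratio
```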
But the major lesson is that you have to be careful about applying Bayes’ rule in this sort of situation—if you use P(A) in the calculation, you’ll get this problem.
Anyhow, only some of this is a response to anything you wrote; I just felt like finishing my line of thought :P Maybe I should solve this...
Thanks… whatever the correct resolution is, violating Bayes’s Theorem seems a bit drastic!
My suspicion is that A contains indexical evidence (summarized as something like “I have just woken up as Beauty, and remember going to sleep on Sunday and the story about the coin-toss”). The indexical term likely means that P[A] is not equal to 1, though exactly what it is equal to is an interesting question.
I don’t personally have a worked-out theory about indexical probabilities, though my latest WAG is a combination of SIA and SSA, with the caveat I mentioned on infinite cases not working properly under SIA. Basically I’ll try to map it to a relative frequency problem, where all the possibilities are realised a large but finite number of times, and count P[E] as the relative frequency of observations which contain evidence E (including any indexical evidence), taking the limit where the number of observations increases to infinity. I’m not totally satisfied with that approach, but it seems to work as a calculational tool.