Sleeping Beauty Not Resolved
Ksvanhorn recently suggested that Radford Neal provides the solution to the Sleeping Beauty problem and that the current solutions are wrong. The consensus seems to be that the maths checks out, yet there are strong suspicions that something fishy is going on without people being able to fully articulate the issues.
My position is that Neal/Ksvanhorn make some good critiques of existing solutions, but their solution falls short. Specifically, I’m not going to dispute Neal’s calculation; rather, I’ll argue that the probability he calculates completely misses anything that this problem may reasonably be trying to get at. Part of this is related to the classic problem, “If I have two sons and at least one of them is born on a Tuesday, what is the chance that both are born on the same day?”. This leads me to strongly disfavour the way Neal extends the word probability to cover these cases, though of course I can’t actually say that he is wrong in any objective sense, since words don’t have objective meanings.
A key part of his argument can be paraphrased as a suggestion to Shut up and multiply, since verbal arguments about probability have a strong tendency to be misleading. I’ve run into these issues with the slipperiness of words myself, but at the same time, we need verbal arguments to decide why we should construct the formalism a particular way. Further, my interpretation of Shut up and multiply has always been that we shouldn’t engage in moral grandstanding by substituting emotions for logic. I didn’t take it to mean that, when presented with a mathematical model that seems to produce sketchy results, we should accept it unquestioningly, without taking the time to understand the assumptions behind it or its intended purpose. Indeed, we’ve been told to shut up and multiply for Sleeping Beauty before, and that came to a different conclusion.
What is Neal actually doing?
Unfortunately, Ksvanhorn’s post jumps straight into the maths and doesn’t provide any explanation of what is going on. This makes it somewhat harder to critique, as you don’t just need the ability to follow the maths, but also the ability to figure out the actual motivation behind all of it.
Neal wants us to condition on all information, including the apparently random experiences that Sleeping Beauty will undergo before they answer the interview question. This information seems irrelevant, but Neal argues that if it were truly irrelevant, it wouldn’t affect the calculation. If, contrary to expectations, it actually does, then Neal would suggest that we were wrong about its irrelevance. On the other hand, I would suggest that this is a massive red flag that we don’t actually know what it is that we are calculating, as we will see in a moment.
Let S refer to experiencing a particular sequence of sensations, starting with waking up and ending with being interviewed. Neal’s strategy is to calculate the probability of S given heads and the probability of S given tails. If we are woken twice, we only need to observe S on at least one of the days for it to count, and if we observe S on both days, it still only counts once. Neal then uses Bayes’ Rule on these intermediate probabilities to derive the probability of heads vs. tails. Notice how incredibly simple this process was to describe in words. This is one of those situations where presenting the formalism without an intuitive explanation of what is happening makes it much harder to understand.
Calculations
I aim to show that these intermediate probabilities are mostly irrelevant. In order to do so, we will assume that there are three bits of information received after awakening and before the interview (this gives 8 possible experience streams). Let’s suppose Sleeping Beauty awakes and then observes the sequence 111. Neal notes that in the heads case, the chance of Sleeping Beauty observing this at least once is 1⁄8. In the tails case, assuming the two awakenings are independent, we get a probability of 1/8 + 1/8 − 1/64 = 15⁄64 (almost 1⁄4). As the number of possibilities approaches infinity, the ratio of the two probabilities approaches 1:2, which leads to slightly more than a 1⁄3 chance of heads after we perform the Bayesian update (see Ksvanhorn’s post for the maths). If we ensure that the experience stream of the second awakening never matches that of the first, we get a 2⁄8 chance of observing 111 in one of the two streams, which leads to exactly a 1⁄3 chance. On the other hand, if the experience stream is always the same on both awakenings, we get a 1⁄8 chance of observing 111. This gives a ratio of 1:1, which leads to a 50% chance of heads.
All this maths is correct, but why do we care about these odds? It is indeed true that if you had pre-committed at the start to guess if and only if you experienced the sequence 111, then the odds of the coin being heads would be as above. This would also be true if you had made the same commitment for 000 instead; or 100; or any other sequence.
However, let’s suppose you picked two sequences, 000 and 001, and pre-committed to guess if you saw either of them. Then the probability of guessing at least once, if tails occurs and the observations are independent, would become 1/4 + 1/4 − 1/16 = 7⁄16. This would make the heads:tails probability ratio 4:7. Now, the other two probabilities (always different, always the same) remain the same, but the point is that the probability of heads depends on the number of sequences you pre-commit to guess on. If you pre-committed to guess regardless of the sequence, then the probability becomes 1⁄2.
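To make this concrete, here is a minimal Python sketch (an illustration of my own, not code from Neal or Ksvanhorn) that reproduces the numbers above for three bits of evidence and k pre-committed sequences:

```python
from fractions import Fraction

def p_heads_given_guess(n_bits, k):
    """Posterior probability of heads if you pre-commit to guessing only
    when the observed stream falls within a set of k sequences (out of
    2**n_bits equally likely ones), assuming the two tails awakenings
    produce independent streams and the prior on the coin is 1/2."""
    q = Fraction(k, 2**n_bits)     # chance a single awakening matches
    p_match_heads = q              # heads: one awakening
    p_match_tails = 2 * q - q * q  # tails: at least one of two awakenings
    return p_match_heads / (p_match_heads + p_match_tails)

print(p_heads_given_guess(3, 1))  # 8/23 (ratio 1/8 : 15/64, slightly over 1/3)
print(p_heads_given_guess(3, 2))  # 4/11 (ratio 4:7, as above)
print(p_heads_given_guess(3, 8))  # 1/2 (guess regardless of the sequence)
```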
Moving back to the original problem, suppose you wake up and observe 111. Why do we care about the odds of heads if you had pre-committed to only answering on observing 111, given that you didn’t pre-commit to this at all? Further, there’s no reason why you couldn’t, for example, decide in advance to ignore the last bit and pre-commit to guess if the first two bits were 11. Why must you pre-commit utilising all of the available randomness? Being able to manipulate your effective odds by making such pre-commitments is a neat trick, but it doesn’t directly answer the question asked. Sure, Ksvanhorn was able to massage this probability to produce the correct betting decisions, but both the halfer and thirder solutions can achieve this far more easily.
Updating on a random bit of information
@travisrm89 wrote:
How can receiving a random bit cause Beauty to update her probability, as in the case where Beauty is an AI? If Beauty already knows that she will update her probability no matter what bit she receives, then shouldn’t she already update her probability before receiving the bit?
Ksvanhorn responds by pointing out that this assumes that the probabilities add to one, while we are considering the probability of observing a particular sequence at least once, so these probabilities overlap.
This doesn’t really clarify what is going on, but I think we can gain some insight by first looking at the following classic probability problem:
A man has two sons. What is the chance that both of them are born on the same day if at least one of them is born on a Tuesday?
(Clarifying in response to comments: the Tuesday problem is ambiguous and the answer is either 1⁄13 or 1⁄7 depending on interpretation. I’m not disputing this)
Most people expect the answer to be 1⁄7, but the usual answer is that 13 of the 49 equally likely possibilities have at least one son born on a Tuesday and 1 of the 49 has both born on a Tuesday, so the chance is 1⁄13. Notice that if we had been told, for example, that one of them was born on a Wednesday, we would have updated to 1⁄13 as well. So our odds can always update in the same way on a random piece of information if the possibilities referred to aren’t exclusive, as Ksvanhorn claims.
However, consider the following similar problem:
A man has two sons. We ask one of them at random which day they were born and they tell us Tuesday. What is the chance that they are both born on the same day?
Here the answer is 1⁄7, as we’ve been given no information about when the other child was born. When Sleeping Beauty wakes up and observes a sequence, they are learning that this sequence occurs on a random day out of those days when they are awake. This probability is 1/n, where n is the number of possibilities. This is distinct from learning that the sequence occurs in at least one wakeup, just as learning that a random child was born on a Tuesday is different from learning that at least one child was born on a Tuesday. So Ksvanhorn has calculated the wrong thing.
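The difference between the two problems can be checked by brute force. The following sketch (again my own, purely illustrative) enumerates the 49 equally likely birthday pairs under both conditioning schemes:

```python
from fractions import Fraction
from itertools import product

TUESDAY = 0
pairs = list(product(range(7), repeat=2))  # 49 equally likely (day1, day2) pairs

# "At least one born on a Tuesday" (pre-committed statement): answer 1/13.
given = [p for p in pairs if TUESDAY in p]
print(Fraction(sum(p[0] == p[1] for p in given), len(given)))  # 1/13

# "A randomly asked child says Tuesday": weight each pair by the chance
# that the randomly chosen child is a Tuesday-born one. Answer 1/7.
weight = lambda p: Fraction((p[0] == TUESDAY) + (p[1] == TUESDAY), 2)
num = sum(weight(p) for p in pairs if p[0] == p[1])
den = sum(weight(p) for p in pairs)
print(num / den)  # 1/7
```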
What does this mean?
Perhaps this still indicates a limitation on the thirders’ attempts to define a notion of subjective probability? If we define probability in terms of bets, then this effect is mostly irrelevant. It only occurs when multiple guesses are collapsed down to one guess, but how often will completely isolated guesses be scored in such a combined manner?
On the other hand, what does this mean for the halfers’ notion of probability, where we normalise multiple guesses? Well, it shows that we can manipulate the effective probability of heads vs. tails by only guessing in particular circumstances. However, this effect is purely a result of controlling how many times correct guesses on both days are collapsed so that they only count once. Further, there are many situations where we make the correct guess on Monday, then refuse to guess on Tuesday, or vice versa.
These kinds of situations fit quite awkwardly into probability theory and it seems much more logical to consider handling them in the decision theory instead.
More on the 1⁄3 solution
Neal is correct to point out that epistemic probability theory doesn’t contain a concept of “now”, so we either need to eliminate it (such as by eliminating the indexicals) or utilise an extension of standard probability theory. Neal is correct that most 1⁄3 answers skip over this work and that this work is necessary for a formal proof. I can imagine constructing a “consciousness-state centred” probability, which handles things like repeated awakenings or duplicates. I won’t attempt to do so in this post, but I believe that such a theory is worth pursuing.
Of course, finding a useful theory of probability that covers such situations wouldn’t mean that the answer would objectively be 1⁄3, just that there is a notion of probability where this is the answer.
Neal is also right to point out that instead of updating on new information, the thirders are tossing out one model and utilising a new model. However, if we constructed a consciousness-state centred probability it would be reasonable to update based on a change of which consciousness-states are considered possibilities.
More on the 1⁄2 solution
Standard probability theory doesn’t handle being asked multiple times (it doesn’t even handle indexicals). One of the easiest ways to support this is to normalise multiple queries. For example, if we ask you twice whether you see a cat and you expect to see one 1.2 times on average, we can normalise the probability of seeing a cat to 0.6 per query by multiplying by 0.5. If we run the Sleeping Beauty problem twice, you should expect on average one heads awakening with a weighting of 1 and two tails awakenings with weightings of 0.5 each. This provides a 50% chance of heads and a 50% chance of tails for one flip. Obviously, it would take some work to prove that certain standard theorems still hold, but this is a much more logical way to extend probability theory than the manner proposed.
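Here is a minimal simulation of this normalisation scheme (a sketch of the idea described above, not an established formalism):

```python
import random

def normalised_heads_weight(trials=100_000):
    """Weight each awakening by 1 / (number of awakenings in its run),
    so every run of the experiment contributes total weight 1, then
    report the fraction of weight carried by heads awakenings."""
    heads_w = tails_w = 0.0
    for _ in range(trials):
        if random.random() < 0.5:   # heads: one awakening, weight 1
            heads_w += 1.0
        else:                       # tails: two awakenings, weight 0.5 each
            tails_w += 0.5 + 0.5
    return heads_w / (heads_w + tails_w)

print(normalised_heads_weight())  # ~0.5, matching the halfer answer
```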
But beyond this, if we want to pre-commit to guess on observing particular sequences of experiences, the logical choice is to pre-commit to guess on all such sequences. This then leads to the answer of 1⁄2 chance of heads if we follow Neal and collapse multiple matches into one.
Again, none of this is objective, but it all comes down to how we choose to extend classical probability theory.
Is betting a red herring?
As good as If a tree falls on Sleeping Beauty is as an article, I agree with Neal that if we merely look at bets, we haven’t reached the root of the issue. When people propose using a particular betting scheme, that scheme didn’t come out of nowhere: it was crafted to satisfy certain properties or axioms. These axioms are the root of the issue. Here, the conflict is between counting repeated queries only once or counting them separately. Once we’ve chosen which of these we want to include among our other axioms, the betting scheme (or rather the set of consistent betting schemes) follows. So Ksvanhorn is correct that current solutions on Less Wrong haven’t dotted all of their i’s and crossed all of their t’s. Whether this matters depends on how much you care about certainty. Again, I won’t attempt to pursue this approach in this post.
Conclusion
We’ve seen that behind all of the maths, Neal is actually performing quite a simple operation, and that it has very little relation to anything we are interested in. On the other hand, the critiques of current solutions are worth taking to heart. They don’t imply that these solutions are necessarily wrong, just that they aren’t formal proofs. Overall, I believe that both 1⁄2 and 1⁄3 are valid answers depending on exactly what the question is, although I have not embarked on the quest of establishing a formal footing in this post.
Ah, I see you’ve made the points about how the information about Tuesday was selected!
I’ve concluded that, because of the problems in FNC, SIA, and SSA, no anthropic probability theory works in the presence of duplicates: https://www.lesswrong.com/posts/iNi8bSYexYGn9kiRh/paradoxes-in-all-anthropic-probabilities
It’s all a question of decision theory, not probability.
https://www.lesswrong.com/posts/RcvyJjPQwimAeapNg/torture-vs-dust-vs-the-presumptuous-philosopher-anthropic
https://arxiv.org/abs/1110.6437
https://www.youtube.com/watch?v=aiGOGkBiWEo
I just thought I’d add a note in case anyone stumbles upon this thread: Stuart has actually now changed his views on anthropic probabilities as detailed here.
I raised this objection on Ksvanhorn’s initial post, though it came rather late, so I’m not sure if anyone saw it. You’ll have to forgive me in advance, as most of this math is beyond my current level of familiarity.
In the original post, Ksvanhorn states:
My understanding is that Neal’s solution assumes that the sets of possible experience streams for Sleeping Beauty before answering the question are identical on both Monday and Tuesday. Furthermore, this “stream of experiences” includes events of arbitrarily small significance (one example given was the movement of a fly on the wall).
If my understanding is correct (and given that I don’t understand most of the math involved, it’s certainly possible that it’s not), it seems trivial to disprove this assumption. Through the course of the experiment, time passes. Sleeping Beauty ages. If something as insignificant as a fly on the wall or a change in heart rate is relevant enough to be included in these calculations, then Sleeping Beauty’s aging over the course of two days should also be.
She cannot be two days older on Monday, and she cannot be one day older on Tuesday. All of her internal bodily functions proceed as normal. Her fingernails grow. Her hair grows. Any wound or condition that required healing will have progressed. Does she eat? Go to the bathroom? If we can condition on a nebulous “everything” that can include basically insignificant differences in experience, I can think of any number of much less insignificant differences that are affected by the passage of time, and thus cannot be identical on both Monday and Tuesday.
No, I very definitely do NOT assume that Beauty’s experiences are identical on Monday and Tuesday. I think one should solve the Sleeping Beauty problem with the ONLY fantastical aspect being the memory erasure. In every other respect, Beauty is a normal human being. If you then want to make various fantastic assumptions, go ahead, but thinking about those fantastic versions of the problem without having settled what the answer is in the usual version is unwise.
Just to clarify… ageing by one day may well be one reason Beauty’s experiences are different on Tuesday than on Monday, but we assume that other variation swamps ageing effects, so that Beauty will not be able to tell that it is Tuesday on this basis.
I understand that you do not assume Beauty’s experiences are identical on Monday and Tuesday. Rather, my understanding is that you assume that “the set of things it is possible for Beauty to experience on Monday” is identical to “the set of things it is possible for Beauty to experience on Tuesday”. Is my understanding incorrect?
Ah! I see I misread what you wrote. As you point out, it is implausible in real life that the set of possible experiences on Monday is exactly the same as the set of possible experiences on Tuesday, or at least it’s implausible that the probability distributions over possible experiences on Monday and on Tuesday are exactly the same. I think it would be fine to assume for a thought experiment that they are the same, however. The reason it would be fine is that you could also not assume they are the same, but just that they are very similar, which is indeed plausible, and the result would be that at most Beauty will obtain some small amount of information about whether it is Monday or Tuesday from what her experiences are, which will change her probability of the coin having landed Heads by only a small amount. Similarly, we don’t have to assume PERFECT memory erasure. And we don’t have to assume (as we usually do) that Beauty has exactly ZERO probability of dying after Monday and before she might have been woken on Tuesday. Etc, etc.
You write: A man has two sons. What is the chance that both of them are born on the same day if at least one of them is born on a Tuesday?
Most people expect the answer to be 1⁄7, but the usual answer is that 13 of the 49 equally likely possibilities have at least one son born on a Tuesday and 1 of the 49 has both born on a Tuesday, so the chance is 1⁄13. Notice that if we had been told, for example, that one of them was born on a Wednesday, we would have updated to 1⁄13 as well. So our odds can always update in the same way on a random piece of information if the possibilities referred to aren’t exclusive, as Ksvanhorn claims.
I don’t know what the purpose of your bringing this up is, but your calculation is in any case incorrect. It is necessary to model the process that leads to our being told “at least one was born on Tuesday”, or “at least one was born on Wednesday”, etc. The simplest model would be that someone will definitely tell us one of these seven statements, choosing between valid statements with equal probabilities if more than one such statement is true. With this model, the probability of them being born on the same day is 1⁄7, regardless of what statement you are told. There are 13 possibilities with non-zero probabilities after hearing such a statement, but the possibility in which they are born on the same day has twice the probability of the others, since the others might have resulted in a different statement.
You’ll get an answer of 1⁄13, if you assume a model in which someone precommits to telling you whether the statement “at least one was born on Tuesday” is true or false, before they find out the answer, and they later say it is true.
I don’t think we’re in disagreement here. The reason why I said the “usual answer” is 1⁄13 instead of writing the “answer” is 1⁄13 is that there are disputes about what the question is asking as you’ve pointed out. I also noted the 1⁄7 directly below. But I definitely could have been clearer—the answer can be 1⁄7 or 1⁄13 depending on the interpretation.
As I said, I’m not sure what point you’re trying to make, but if updating from 1⁄7 to 1⁄13 on any of the statements “at least one was born on Tuesday”, “at least one was born on Wednesday”, etc. is part of the point, then I don’t see any model of what you are told for which that is the case.
Maybe this will make it easier. Suppose you meet the first son on Monday and then the second on Tuesday. Your memory is wiped in between. You wake up not knowing the day and the child tells you that they were born on a Tuesday. What are the odds that both were born on Tuesday?
If you pre-committed to only guessing when you met a boy born on a Tuesday, then on average we’d expect you to guess at least once in 13⁄49 of cases, and in 1⁄49 of cases both would be born on a Tuesday. My point is that this would be an extremely weird way to behave, and the proposal of updating on all relevant information is similarly weird.
Suppose you wake up and immediately observe 111. It doesn’t make sense to calculate the odds of heads given:
a) The following events occurred at least once: Wake up, 1, 1, 1
Instead of the odds of heads given:
b) A random wakeup was: Wake up, 1, 1, 1
The only time you’d care about a) is if you pre-committed to only guess upon seeing that particular sequence, which would be just as weird as above.
If the memory wipe is the only fantastic aspect of this situation, then when the child you see when you wake says they were born on Tuesday (and I assume you know that both children will always say what day they were born on after you wake up), you should consider the probability that the other was also born on Tuesday to be 1⁄7. The existence of another wakening, which will of course be different in many respects from this one (e.g., the location of dust specs on the mirror in the room), is irrelevant, since you can’t remember it (or it hasn’t occurred yet).
I’ve no idea what you mean by “guessing only when you met a boy born on Tuesday”. Guessing what? Or do you mean you are precommitted to not thinking about what the probability of both being born on the same day is if the boy doesn’t say Tuesday? (Could you even do that? I assume you’ve heard the joke about the mother who promises a child a cookie if he doesn’t think of elephants in the next ten minutes...) I think you may be following some strange version of probability or decision theory that I’ve never heard of....
“Or do you mean you are precommitted to not thinking about what the probability of both being born on the same day is if the boy doesn’t say Tuesday?”—exactly. In these kinds of scenarios we need to define our reference class and then we calculate the probability for someone in this class. For example, in anthropic problems there’s often debate about whether our reference class should include all sentient beings or all humans or all humans with a certain level of intellectual ability. Similarly, the question here is whether our reference class is all agents who encounter a boy born on a Tuesday on at least one day or all agents who encounter a boy. I see the second as much more useful, unless you’ll only be offered an option if at least one boy was born on a Tuesday.
“You should consider the probability that the other was also born on Tuesday to be 1/7”—exactly!
My point is that you only get the 1⁄13 answer when you pre-commit to guessing when you wake up and the boy tells you that they were born on Tuesday. Further, this involves collapsing two guesses as though you’d only guessed once, and abstaining in the majority of cases when you wake up the second time (or abstaining on Monday if you guessed on Tuesday). The number of scenarios where you care about this is vanishingly small. Similarly, we shouldn’t be conditioning on the sequences you observe when you wake up.
You write: However, let’s suppose you picked two sequences 000 and 001 and pre-committed to bet if you saw either of those sequences. Then the odds of betting if tails occurs and the observations are independent would become: 1/4+1/4-1/16 = 7⁄16. This would lead the probability ratio to become 4⁄7. Now, the other two probabilities (always different, always the same) remain the same, but the point is that the probability of heads depends on the number of sequences you pre-commit to guess. If you pre-committed to guess for any sequences, then the probability becomes 1⁄2.
This makes no sense to me. What do you mean by “the odds of betting”? Betting on what? And why are we trying to assign probabilities to Beauty making bets? As a rational agent, she usually makes the correct bets, rather than randomly choosing a bet. And whatever Beauty is betting on, what is the setup regarding what happens if she makes different betting decisions on Monday and Tuesday?
Part of that was a typo I’ve now fixed—I meant to say “guess” instead of “bet”. She is making a guess related to whether the coin came up heads or tails; I haven’t introduced a payoff scheme.
I wasn’t saying that she randomly chooses a bet/guess—just that if she only conditionally guesses, we can calculate the odds of her choosing to guess. For example, suppose you toss two coins and I pre-commit to guessing whether both are heads only if the first one comes up heads. Then I only guess in half the cases.
I’m assuming that beauty is following a deterministic guessing scheme so this issue doesn’t come up. Instead of thinking about Beauty as a person, we can just as easily make Beauty a computer program.
Also, I edited my comment to now say, “If you pre-committed to guess regardless of the sequence, then the probability becomes 1/2”
I’m still baffled. Why aren’t we just talking about what probabilities Beauty assigns to various possibilities, at various times? Beauty has nothing much else to do, she can afford to think about what the probabilities should be every time, not just when she observes 1, 1,1, or a coin comes up Heads, or whatever. I suspect that you think her “guessing” (why that word, rather than “assigning a probability”?) only some of the time somehow matters, but I don’t see how...
I’d rather that Beauty not be a computer program. As my original comment discusses, that is not the usual Sleeping Beauty problem. If your answer depends on Beauty being a program, not a person, then it is not an answer to the usual problem.
The point is to clarify what exactly it is that Ksvanhorn calculated. If we had decided at the start that we only cared about cases where Sleeping Beauty experienced <WAKE UP, 1, 1, 1> at least once and we wanted to calculate the probability that the coin would come up heads within this particular scope, then the maths would proceed as per Ksvanhorn’s calculations. Do you disagree?
“Beauty has nothing much else to do, she can afford to think about what the probabilities should be every time”—Well, if she pre-commits to guess in any scope where she wakes up and then experiences any stream of events, then an interview, the probability would end up being 1⁄2.
I explained on another comment that this is just about picking the reference class, which I believe to be necessary for solving anthropic problems: “For example, in anthropic problems there’s often debate about whether our reference class should include all sentient beings or all humans or all humans with a certain level of intellectual ability. Similarly, the question here is whether our reference class is all agents who encounter a boy born on a Tuesday on at least one day or all agents who encounter a boy. I see the second as much more useful, unless you’ll only be offered an option if at least one boy was born on a Tuesday.”
Is the reference class all agents or all agents who experience <WAKE UP, 1, 1, 1>?
Well, I think the whole “reference class” thing is a mistake. By using FNC, one can see that all non-fantastical problems of everyday life that might appear to involve selection effects for which a “reference class” is needed can in fact be solved correctly using standard probability theory, if one doesn’t ignore any evidence. So it’s only the fantastical problems where they might appear useful. But given the fatal flaw that the exact reference class matters, but there is no basis for choosing a particular reference class, the whole concept is of no use for fantastical problems either.
FNC?
“But given the fatal flaw that the exact reference class matters, but there is no basis for chosing a particular reference class, the whole concept is of no use for fantastical problems either”—well, I plan to write up a post on this soon, but I don’t think that the reference class is as complex as people think for most cases. If you’re deciding whether to take action A, but you need to calculate the probability accounting for anthropic effects, you just consider the population who can take action A.
Well, I guess I’ll have to wait for the details, but off-hand it doesn’t seem that this will work. If action A is “have another child”, and the issue is that you don’t want to do that if the child is going to die if the Earth is destroyed soon in a cataclysm, then the action A is one that can be taken by a wide variety of organisms past and present going back hundreds of millions of years. But many of these you would probably not regard as having an appropriate level of sentience, and some of them that you might regard as sentient seem so different from humans that including them in the reference class seems bizarre. Any sort of line drawn will necessarily be vague, leading to vagueness in probabilities, perhaps by factors of ten or more.
FNC = Full Non-indexical Conditioning, the method I advocate in my paper.
I’ll split my comments on this into multiple replies, since they address somewhat unconnected issues.
Here, some meta-comments. First, it is crucial to be clear on what Sleeping Beauty problem is being solved. What I take to be the USUAL problem is one that is only mildly fantastic—supposing that there is a perfect (or at least very good) memory erasing drug, that ensures that if Beauty is woken on Tuesday, she will not have memories of a Monday wakening that would allow her to deduce that it must be Tuesday. That is the ONLY fantastic aspect in this version of the problem. Beauty is otherwise a normal human, who has normal human experiences whenever she is awake, which include a huge number of bits of both sensory information and internal perceptions of her state of mind, which are overwhelmingly unlikely to be the same for a Tuesday awakening as for a Monday awakening. Furthermore, although Beauty is assumed to be a normally rational and intelligent person, there is no guarantee that she will make the correct decision in all circumstances, and in particular there is at least some small probability that she will decide differently on Monday and Tuesday, despite there being no rational grounds for making different decisions. When she makes a decision, she is NOT deciding for both days, only one.
Some people may be interested in a more fantastic version of the problem, in which Beauty’s experiences are somehow guaranteed to be identical on Monday and Tuesday, or in which she only gets two bits of sensory input on either day. But those who are interested in these versions should FIRST be interested in what the answer is to the usual version. If the answer is different for the more fantastic versions, that is surely of central importance when trying to learn anything from such fantastic thought experiments. And for the usual, only-mildly-fantastic version we should be able to reach consensus on the answer (and the answer’s justification), since it is only slightly divorced from the non-fantastic probability problems that people really do solve in everyday life (correctly, if they are competent).
As you know, I think the answer to the usual Sleeping Beauty problem is that the probability of Heads is 1⁄3. Answers to various decision problems then follow from this (I don’t see anything unusual about how decision theory relates to probability here, though you do have to be careful not to make mistakes). There are three ways I might be wrong about this. One is that the probability is not 1⁄3. Another is that the probability is 1⁄3, but my argument for this answer is not valid. The third is that the probability is 1⁄3, but normal decision theory based on this probability is invalid (contrary to my belief).
In my paper (see http://www.cs.utoronto.ca/~radford/anth.abstract.html for the partially revised version), I offer multiple arguments for 1/3, and for the decisions that would normally follow from that, some of which do not directly relate to my FNC justification for 1/3. So separating “1/3” from “the FNC argument for 1/3″ seems important.
One auxiliary argument that is related to FNC is my Sailor’s Child problem. This is a completely non-fantastical analogue of Sleeping Beauty. I think it is clear that the answer for it is 1⁄3. Do you agree that the answer to the Sailor’s Child problem is 1/3? Do you agree that it is a valid analogue of the usual Sleeping Beauty problem? If the answer to both of these is “yes”, then you should agree that the answer to the usual Sleeping Beauty problem is 1⁄3, without any equivocation about it all depending on how we choose to extend probability theory, or whatever.
Finally, please note that in your post you attribute various beliefs and arguments to me that I do not myself always recognize as mine.
“Some people may be interested in a more fantastic version of the problem, in which Beauty’s experiences are somehow guaranteed to be identical on Monday and Tuesday, or in which she only gets two bits of sensory input on either day. But those who are interested in these versions should FIRST be interested in what the answer is to the usual version”—If we make Sleeping Beauty a computer program rather than a human, then the versions you describe as more fantastical cease to be so. We can further ensure that Sleeping Beauty behaves deterministically.
But even if it is “fantastical”, I don’t think there is necessarily anything wrong with focusing on this first. I focused on three bits of information because it makes the maths easier to follow and clarifies what is happening in the “non-fantastical” case. In particular, it becomes apparent that the probability here isn’t a probability that we particularly care about.
My position is very similar to Ata’s. I don’t believe that the term “probability” is completely unambiguous once we start including weird scenarios that fall outside the scope which standard probability was intended to address. My suspicion is that both the halfers and thirders can construct coherent mathematical systems so that they are mostly just talking past each other, though I will acknowledge that the current solutions posted on LW haven’t been completely formally constructed. I’m just disputing a) the full non-indexical conditioning solution and b) that the 1⁄3 solution necessarily deserves precedence.
Following on from above, I don’t agree that the probability for Sailor’s child is necessarily 1⁄3. It depends on whether you think this effect should be handled in the probability or the decision theory. I’ll likely post about this at some point, but I need to read more of the literature before I attempt this (or decide to just link to someone else’s work because I don’t have anything further to add).
“Finally, please note that in your post you attribute various beliefs and arguments to me that I do not myself always recognize as mine”—Sorry for any mixups here. I’m happy to edit my post if you want me to correct anything specific.
You write: My position is very similar to Ata’s. I don’t believe that the term “probability” is completely unambiguous once we start including weird scenarios that fall outside the scope which standard probability was intended to address.
The usual Sleeping Beauty problem is only mildly fantastic, and does NOT fall outside the scope which standard probability theory addresses. Ata argues that probabilities are not meaningful if they cannot be used for a decision problem, which I agree with. But Ata’s argument that Sleeping Beauty is a situation where this is an issue seems to be based on a simple mistake.
Ata gives two decision scenarios, the first being
Each interview consists of Sleeping Beauty guessing whether the coin came up heads or tails, and being given a dollar if she was correct. After the experiment, she will keep all of her aggregate winnings.
In this situation, Beauty makes the correct decision if she gives probability 1⁄3 to Heads, and hence guesses Tails. (By making the payoff different for correct guesses of Heads vs. Tails, it would be possible to set up scenarios that make clear exactly what probability of Heads Beauty should be using.)
The second scenario is
Each interview consists of Sleeping Beauty guessing whether the coin came up heads or tails. After the experiment, she will be given a dollar if she was correct on Monday.
In this scenario, it makes no difference what her guess is, which Ata says corresponds to a probability of 1⁄2 for Heads. But this is simply a mistake. To be indifferent regarding what to guess in this scenario, Beauty needs to assign a probability of 1⁄3 to Heads. She will then see three equally likely possibilities:
Heads and it’s Monday
Tails and it’s Monday
Tails and it’s Tuesday
The differences in payoff for guessing Heads vs. Tails for these possibilities are +1, −1, and 0. Taking the expectation with respect to the equal probabilities for each gives 0, so Beauty is indifferent. In contrast, if Beauty assigns probability 1⁄2 to Heads, and hence probability 1⁄4 to each of the other possibilities, the expected difference in payoff is +1/4, so she will prefer to guess Heads. By using different payoffs for correct guesses of Heads vs. Tails, it is easy to construct an “only Monday counts” scenario in which Beauty makes a sub-optimal decision if she assigns any probability other than 1⁄3 to Heads. See also my comment on an essentially similar variation at https://www.lesswrong.com/posts/u7kSTyiWFHxDXrmQT/sleeping-beauty-resolved
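As a quick numeric check of the expectations above (a minimal sketch, assuming exactly the payoff structure just described; the function name is illustrative):

```python
from fractions import Fraction

def payoff_difference(p_heads):
    """Expected (guess Heads) minus (guess Tails) payoff in the
    "only Monday counts" scenario, given a per-awakening probability
    of Heads, with the remainder split evenly between Tails-Monday
    and Tails-Tuesday."""
    p_tails_mon = p_tails_tue = (1 - p_heads) / 2
    # Payoff differences: +1 for Heads-Monday, -1 for Tails-Monday,
    # 0 for Tails-Tuesday (the Tuesday guess is never rewarded).
    return p_heads * 1 + p_tails_mon * (-1) + p_tails_tue * 0

print(payoff_difference(Fraction(1, 3)))  # 0: indifferent, as claimed
print(payoff_difference(Fraction(1, 2)))  # 1/4: prefers guessing Heads
```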
I think that 1⁄3 probability for Heads in fact leads to the correct decision with any betting scheme in the usual Sleeping Beauty problem. There is no difficulty in applying standard probability and decision theory. 1⁄3 is simply the correct answer. Other answers are the result of mistakes in reasoning. Perhaps something more strange happens in more fantastic versions of Sleeping Beauty, but when the only fantastic aspect is memory erasure, the answer is quite definitely 1⁄3.
That the answer is 1⁄3 is even more clear for the Sailor’s Child problem. But you say: I don’t agree that the probability for Sailor’s child is necessarily 1⁄3. It depends on whether you think this effect should be handled in the probability or the decision theory. I would like to emphasize again that the Sailor’s Child problem is completely non-fantastical. It could really happen. It involves NOTHING that should cause any problems in reasoning out the answer using standard methods. If standard probability and decision theory can’t be unambiguously applied to this problem, then they are flawed tools, and the many practical applications of probability and decision theory in fields ranging from statistics to error-correcting codes would be suspect. Saying that the answer to the Sailor’s Child problem depends on some sort of subjective choice of whether “this effect should be handled in the probability or the decision theory” is not a reasonable position to take.
“I think that 1⁄3 probability for Heads in fact leads to the correct decision with any betting scheme in the usual Sleeping Beauty problem”—As does 1⁄2 probability with decisions weighted by number of repeats.
“In contrast, if Beauty assigns probability 1⁄2 to Heads, and hence probability 1⁄4 to each of the other possibilities, the expected difference in payoff is +1/4, so she will prefer to guess Heads”—Actually, the chance of occurring(<Tails and it’s Monday>) in the half solution is 1⁄2, as is that of occurring(<Tails and it’s Tuesday>), which gives an expected value of 0. The probabilities will only be 1⁄4 if, when two possibilities overlap, we randomly choose one to be “selected”. So it is selected(<Tails and it’s Monday>) that is 1⁄4. We get to the equivalent occurring value by doubling.
“I would like to emphasize again that the Sailor’s Child problem is completely non-fantastical. It could really happen”—Whether or not it is fantastical is irrelevant: standard probability doesn’t support indexicals. That doesn’t make it a flawed tool, as we can usually substitute in exact values to avoid such issues. Even when that fails, there is still the possibility of extending it.
Each time Beauty guesses, it is either Monday or Tuesday. She doesn’t know for sure which, but that’s only because she has decided to follow the rules of the game. If instead she takes out her ax, smashes a hole in the wall of her room, goes outside, and asks a passerby what day of the week it is, she will find out whether it is Monday or Tuesday. Ordinary views about the reality of the physical world say she should regard it as being either Monday or Tuesday regardless of whether she actually knows which it is. For each decision, there are no “repeats”. She either wins a dollar as a result of that decision or she does not.
This should all be entirely obvious. That it is not obvious to you indicates that you are insisting on addressing only some fantastic version of the problem, in which it can somehow be both Monday and Tuesday at the same time, or something like that. Why are you so reluctant to figure out the answer to the usual, only mildly-fantastic version? Don’t you think that might be of some interest?
Similarly, you seem to be inexplicably reluctant to admit that the answer for the Sailor’s Child problem is 1⁄3. Really, unless I’ve just made some silly mistake in calculation (which I highly doubt), the answer is 1⁄3. Your views regarding indexicals are not relevant. The Sailor’s Child problem is of the same sort as are solved every day in numerous practical applications of probability. The standard tools apply. They give the answer 1⁄3.
I was using “repeats” to simply mean that she is interviewed twice.
“If instead she takes out her ax, smashes a hole in the wall of her room, goes outside, and asks a passerby what day of the week it is, she will find out whether it is Monday or Tuesday”—yes, but these possibilities aren’t exclusive. She may smash a hole in the wall and discover it is Monday, then, after being memory wiped, do the same thing and discover that it is Tuesday. I’ll assume that we want our probabilities to sum to 1 (though I can imagine someone responding to the possibility of multiple interviews by allowing them to sum to more than 1). In that case we have to make the cases exclusive; one way to do that is to use selected(<Tails and it’s Monday>) instead of occurring(<Tails and it’s Monday>). Otherwise, we can follow the thirders. But neither way is necessarily wrong.
“That it is not obvious to you indicates that you are insisting on addressing only some fantastic version of the problem, in which it can somehow be both Monday and Tuesday at the same time, or something like that”—no, I’m just saying that occurs(<Tails and it’s Monday>) overlaps with occurs(<Tails and it’s Tuesday>). Admittedly, I previously wrote “occurring” instead of “occurs” which may have confused the matter. Here “occurs” means happens at some point—whether past, present or future.
“The Sailor’s Child problem is of the same sort as are solved every day in numerous practical applications of probability”—sure, but that doesn’t mean that it falls into the scope of standard probability theory without any translation. How are you removing the indexicals?
But the fact that she is interviewed twice is of no relevance to her calculations regarding what to guess in one of the interviews, since her payoffs from guessing in the two interviews are simply added together. The decision problems for the two interviews can be solved separately; there is no interaction. One should not rescale anything according to the number of repetitions.
When she is making a decision, it is either Monday or Tuesday, even though she doesn’t know which, and even though she doesn’t know whether she will be interviewed once or twice. There is nothing subtle going on here. It is no different from anybody else making a decision when they don’t know what day of the week it is, and when they aren’t sure whether they will face another similar decision sometime in the future, and when they may have forgotten whether or not they made a similar decision sometime in the past. Not knowing the day of the week, not remembering exactly what you did in the past, and not knowing what you will do in the future are totally normal human experiences, which are handled perfectly well by standard reasoning processes.
The probabilities I assign to various possibilities on one day when added to the probabilities I assign to various possibilities on another day certainly do not have to add up to one. Indeed, they have to add up to two.
The event of Beauty being woken on Monday after the coin lands Tails and the event of Beauty being woken on Tuesday after the coin lands Tails can certainly both occur. It’s totally typical in probability problems that more than one event occurs. This is of course handled with no problem in the formalism of probability.
If the occurrence of an indexical in a probability problem makes standard probability theory inapplicable, then it is inapplicable to virtually all real problems. Consider a doctor advising a cancer patient. The doctor tells the patient that if they receive no treatment, their probability of survival is 10%, but if they undergo chemotherapy, their probability of survival is 90%, although there will be some moderately unpleasant side effects. The patient reasons as follows: Those may be valid probability statements from the doctor’s point of view, but from MY point of view they are invalid, since I’m interested in the probability that I will survive, and that’s a statement with an indexical, for which probability theory is inapplicable. So I might as well decline the chemotherapy and avoid its unpleasant side effects.
In the Sailor’s Child problem, there is no doubt that if the child consulted a probabilist regarding the chances that they have a sibling, the probabilist would advise them that the probability of them having a sibling is 2⁄3. Are you saying that they should ignore this advice, since once they interpret it as being about THEM it is a statement with an indexical?
“But the fact that she is interviewed twice is of no relevance to her calculations regarding what to guess in one of the interviews, since her payoffs from guessing in the two interviews are simply added together”—Let’s suppose you can buy a coupon that pays $1 if the coin comes up heads and $0 otherwise. Generally, the fair price is p, where p is the probability of heads. However, suppose there’s a bug in the system and you will get charged twice if the coin comes up tails. Then the fair price (c) can be calculated as follows:
Find c such that: Expected expenditure = expected value of coupon
c(p + 2(1-p)) = p
c(2-p)= p
c = p/(2-p)
If p=1/2, then c=1/3
What’s wrong with using this system to translate between p and c so that we can figure out how to bet? The exact same system works for sleeping beauty.
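As a sanity check, here is a short simulation of the coupon scheme (a minimal sketch, assuming the “charged twice on tails” bug described above):

```python
import random

def expected_profit(c, p=0.5, trials=200_000):
    """Average profit from buying the coupon at price c when a tails
    result triggers the bug and charges you twice."""
    total = 0.0
    for _ in range(trials):
        if random.random() < p:
            total += 1 - c   # heads: coupon pays $1, charged once
        else:
            total -= 2 * c   # tails: coupon pays $0, charged twice
    return total / trials

print(expected_profit(1 / 3))  # ~0: the break-even price c = p/(2-p)
print(expected_profit(1 / 2))  # ~-0.25: overpriced at c = p
```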
“The probabilities I assign to various possibilities on one day when added to the probabilities I assign to various possibilities on another day certainly do not have to add up to one”—I was claiming that there are two options for how to extend probability to cover these situations, assuming we want to make the events non-overlapping. One is to allow probabilities greater than one (i.e. your chance of being interviewed is 1.5 and your chance of experiencing each of the three states is 0.5).
Alternatively, you can keep the probabilities summing to one by asking about events that are exclusive. The easiest way to do this in the standard Sleeping Beauty problem is to randomly choose only one interview to “count” in the case where there are multiple interviews. This gives the following probabilities: heads 0.5, tails + Monday selected = 0.25, tails + Tuesday selected = 0.25. The question isn’t so much whether you can construct this formalism, but whether it is something that we care about.
“Those may be valid probability statements from the doctor’s point of view, but from MY point of view they are invalid, since I’m interested in the probability that I will survive”—If you are the patient, we can remove the indexical by asking about whether Radford Neal will survive.
Let’s suppose you can buy a coupon that pays $1 if the coin comes up heads and $0 otherwise. Generally, the fair price is p, where p is the probability of heads. However, suppose there’s a bug in the system and you will get charged twice if the coin comes up tails.
In the scenarios that Ata describes, Beauty simply gets paid for guessing correctly. She does not have to pay anything. More generally, in the usual Sleeping Beauty problem where the only fantastic feature is memory erasure, every decision that Beauty takes is a decision for one point in time only. If she is woken twice, she makes two separate decisions. She likely makes the same decision each time, since she has no rational basis for deciding differently, but they are nevertheless separate decisions, each of which may or may not result in a payoff. There is no need to combine these decisions into one decision, in which Beauty is deciding for more than one point in time. That just introduces totally unnecessary confusion, as well as being contrary to what actually happens.
The easiest way to do this in the standard sleeping beauty is to randomly choose only one interview to “count” in the case where there are multiple interviews.
That’s close to Ata’s second scenario, in which Ata incorrectly concludes that Beauty should assign probability 1⁄2 to Heads. It is of course a different decision problem than when payoffs are added for the two days, but the correct result is again obtained when Beauty considers the probability of Heads to be 1⁄3, if she applies decision theory correctly. The choice of decision problem has no effect on how the probabilities should be calculated.
If you are the patient, we can remove the indexical by asking about whether Radford Neal will survive.
Yes indeed. And this can also be done for the Sailor’s Child problem, giving the result that the probability of Heads (no sibling) is 1⁄3.
a) Regardless of how Sleeping Beauty makes their decision, we can model it as an algorithm decided ahead of time. If part of the decision is random, we can program that in too. So we can assume they make the same meta-level decision, and so have the same expected payoff for both interviews.
b) I don’t follow the argument here? You seem to just be assuming that I am wrong?
c) We can’t just say “Radford Neal” for the Sailor’s Child problem without defining who will have that name. Is it one particular mother who will call their child Radford Neal if they have one? Or is a random child assigned that name?
a) But Beauty is actually a human being. If your argument depends on replacing Beauty by a computer program, then it does not apply to the usual Sleeping Beauty problem. Why are you so reluctant to actually address the usual, only-mildly-fantastic Sleeping Beauty problem?
In any case, why is it relevant that she has the same expected payoff for both interviews (which will indeed likely be the case, since she is likely to make the same decision)? Lots of people make various decisions at various times that happen to have the same expected payoff. That doesn’t magically make these several decisions be actually one decision.
b) If I understand your setup, if the coin lands Heads, Beauty gets one dollar if she correctly guesses on Monday, which is the only day she is woken. If the coin lands Tails, a ball is drawn from a bag with equal numbers of balls labeled “M” and “T”, and she gets a dollar if she makes a correct guess on the day corresponding to the ball drawn, with her guess on the other day being ignored. For simplicity, suppose that the ball is drawn (and then ignored) even if the coin lands Heads. There are then six possible situations of coin/ball/day when Beauty is considering her decision:
1) H M Monday
2) H T Monday
3) T M Monday
4) T T Monday
5) T M Tuesday
6) T T Tuesday
If Beauty is a Thirder, she considers all of these to be equally likely (probability 1⁄6 for each). In situations 4 and 5, her action has no effect, so we can ignore these in deciding on the best action. In situations 1 and 2, guessing Heads results in a dollar reward. In situations 3 and 6, guessing Tails results in a dollar reward. So she is indifferent to guessing Heads or Tails.
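The indifference claim can also be verified by direct enumeration (a minimal sketch, assuming the ball and payoff rules set out above):

```python
from fractions import Fraction

# (coin, ball, day): a guess is rewarded only if it matches the coin AND
# the day is the one that "counts" (Monday for Heads, the ball's day for Tails).
situations = [('H', 'M', 'Mon'), ('H', 'T', 'Mon'),
              ('T', 'M', 'Mon'), ('T', 'T', 'Mon'),
              ('T', 'M', 'Tue'), ('T', 'T', 'Tue')]

def expected_reward(guess, prob):
    """Expected reward of a guess under Beauty's credences over situations."""
    total = Fraction(0)
    for coin, ball, day in situations:
        counting_day = 'Mon' if coin == 'H' else {'M': 'Mon', 'T': 'Tue'}[ball]
        if day == counting_day and guess == coin:
            total += prob[(coin, ball, day)]
    return total

thirder = {s: Fraction(1, 6) for s in situations}
print(expected_reward('H', thirder))  # 1/3
print(expected_reward('T', thirder))  # 1/3 -- indifferent, as claimed
```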
c) Really, can you actually not suppose that in the Sailor’s Child problem, which is explicitly designed to be a problem that could actually occur in real life, the child has been given a name? And if not, do you also think that if the child gets cancer, as in the previous discussion, they should refuse chemotherapy on the grounds that, since their mother did not give them a name, they are unable to escape the inapplicability of probability theory to statements with indexicals? I’m starting to find it hard to believe that you are actually trying to understand this problem.
a) Even if sleeping beauty is a human, they are still a deterministic (or probabilistically deterministic) machine, so their responses in any scenario can be represented by an algorithm.
b) The halfer gets the same solution (indifference) too as 1), 2), 5) and 6) are all assigned a probability of 1⁄4; whilst 3) and 4) are ignored.
c) My point isn’t that the child might not have a name. My point is that in order to evaluate the statement: “Radford Neal has a half-sibling” we have to define the scheme in which someone comes to be called Radford Neal.
So, suppose the two potential mothers are Amy and Barbara. The first possibility is that Amy calls their child, if they have one, “Radford Neal”. However, if this is the case, it may come to pass that Amy doesn’t have a child, so no-one is called Radford Neal and the reference fails. Alternatively, we might want to ensure that there is always someone called Radford Neal. If they only have one child, this is trivial; if there are two, we could pick randomly. My point is that there isn’t a unique way of assigning the name, so I don’t know what scheme you want to use to replace the indexical.
a) You know, it has not actually been demonstrated that human consciousness can be mimicked by a Turing-equivalent computer. In any case, the only role of mentioning this in your argument seems to be to push your thinking away from Beauty as a human towards a more abstract notion of the problem, in which you can more easily engage in reasoning that would be obviously fallacious if your thoughts were anchored in reality.
b) Halfer reasoning is invalid, so it’s difficult to say how this invalid reasoning would be applied in the context of this decision problem. But if one takes the view that probabilities do not depend on what decision problem they will be used for, it isn’t possible for possibilities 5) and 6) to have probability 1⁄4 while possibilities 3) and 4) have probability zero. One can imagine, for example, that Beauty is told about the balls from the beginning, but is told about the reward for guessing correctly, and how the balls play a role in determining that reward, only later. Should she change her probabilities for the six possibilities simply because she has been told about this reward scheme? I suspect your answer will be yes, but that is simply absurd. It is totally contrary to normal reasoning, and if applied to practical problems would be disastrous. Remember! Beauty is human, not a computer program.
c) You are still refusing to approach the Sailor’s Child problem as one about real people, despite the fact that the problem has been deliberately designed so that it has no fantastic aspects and could indeed be about real people, as I have emphasized again and again. Suppose the child is considering searching for their possible sibling, but wants to know the probability that the sibling exists before deciding to spend lots of money on this search. The child consults you regarding what the probability of their having a sibling is. Do you really start by asking, “what process did your mother use in deciding what name to give you”? The question is obviously of no relevance whatsoever. It is also obvious that any philosophical debates about indexicals in probability statements are irrelevant—one way or another, people solve probability problems every day without being hamstrung by this issue. There is a real person standing in front of you asking “what is the probability that I have a sibling”. The answer to this question is 2⁄3. There is no doubt about this answer. It is correct. Really. That is the answer.
Thanks for taking the time to write all of these responses, but I suspect that we’ve become stuck. At some point I’ll write up some posts aimed at arguing for my position, rather than primarily at addressing rebuttals, and perhaps that will clear up some of these issues.
The intended scope is anything that you can reason about using classical propositional logic. And if you can’t reason about it using classical propositional logic, then there is still no ambiguity, because there are no probabilities.
I like an alternative version of the problem proposed by someone whose identity escapes me.
Thirty students are enrolled in an ethically dubious study. One of them is selected at random to be awakened on the 1st of the month. For the rest, a 6-sided die is rolled. On a result of “1”, each of the remaining 29 is awakened on a different day of the rest of the 30-day month. On any other result, they are not awakened at all. All participants who are woken are asked to guess whether the die rolled “1” or not.
What should they guess? Does this differ from the case where the 30 students are replaced by a single participant given amnesic drugs?
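For what it’s worth, here is a minimal Monte Carlo sketch of the class version (the code and numbers are mine, not part of the original puzzle). A woken student who guesses “1” is right about 6⁄7 of the time, and since each student is woken at most once, the per-awakening frequency and the per-student posterior agree here.

```python
import random

trials = 100_000
woken_total = 0
woken_when_one = 0

for _ in range(trials):
    die = random.randint(1, 6)
    # The selected student is always woken; the other 29 are woken iff the die shows 1.
    woken = 30 if die == 1 else 1
    woken_total += woken
    if die == 1:
        woken_when_one += woken

print(woken_when_one / woken_total)  # ~6/7, about 0.857
```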
The evidence is extremely strong that human minds are processes that occur in human brains. All known physical laws are Turing computable, and we have no hint of any sort of physical law that is not Turing computable. Since brains are physical systems, the previous two observations imply that it is highly likely that they can be simulated on a Turing-equivalent computer (given enough time and memory).
But regardless of that, the Sleeping Beauty problem is a question of epistemology, and the answer necessarily revolves around the information available to Beauty. None of this requires an actual human mind to be meaningful, and the required computations can be carried out by a simple machine. The only real question here is, what information does Beauty have available? Once we agree on that, the answer is determined.
No, that is not what probability theory tells us to do. Reference classes are a rough technique to try to come up with prior distributions. They are not part of probability theory per se, and they are problematic because often there is disagreement as to which is the correct reference class.
“They are not part of probability theory per se, and they are problematic because often there is disagreement as to which is the correct reference class”—I’ll write up a post on how to choose the correct reference class soon, but I want to wait a bit, because I’m worried that everyone on Less Wrong is all Sleeping Beauty’ed out. And yes, probability theory takes the set of possibilities as given, but that doesn’t eliminate the need for a justification for this choice.
Yes, in exactly the same sense that *any* mathematical / logical model needs some justification of why it corresponds to the system or phenomenon under consideration. As I’ve mentioned before, though, if you are able to express your background knowledge in propositional form, then your probabilities are uniquely determined by that collection of propositional formulas. So this reduces to the usual modeling question in any application of logic—does this set of propositional formulas appropriately express the relevant information I actually have available?
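To make this concrete, here is a toy illustration of probabilities being pinned down by propositional background knowledge. I am assuming, purely for illustration, an indifference prior over the truth assignments consistent with that knowledge; the claim being made above is more general than this one prior.

```python
from itertools import product

variables = ["A", "B"]

def background(w):
    # Toy background knowledge: the single formula "A implies B".
    return (not w["A"]) or w["B"]

worlds = [dict(zip(variables, values))
          for values in product([True, False], repeat=len(variables))]
consistent = [w for w in worlds if background(w)]

# Probability of A under an indifference prior over consistent assignments.
p_A = sum(w["A"] for w in consistent) / len(consistent)
print(p_A)  # 1/3: A holds in one of the three consistent assignments
```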
Yeah, but standard propositions don’t support indexicals, only “floating” observers, so why is this relevant?
Right here is your error. You are sneaking in an indexical here—Beauty doesn’t know whether “today” is Monday or Tuesday. As I discussed in detail in Part 2, indexicals are not part of classical logic. Either they are ambiguous, which means you don’t have a proposition at all, or the ambiguity can be resolved, which means you can restate your proposition in a form that doesn’t involve indexicals.
What you are proposing is equivalent to adding an extra binary variable d to the model, and replacing the observation R(y, Monday) or R(y, Tuesday) with R(y, d). That in turn is the same as randomly choosing ONE day on which to wake Beauty (in the Tails case) instead of waking her both times.
This kind of oversight is why I really insist on seeing an explicit model and an explicit statement (as a proposition expressible in the language of the original model) of what new information Beauty receives upon awakening.
“What you are proposing is equivalent to adding an extra binary variable d to the model, and replacing the observation R(y, Monday) or R(y, Tuesday) with R(y, d). That in turn is the same as randomly choosing ONE day on which to wake Beauty (in the Tails case) instead of waking her both times”—Yes, that is equivalent to what I’m proposing by saying that only one day “counts”. I’ll explain why this formalism is useful in my next post.
But randomly awakening Beauty on only one day is a different scenario than waking her both days. A priori you can’t just replace one with the other.
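The two scenarios can be compared directly. Below is a minimal Monte Carlo sketch (my own, for illustration) using the three-bit experience streams from earlier in the post: on tails, the “both days” model counts the observation if either day’s stream matches, while the “one random day” model gives Beauty only a single stream. The posteriors differ (roughly 8⁄23 against 1⁄2), which is the sense in which the two models are not interchangeable.

```python
import random

TARGET = (1, 1, 1)  # the particular sequence Beauty observes

def stream():
    return tuple(random.randint(0, 1) for _ in range(3))

def posterior_heads(one_random_day, trials=200_000):
    heads_and_obs = obs = 0
    for _ in range(trials):
        heads = random.random() < 0.5
        if heads or one_random_day:
            matched = stream() == TARGET  # a single awakening's stream
        else:
            # Tails with two awakenings: either day's stream may match.
            matched = stream() == TARGET or stream() == TARGET
        if matched:
            obs += 1
            heads_and_obs += heads
    return heads_and_obs / obs

print(posterior_heads(one_random_day=False))  # ~8/23, about 0.35
print(posterior_heads(one_random_day=True))   # ~1/2
```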
We care about these odds because the laws of probability tell us to use them. I have no idea what you mean by “precommitted at the start to guess if and only if...” I can’t make any sense of this or the following paragraph. What are you “guessing”? Regardless, this is a question of epistemology—what are the probabilities, given the information you have—and those probabilities have specific values regardless of whether you care about calculating them.
“We care about these odds because the laws of probability tell us to use them”—I’m not disputing your calculation, just explaining what you’ve actually calculated and why it isn’t relevant.
“Pre-committed at the start to guess if and only if...”—Given a scenario, we can calculate the probabilities when particular events occur. For example, if we have a deck of cards and we reveal cards one by one, we can ask about the probability that the next card is a king given that the previous card was a king. One way to describe this scenario would be to ask about the probability of guessing the correct card if you promise to guess the next card whenever you see a king and to not guess whenever you don’t. If you break this promise, then it may alter the chance of you guessing correctly. Is my use of language clear now?
You are simply assuming that what I’ve calculated is irrelevant. But the only way to know absolutely for sure whether it is irrelevant is to actually do the calculation! That is, if you have information X and Y, and you think Y is irrelevant to proposition A, the only way you can justify leaving out Y is if Pr(A | X and Y) = Pr(A | X). We often make informal arguments as to why this is so, but an actual calculation showing that, in fact, Pr(A | X and Y) != Pr(A | X) always trumps an informal argument that they should be equal.
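The relevance test itself is easy to demonstrate with a stand-in example. The events below (two fair dice) are my own choices for illustration:

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # all 36 outcomes of two fair dice

def pr(event, given):
    sample = [r for r in rolls if given(r)]
    return Fraction(sum(event(r) for r in sample), len(sample))

A = lambda r: r[0] == 6            # proposition A: the first die is a six
X = lambda r: r[0] % 2 == 0        # information X: the first die is even
Y1 = lambda r: r[1] == 3           # Y1: the second die is a three
Y2 = lambda r: r[0] + r[1] >= 10   # Y2: the total is at least ten

print(pr(A, X))                          # 1/3
print(pr(A, lambda r: X(r) and Y1(r)))   # 1/3: Y1 may safely be left out
print(pr(A, lambda r: X(r) and Y2(r)))   # 3/4: Y2 is relevant
```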
Your “probability of guessing the correct card” presupposes some decision rule for choosing a particular card to guess. Given a particular decision rule, we could compute this probability, but it is something entirely different from “the probability that the card is a king”. If I assume that’s just bad wording, and that you’re actually talking about the frequency of heads when some condition occurs, well now you’re doing frequentist probabilities, and we were talking about *epistemic* probabilities.
I’m not using the word irrelevant in the sense of “Doesn’t affect the probability calculation”, I’m using it in the sense of, “Doesn’t correspond to something that we care about”.
Yeah, I could have made my language clearer in my second paragraph. I was talking about the “probability of guessing the correct card” for a particular guessing strategy. And the probability of the next card being a king over some set of situations corresponds to the probability that the strategy of always guessing “King” for the next card gives the correct solution.
Anyway, my point was that you can manipulate your probability of being correct by changing which situations are included inside this calculation.
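A minimal sketch of this point, reusing the card example above: the long-run frequency of “the next card is a king” depends on the rule for which situations count.

```python
import random

DECK = ["K"] * 4 + ["x"] * 48  # 4 kings in a 52-card deck

def frequency(counts_here, trials=20_000):
    hits = total = 0
    for _ in range(trials):
        deck = DECK[:]
        random.shuffle(deck)
        for i in range(len(deck) - 1):
            if counts_here(deck[: i + 1]):  # does this situation count?
                total += 1
                hits += deck[i + 1] == "K"
    return hits / total

print(frequency(lambda seen: True))             # count every card: ~4/52, about 0.077
print(frequency(lambda seen: seen[-1] == "K"))  # only after a king: ~3/51, about 0.059
```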
This isn’t just Neal’s position. Jaynes argues the same in Probability Theory: The Logic of Science. I have never once encountered an academic book or paper that argued otherwise. The technical term for conditioning on less than all the information is “cherry-picking the evidence” :-).
But within what context? You can’t just take a formula or rule and apply it without understanding the assumptions it relies upon.
The context is *all* applications of probability theory. Look, when I tell you that A or not A is a rule of classical propositional logic, we don’t argue about the context or what assumptions we are relying on. That’s just a universal rule of classical logic. Ditto with conditioning on all the information you have. That’s just one of the rules of epistemic probability theory that *always* applies. The only time you are allowed to NOT condition on some piece of known information is if you would get the same answer whether or not you conditioned on it. When we leave known information Y out and say it is “irrelevant”, what that means is that Pr(A | Y and X) = Pr(A | X), where X is the rest of the information we’re using. If I can show that these probabilities are NOT the same, then I have proven that Y is, in fact, relevant.
“Look, when I tell you that A or not A is a rule of classical propositional logic, we don’t argue about the context or what assumptions we are relying on”—Actually, you get sentences like “This sentence is false”, which fall outside classical propositional logic. This is why it is important to understand the limits within which a formalism applies.
Ouch. I thought I was explaining what was going on.