The Sleeping Beauty problem and the other “paradoxes” of probability are problems that have been selected (in the evolutionary sense) because they contain psychological features that cause people’s reasoning to go wrong. People come up with puzzles and problems all the time, but the ones that gain prominence and endure are the ones that are discussed over and over again without resolution: Sleeping Beauty, Newcomb’s Box, the two-envelope problem.
So I think there’s something valuable to be learned from the fact that these problems are hard. Here are my own guesses about what makes the Sleeping Beauty problem so hard.
First, there’s ambiguity in the problem statement. It usually asks about your “credence”. What’s that? Well, if you’re a Bayesian reasoner, then “credence” probably means something like “subjective probability (of a hypothesis H given data D), defined by p(H|D) = p(D|H) p(H) / p(D)”. But some other reasoners take “credence” to mean something like “expected proportion of observations consistent with data D in which the hypothesis H was confirmed”.
In most problems these definitions give the same answer, so there’s normally no need to worry about the exact definition. But the Sleeping Beauty problem pushes a wedge between them: the Bayesians should answer ½ and the others ⅓. This can lead to endless argument between the factions if the underlying difference in definitions goes unnoticed.
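(To make the wedge concrete, here is a minimal simulation sketch, assuming the standard protocol: a fair coin, one awakening on heads, two on tails. The per-experiment tally corresponds to the first reading of “credence”, the per-awakening tally to the second.)

```python
import random

trials = 100_000
heads_experiments = 0   # experiments in which the coin landed heads
heads_awakenings = 0    # awakenings that occur in heads experiments
total_awakenings = 0

for _ in range(trials):
    heads = random.random() < 0.5
    n_awakenings = 1 if heads else 2  # heads: Monday only; tails: Monday and Tuesday
    total_awakenings += n_awakenings
    if heads:
        heads_experiments += 1
        heads_awakenings += n_awakenings

print("per experiment:", heads_experiments / trials)          # ~0.50
print("per awakening:", heads_awakenings / total_awakenings)  # ~0.33
```

Both numbers are facts about the same process; the argument is over which of them the word “credence” picks out.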
Second, there’s a psychological feature that makes some Bayesian reasoners doubt their own calculation. (You can try saying “shut up and calculate” to these baffled reasoners, but while that might get them the right answer, it won’t help them resolve their bafflement.) The problem somehow persuades some people to imagine themselves as an instance of Sleeping Beauty selected uniformly from the three instances {(heads,Monday), (tails,Monday), (tails,Tuesday)}. This appears to be a natural assumption that some reasoners are prepared to make, even though there’s no justification for it in the problem description.
Maybe it’s the principle of indifference gone wrong: the three instances are indistinguishable (to you) but that doesn’t mean the one you are experiencing was drawn from a uniform distribution.
Most of what you said here has already been said, and rebutted, in the comments on the Sleeping Beauties post, and in the followup post by Jonathan Lee. It would be polite, and helpful, to address those rebuttals. Simply restating arguments, without acknowledging counterarguments, could be a big part of why we don’t seem to be getting anywhere.
I did check both threads, and as far as I could see, nobody was making exactly this point. I’m sorry that I missed the comment in question: the threads were very long. If you can point me at it, and the rebuttal, then I can try to address it (or admit I’m wrong).
(Even if I’m wrong about why the problem is hard, I think the rest of my comment stands: it’s a problem that’s been selected for discussion because it’s hard, so it might be productive to try to understand why it’s hard. Just as it helps to understand our biases, it helps to understand our errors.)
Bayesians should not answer ½. Nobody should answer ½: that’s the wrong answer.
If your interpretation of the word “credence” leads you to answer ½, you are fighting with the rest of the community over the definition of the concept of subjective probability.
How is this a constructive comment? You’re just stating your position again. We all already know your position. I can just as easily say:
Bayesians should not answer ⅓. Nobody should answer ⅓: that’s the wrong answer.
If your interpretation of the word “credence” leads you to answer ⅓, you are fighting with the rest of the community over the definition of the concept of subjective probability.
If the entire scientific establishment is using subjective probability in a different way, by all means, show us! But don’t keep asserting it like it has been established. That isn’t productive.
The point of the comment was to express disapproval of the idea that scientists had multiple different conceptions of subjective probability, and that the Bayesian approach gave a different answer from the other ones, and to highlight exactly where I differed from garethrees, mostly for his benefit.
There is at least a minority that believes the term “subjective probability” isn’t meaningful.
I only scanned that—and I don’t immediately see the relationship to your comment—but it seems as though it would be a large digression of dubious relevance.
Whether or not it is meaningful, it is certainly fraught with all the associated confusion of personal identity, the arrow of time and information. I don’t think anyone can claim to understand it well enough to assert that those of us who see the Sleeping Beauty problem as entailing a different payoff scheme are obviously and demonstrably wrong. We know how to answer related decision problems, but no one here has established the right or best way to assign the payoff scheme to credence. And people seem too frustrated by the fact that anyone could disagree with them to actually consider the pros and cons of using other payoff schemes.
Does “the community” mean some scientific community outside of LessWrong? Because LW seems split on the issue.
Well, yes, sure. “That’s just peanuts to space”.
That’s interesting. But then you have to either abandon Bayes’ Law, or else adopt very bizarre interpretations of p(D|H), p(H) and p(D) in order to make it come out. Both of these seem like very heavy prices to pay. I’d rather admit that my intuition was wrong.
Is the motivating intuition behind your comment the idea that your subjective probability should be the same as the odds you’d take in a (fair) bet?
Subjective probabilities are traditionally analyzed in terms of betting behavior. Bets that are used for elucidating subjective probabilities are constructed using “scoring rules”. It’s a standard way of revealing such probabilities.
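(A minimal sketch of the scoring-rule idea, using the quadratic (Brier) rule as one standard example; the grid search is purely illustrative. A proper scoring rule makes honest reporting the penalty-minimizing strategy, which is what lets bets reveal subjective probabilities.)

```python
def expected_brier_penalty(report: float, true_p: float) -> float:
    """Expected squared-error penalty for announcing `report`
    when the event actually occurs with probability `true_p`."""
    return true_p * (report - 1) ** 2 + (1 - true_p) * report ** 2

# With true probability 1/3, the expected penalty is minimized
# by honestly reporting 1/3 (0.33 on this grid):
reports = [i / 100 for i in range(101)]
best = min(reports, key=lambda r: expected_brier_penalty(r, 1 / 3))
print(best)  # 0.33
```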
I am not sure what you mean by “abandoning Bayes’ Law”, or using “bizarre” interpretations of probability. In this case, the relevant data includes the design of the experiment—and that is not trivial to update on, so there is scope for making mistakes. Before questioning the integrity of your tools, is it possible that a mistake was made during their application?
Bayes’ Law says p(H|D) = p(D|H) p(H) / p(D), where H is the hypothesis of interest and D is the observed data. In the Sleeping Beauty problem H is “the coin lands heads” and D is “Sleeping Beauty is awake”. p(H) = ½, and p(D|H) = p(D) = 1, which gives p(H|D) = ½. So if your intuition tells you that p(H|D) = ⅓, then you have to either abandon Bayes’ Law, or else change one or more of the values of p(D|H), p(H) and p(D) in order to make it come out.
(We can come back to the intuition about bets once we’ve dealt with this point.)
Hold on—p(D|H) and p(D) are not point values but probability distributions, since there is yet another variable, namely what day it is.
The other variable has already been marginalized out.
So long as it is not Saturday. And the idea that p(H) = ½ comes from Saturday.
But marginalizing over the day doesn’t work out to p(D) = 1, since on some days Beauty is left asleep, depending on how the coin comes up.
Here is (for a three-day variant) the full joint probability distribution, showing values which are in accordance with Bayes’ Law but where p(D) and p(D|H) are not the above. We can’t “change the values” willy-nilly; they fall out of formalizing the problem.
Frustratingly, I can’t seem to get people to take much interest in that table, even though it seems to solve the freaking problem. It’s possible that I’ve made a mistake somewhere, in which case I’d love to see it pointed out.
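(For what it’s worth, here is one way to reproduce all three sheets’ numbers by enumerating a joint distribution over coin and day. This is my own illustrative reconstruction, not the linked spreadsheet itself; the `asleep_keeps_mass` flag marks the contested modelling choice of whether the heads “asleep” cells keep probability mass and get conditioned away, or are assigned zero mass from the start.)

```python
from fractions import Fraction

def p_heads_given_woken(days, heads_woken_days, asleep_keeps_mass=True):
    """Joint distribution over (coin, day): fair coin, day uniform over
    `days`. Under tails Beauty is woken every day; under heads only on
    `heads_woken_days`. If asleep_keeps_mass, the heads "asleep" cells
    keep their mass and are conditioned away; otherwise the heads mass
    is reallocated onto the heads woken cells ("not woken up" gets zero)."""
    half = Fraction(1, 2)
    woken_heads = Fraction(0)
    woken_tails = half  # tails: woken on every day, total mass 1/2
    for day in heads_woken_days:
        if asleep_keeps_mass:
            woken_heads += half / len(days)
        else:
            woken_heads += half / len(heads_woken_days)
    return woken_heads / (woken_heads + woken_tails)

print(p_heads_given_woken(["Mon", "Tue", "Wed"], ["Mon"]))  # 1/4: three-day variant
print(p_heads_given_woken(["Mon", "Tue"], ["Mon"]))         # 1/3: original problem
print(p_heads_given_woken(["Mon", "Tue"], ["Mon"],
                          asleep_keeps_mass=False))         # 1/2: zero mass to "asleep"
```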
I was just talking about the notation “p(D|H)” (and “p(D)”), given that D has been defined as the observed data. Then any extra variables have to have been marginalized out, or the expression would be p(D, day | H). I didn’t mean to assert anything about the correctness of the particular number ascribed to p(D|H).
I did look at the table, but I missed the other sheets, so I didn’t understand what you were arguing.
It seems to say that p(heads|woken) = 0.25. A whole new answer :-(
That’s in the three-day variant; it also has a sheet with the original.
It has three sheets. The respective conclusions are: p(heads|woken) = 0.25, p(heads|woken) = 0.33 and p(heads|woken) = 0.50. One wonders what you are trying to say.
That 1⁄3 is correct in the original, that 1⁄2 comes from allocating zero probability mass to “not woken up”, and the three-day version shows why that is wrong.
I don’t see how that analysis is useful. Beauty is awake at the start and the end of the experiment, and she updates accordingly, depending on whether she believes she is “inside” the experiment or not. So, having D mean: “Sleeping Beauty is awake” does not seem very useful. Beauty’s “data” should also include her knowledge of the experimental setup, her knowledge of the identity of the subject, and whether she is facing an interviewer with amnesia. These things vary over time—and so they can’t usefully be treated as a single probability.
You should be careful when plugging values into Bayes’ theorem in an attempt to solve this problem: it contains an amnesia-inducing drug. When Beauty updates, you had better make sure to un-update her again afterwards in the correct manner.
D is the observation that Sleeping Beauty makes in the problem, something like “I’m awake, it’s during the experiment, I don’t know what day it is, and I can’t remember being awoken before”. p(D) is the prior probability of making this observation during the experiment. p(D|H) is the likelihood of making this observation if the coin lands heads.
As I said, if your intuition tells you that p(H|D) = ⅓, then something else has to change to make the calculation work. Either you abandon or modify Bayes’ Law (in this case, at least) or you need to disagree with me on one or more of p(D), p(D|H), and p(H).
As I said, be careful about using Bayes’ theorem in the case where the agent’s mind is being meddled with by amnesia-inducing drugs. If Beauty had not had her mind addled by drugs, your formula would work—and p(H|D) would be equal to 1⁄2 on her first awakening. As it is, Beauty has lost some information that pertains to the answer she gives to the problem—namely, knowledge of whether or not she has already been woken up. Her uncertainty about this matter is the cause of the problem with plugging numbers into Bayes’ theorem.
The theorem models her update on new information—but does not model the drug-induced deletion from her mind of information that pertains to the answer she gives to the problem.
If she knew it was Monday, p(H|D) would be about 1⁄2. If she knew it was Tuesday, p(H|D) would be about 0. Since she is uncertain, the value lies between these extremes.
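(To make the interpolation explicit on the thirder’s weighting, where two of the three equally weighted awakenings fall on Monday: p(H|D) = p(H|D, Monday) p(Monday|D) + p(H|D, Tuesday) p(Tuesday|D) = ½ × ⅔ + 0 × ⅓ = ⅓.)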
Is over-reliance on Bayes’ theorem—without considering its failure to model the problem’s drug-induced amnesia—a cause of people thinking the answer to the problem is 1⁄2, I wonder?
If I understand rightly, you’re happy with my values for p(H), p(D) and p(D|H), but you’re not happy with the result. So you’re claiming that a Bayesian reasoner has to abandon Bayes’ Law in order to get the right answer to this problem. (Which is what I pointed out above.)
Is your argument the same as the one made by Bradley Monton? In his paper Sleeping Beauty and the forgetful Bayesian, Monton argues convincingly that a Bayesian reasoner needs to update upon forgetting, but he doesn’t give a rule explaining how to do it.
Naively, I can imagine doing this by putting the reasoner back in the situation before they learned the information they forgot, and then updating forwards again, but omitting the forgotten information. (Monton gives an example on pp. 51–52 where this works.) But I can’t see how to make this work in the Sleeping Beauty case: how do I put Sleeping Beauty back in the state before she learned what day it is?
So I think the onus remains with you to explain the rules for Bayesian forgetting, and how they lead to the answer ⅓ in this case. (If you can do this convincingly, then we can explain the hardness of the Sleeping Beauty problem by pointing out how little-known the rules for Bayesian forgetting are.)
Well, there is not anything wrong with Bayes’ Law. It doesn’t model forgetting—but it doesn’t pretend to. I would not say you have to “abandon” Bayes’ Law to solve the problem. It is just that the problem includes a process (namely: forgetting) that Bayes’ Law makes no attempt to model in the first place. Bayes’ Law works just fine for elements of the problem involving updating based on evidence. What you have to do is not abuse Bayes’ Law—by using it in circumstances for which it was never intended and is not appropriate.
Your opinion that I am under some kind of obligation to provide a lecture on the little-known topic of Bayesian forgetting has been duly noted. Fortunately, people don’t need to know or understand the Bayesian rules of forgetting in order to successfully solve this problem—but it would certainly help if they avoid applying the Bayes update rule while completely ignoring the whole issue of the effect of drug-induced amnesia—much as Bradley Monton explains.
You’re not obliged to give a lecture. A reference would be ideal.
Appealing to “forgetting” only gives an argument that our reasoning methods are incomplete: it doesn’t argue against ½ or in favour of ⅓. We need to see the rules and the calculation to decide if it settles the matter.
To reiterate, people do not need to know or understand the Bayesian rules of forgetting in order to successfully solve this problem. Nobody used this approach to solving the problem—as far as I am aware—but the vast majority obtained the correct answer nonetheless. Correct reasoning is given on http://en.wikipedia.org/wiki/Sleeping_Beauty_problem—and in dozens of prior comments on the subject.
The Wikipedia page explains how a frequentist can get the answer ⅓, but it doesn’t explain how a Bayesian can get that answer. That’s what’s missing.
I’m still hoping for a reference for “the Bayesian rules of forgetting”. If these rules exist, then we can check to see if they give the answer ⅓ in the Sleeping Beauty case. That would go a long way to convincing a naive Bayesian.
I do not think it is missing—since a Bayesian can ask themselves at what odds they would accept a bet on the coin coming up heads—just as easily as any other agent can.
What is missing is an account involving Bayesian forgetting. It’s missing because that is a way of solving the problem which makes little practical sense.
Now, it might be an interesting exercise to explore the rules of Bayesian forgetting—but I don’t think it can be claimed that that is needed to solve this problem—even from a Bayesian perspective. Bayesians have more tools available to them than just Bayes’ Law.
FWIW, Bayesian forgetting looks somewhat manageable. Bayes’ Law is a reversible calculation—so you can just un-apply it.
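(A minimal sketch of that reversibility, assuming we still know exactly which likelihood function was applied in the forward update; the numbers are purely illustrative. Un-applying an update is just dividing the likelihood back out and renormalizing.)

```python
def bayes_update(prior, likelihood):
    """Standard update: posterior is proportional to prior * likelihood."""
    post = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

def bayes_unupdate(posterior, likelihood):
    """Un-apply an update: divide the likelihood back out and renormalize.
    This only works if we still know which likelihood was used."""
    pre = {h: posterior[h] / likelihood[h] for h in posterior}
    z = sum(pre.values())
    return {h: p / z for h, p in pre.items()}

prior = {"heads": 0.5, "tails": 0.5}
lik = {"heads": 0.2, "tails": 0.8}  # hypothetical evidence, for illustration
post = bayes_update(prior, lik)
print(post)                       # {'heads': 0.2, 'tails': 0.8}
print(bayes_unupdate(post, lik))  # back to {'heads': 0.5, 'tails': 0.5}
```

(Note that this inverse needs information the amnesiac Beauty no longer has, which is arguably the point.)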
Okay—WRT “credence”, you have a good point; it’s a vague word. But, p(H|D) and “expected proportion of observations consistent with data D in which the hypothesis H was confirmed” give the same results. (Frequentists are allowed to use the p(H|D) notation, too.) There isn’t a difference between Bayesians and other reasoners; there’s a difference between what evidence one believes is being conditioned on. You’re correct that your actual claim isn’t addressed by comments in those posts; but your claim depends on beliefs that are argued for and against in the comments.
“The problem somehow persuades some people to imagine themselves as an instance of Sleeping Beauty selected uniformly from the three instances {(heads,Monday), (tails,Monday), (tails,Tuesday)}.”
That’s the correct interpretation, where “correct” means “what the original author intended”. Under the alternate interpretation, you will find yourself wondering why the author wrote all this stuff about Sleeping Beauty falling asleep, and forgetting what happened before, because it has no effect on the answer. This proves that the author didn’t have that interpretation.
The clearest explanation yet posted is actually included in the beginning of the Sleeping Beauty post.
Agreed.
I’d be interested in your opinion on this, where I’ve formalized the SB problem as a joint probability distribution, with as precise a mathematical justification as I could muster, as described here.
It seems that SB even generates confusion as to where the ambiguity comes from in the first place. :)
I believe I’ve proven that the thirders are objectively right (and everyone else wrong).