But nothing in the specification of the Sleeping Beauty problem justifies treating it that way. Beauty is an ordinary human being who happens to have forgotten some things. If Beauty makes two betting decisions at different times, they are separate decisions, which are not necessarily the same—though it’s likely they will be the same if Beauty has no rational basis for making different decisions at those two times. There is a standard way of using probabilities to make decisions, which produces the decisions that everyone seems to agree are correct only if Beauty assigns probability 1⁄3 to the coin landing Heads.
You could say that you’re not going to use standard decision theory, and therefore are going to assign a different probability to Heads, but that’s just playing with words—like saying the sky is green, not blue, because you personally have a different scheme of colour names from everyone else.
“But nothing in the specification of the Sleeping Beauty problem justifies treating it that way”—apart from the fact that you’re being asked twice.
“There is a standard way of using probabilities to make decisions, which produces the decisions that everyone seems to agree are correct only if Beauty assigns probability 1⁄3 to the coin landing Heads”—1⁄2 gives the correct decisions as well; you just need a trivial modification to your decision theory.
So in every situation in which someone asks you the same question twice, standard probability and decision theory doesn’t apply? Seems rather sweeping to me. Or it’s only a problem if you don’t happen to remember that they asked that question before? Still seems like it would rule out numerous real-life situations where in fact nobody thinks there is any problem whatever in using standard probability and decision theory.
There is one standard form of probability theory and one standard form of decision theory. If you need a “trivial modification” of your decision theory to justify assigning a probability of 1⁄2 rather than 1⁄3 to some event, then you are not using standard probability and decision theory. I need only a “trivial modification” of the standard mapping from colour names to wavelengths to justify saying the sky is green.
Ok, I could have been clearer: simply being asked twice isn’t enough; we also need to be scored twice. Further, if we are always asked the same question N times in each possible state and all N answers are included in the “score”, it’ll all just cancel out.
The memory wipe is only relevant in so far as it actually allows asking the same question twice; otherwise you can deduce that it is tails when you’re interviewed the second time.
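The cancellation claim can be sketched numerically. Under thirder-style reasoning, the per-awakening probability of Heads is the coin-weighted share of askings that fall in the Heads branch; the function below (my own illustration, not anything from the thread) shows that unequal asking counts give 1⁄3, while equal counts cancel back to 1⁄2.

```python
from fractions import Fraction

def p_heads_per_awakening(n_heads: int, n_tails: int) -> Fraction:
    """Coin-weighted share of askings that occur in the Heads branch,
    when Heads produces n_heads askings and Tails produces n_tails.
    With a fair coin the 1/2 weights cancel, leaving a simple ratio."""
    return Fraction(n_heads, n_heads + n_tails)

print(p_heads_per_awakening(1, 2))  # standard Sleeping Beauty: 1/3
print(p_heads_per_awakening(5, 5))  # asked 5 times either way: 1/2
```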
What do you mean by “scoring it twice”? You seem to have some sort of betting/payoff scheme in mind, but you haven’t said what it is. I suspect that as soon as you specify some scheme, it will be clear that assigning probability 1⁄3 to Heads gives the right decision when you apply standard decision theory, and that you don’t get the right decision if you assign probability 1⁄2 to Heads and use standard decision theory.
And remember, Beauty is a normal human being. When a human being makes a decision, they are just making one decision. They are not simultaneously making that decision for all situations that they will ever find themselves in where the rational decision to make happens to be the same (even if the rationale for making that decision is also the same). That is not the way standard decision theory works. It is not the way normal human thought works.
“I suspect that as soon as you specify some scheme, it will be clear that assigning probability 1⁄3 to Heads gives the right decision when you apply standard decision theory, and that you don’t get the right decision if you assign probability 1⁄2 to Heads and use standard decision theory.”—I’ve already explained that you make a slight modification to account for the number of times you are asked. Obviously, if you don’t make this modification you’ll get the incorrect betting odds.
But I’m not looking to set things on a solid foundation in this post; that will have to wait for a future post. The purpose of this post is just to demonstrate how a halver should analyse The Beauty and the Prince, given those foundations.
Unstated in the problem (which is the main point of confusion, IMO) is what decision Beauty is making, and what the payouts/costs are if she’s right or wrong. What difference does it make to her if she says 1/pi as the probability, because that’s prettier than 1⁄2 or 1/3?
If the payout is +1 utility for being correct and −1 for being incorrect, calculated each time the question’s asked, then 1⁄3 is the correct answer, because she’ll lose twice if wrong, but only win once if right. If the payout is calculated only once on Wednesday, with a 0 payout if she manages to answer differently Monday and Tuesday, then 1⁄2 is the right answer.
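The two payout schemes above can be compared by exact expected value. As a sketch (my own code; the function names are made up for illustration): with per-awakening scoring, an always-Tails guesser comes out ahead, while with once-on-Wednesday scoring both constant policies break even.

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # fair coin

def ev_per_awakening(guess: str) -> Fraction:
    """Expected payoff per experiment when the +/-1 bet is scored at every
    asking: Heads -> scored once, Tails -> scored twice."""
    heads_payoff = 1 if guess == "heads" else -1        # one asking
    tails_payoff = 2 * (1 if guess == "tails" else -1)  # two askings
    return HALF * heads_payoff + HALF * tails_payoff

def ev_once_on_wednesday(guess: str) -> Fraction:
    """Expected payoff when the bet is resolved only once, on Wednesday."""
    heads_payoff = 1 if guess == "heads" else -1
    tails_payoff = 1 if guess == "tails" else -1
    return HALF * heads_payoff + HALF * tails_payoff

print(ev_per_awakening("tails"), ev_per_awakening("heads"))          # 1/2 -1/2
print(ev_once_on_wednesday("tails"), ev_once_on_wednesday("heads"))  # 0 0
```

Per-awakening scoring makes Tails worth backing unless Heads is given betting odds of 1⁄3, while the once-only bet breaks even either way.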
“They are not simultaneously making that decision for all situations that they will ever find themselves in where the rational decision to make happens to be the same (even if the rationale for making that decision is also the same).”
Aren’t they? If there is zero detectable change in cognition or evidence between the two decisions, how could the second one be different?
No, you get the wrong answer in your second scenario (with −1, 0, or +1 payoff) if you assign a probability of 1⁄2 to Heads, and you get the right answer if you assign a probability of 1⁄3.
In this scenario, guessing right is always better than guessing wrong. Being right rather than wrong either (A) gives a payoff of +1 rather than −1, if you guess only once, or (B) gives a payoff of +1 rather than 0, if you guess correctly another day, or (C) gives a payoff of 0 rather than −1, if you guess incorrectly another day. Since the changes in payoff for (B) and (C) are the same, one can summarize this by saying that the advantage of guessing right is +2 if you guess only once (ie, the coin landed Heads), and +1 if you guess twice (ie, the coin landed Tails).
A Halfer will compute the difference in payoff from guessing Heads rather than Tails as (1/2)*(+2) + (1/2)*(-1) = 1⁄2, and so they will guess Heads (both days, presumably, if the coin lands Tails). A Thirder will compute the difference in payoff from guessing Heads rather than Tails as (1/3)*(+2) + (2/3)*(-1) = 0, so they will be indifferent between guessing Heads or Tails. If we change the problem slightly so that there is a small cost (say 1⁄100) to guessing Heads (regardless of whether this guess is right or wrong), then a Halfer will still prefer Heads, but a Thirder will now definitely prefer Tails.
What will actually happen without the small penalty is that both the Halfer and the Thirder will get an average payoff of zero, which is what the Thirder expects, but not what the Halfer expects. If we include the 1⁄100 penalty for guessing Heads, the Halfer has an expected payoff of −1/100, while the Thirder still has an expected payoff of zero, so the Thirder does better.
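The arithmetic in the last two paragraphs can be checked directly. This is a sketch under my own assumptions: the function names are invented, and the 1⁄100 Heads penalty is charged at most once per experiment (the Monday-only variant), which is what makes the Halfer’s actual expected payoff come out to −1/100.

```python
from fractions import Fraction

F = Fraction
PENALTY = F(1, 100)  # small cost for guessing Heads, charged once per experiment

def perceived_advantage_of_heads(p_heads: Fraction) -> Fraction:
    """Subjective gain from guessing Heads rather than Tails at an awakening:
    +2 if the coin was Heads, -1 if it was Tails (as derived above),
    minus the penalty for a Heads guess."""
    return p_heads * 2 + (1 - p_heads) * (-1) - PENALTY

def actual_ev(guess: str) -> Fraction:
    """True long-run payoff per experiment of always guessing `guess`,
    with the bet resolved once and the Heads penalty applied."""
    base = (F(1, 2) * (1 if guess == "heads" else -1)
            + F(1, 2) * (1 if guess == "tails" else -1))
    return base - (PENALTY if guess == "heads" else 0)

print(perceived_advantage_of_heads(F(1, 2)))   # 49/100: Halfer backs Heads
print(perceived_advantage_of_heads(F(1, 3)))   # -1/100: Thirder backs Tails
print(actual_ev("heads"), actual_ev("tails"))  # -1/100 0
```

The Halfer’s perceived advantage stays positive, so she keeps guessing Heads and realizes −1/100 on average; the Thirder’s goes negative, so she switches to Tails and realizes 0.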
---
If today you choose chocolate over vanilla ice cream, and yesterday you did the same, and you’re pretty sure that you will always choose chocolate over vanilla, is your decision today really a decision not for one ice cream cone but for thousands of cones? Not by any normal idea of what it means to “decide”.
“No, you get the wrong answer in your second scenario (with −1, 0, or +1 payoff) if you assign a probability of 1⁄2 to Heads, and you get the right answer if you assign a probability of 1⁄3.”
Huh? Maybe I wasn’t clear in my second scenario. This is the situation where the bet is resolved only once, on Wednesday, with payouts being: a) +1 if it was heads, and she said “heads” on Tuesday (not being woken up on Monday); b) −1 if it was heads but she said “tails” on Tuesday; c) +1 if it was tails and she said “tails” on both Monday and Tuesday; d) −1 if it was tails and she said “heads” on both Monday and Tuesday; e) (for completeness, I don’t believe it’s possible) 0 if it was tails and she gave different answers on Monday and Tuesday.
1⁄2 is the right answer here—it’s a literal coinflip and we’ve removed the double-counting of the mindwipe.
You’ve swapped Monday and Tuesday compared to the usual description of the problem, but other than that, your description is what I am working with. You just have a mistaken intuition regarding how the probabilities relate to decisions—it’s slightly non-obvious (but maybe not obvious that it’s non-obvious). Note that this is all using completely standard probability and decision theory—I’m not doing anything strange here.
In this situation, as explained in detail in my reply above, Beauty gets the right answer regarding how to bet only if she gives probability 1⁄3 to Heads whenever she is woken, in which case she is indifferent to guessing Heads versus Tails (as she should be—as you say, it’s just a coin flip), whereas if she gives probability 1⁄2 to Heads, she will have a definite preference for guessing Heads. If we give guessing Heads a small penalty (say on Monday only, to resolve how this works if her guesses differ on the two days), in order to tip the scales away from indifference, the Thirder Beauty correctly guesses Tails, which does indeed maximize her expected reward, whereas the Halfer Beauty does the wrong thing by still guessing Heads.