The Sleeping Beauty problem and transformation invariances
I recently read this blog post by Allen Downey, responding to a reddit post, which was in turn a response to Julia Galef’s video about the Sleeping Beauty problem. Downey’s resolution boils down to a conjecture that optimal bets on lotteries should be based on one’s expected state of prior information just before the bet’s resolution, as opposed to one’s state of prior information at the time the bet is made.
I suspect that these two distributions are always identical. In fact, I think I remember reading in one of Jaynes’ papers about a requirement that any prior be invariant under the acquisition of new information. That is to say, the prior should be the weighted average of possible posteriors, where the weights are the likelihood that each posterior would be achieved after some measurement. But now I can’t find this reference anywhere, and I’m starting to doubt that I understood it correctly when I read it.
So I have two questions:
1) Is there such a thing as this invariance requirement? Does anyone have a reference? It seems intuitive that the prior should be equivalent to the weighted average of posteriors, since it must contain all of our prior knowledge about a system. What is this property actually called?
2) If it exists, is it a corollary that our prior distribution must remain unchanged unless we acquire new information?
Sleeping Beauty is right not to cancel the bet if she is woken up on not-Wednesday (so Monday or Tuesday, but she does not know which). But this is not the optimal strategy. Instead, she should roll a 4-sided die and cancel the bet if she rolls a 1. In other words, she should keep the bet with 75% chance.
If she always keeps the bet, her expected payout on Wednesday is 0.5 × 1.5 − 0.5 × 1 = 0.25. That is, she makes 25 cents per dollar bet, on average. But if she cancels the bet with 25% chance each time she wakes up, her expected win is 0.5 × 0.75 × 1.5 − 0.5 × 0.75 × 0.75 × 1 = 0.28125. So now she makes 28.125 cents for every dollar bet.
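To check the arithmetic, here is a minimal Monte Carlo sketch (mine, not the original commenter’s). I assume the payoff structure implied by the numbers above: one awakening on heads paying +1.5 per dollar, two awakenings on tails losing the 1-dollar stake, and the bet cancelled if the 4-sided die comes up 1 at any awakening.

```python
import random

def expected_payout(cancel_prob, trials=1_000_000):
    """Average Wednesday payout per dollar bet when Beauty cancels
    with probability cancel_prob at each awakening."""
    total = 0.0
    for _ in range(trials):
        heads = random.random() < 0.5
        awakenings = 1 if heads else 2          # assumed SB(1,2) schedule
        cancelled = any(random.random() < cancel_prob
                        for _ in range(awakenings))
        if not cancelled:
            total += 1.5 if heads else -1.0     # assumed payoffs
    return total / trials

print(expected_payout(0.0))   # ~0.25
print(expected_payout(0.25))  # ~0.28125
```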
I think this illuminates the apparent paradox too. The fact that she is woken up on a not-Wednesday is extra information. She should lower her confidence in winning the bet, given that she has been woken up. And she can use this information to her benefit.
If she always cancels the bet though, she destroys this extra information, since the bet will end up cancelled regardless of whether the coin came up heads or tails.
1) Yes, the prior is the weighted average of the posteriors. This is just the decomposition of P(A) into the sum over b of P(A|b)P(b); the rules used are the product rule plus the mutual exclusivity and exhaustiveness of the different b.
Eliezer has a post on this called “conservation of expected evidence.”
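To see conservation of expected evidence numerically, here is a toy sketch of my own (the numbers are made up): average the posteriors P(H|b) weighted by the prior predictive P(b), and you recover the prior P(H) exactly.

```python
# Prior as the weighted average of posteriors: the sum over outcomes b
# of P(H|b) * P(b) equals P(H). Toy numbers, chosen arbitrarily.

prior_H = 0.3                   # P(H)
likelihoods = {                 # outcome b: (P(b|H), P(b|not H))
    "b1": (0.8, 0.1),
    "b2": (0.2, 0.9),
}

weighted_posteriors = 0.0
for p_b_H, p_b_notH in likelihoods.values():
    p_b = p_b_H * prior_H + p_b_notH * (1 - prior_H)  # P(b)
    posterior = p_b_H * prior_H / p_b                 # P(H|b), by Bayes
    weighted_posteriors += posterior * p_b            # P(H|b) P(b)

print(weighted_posteriors)  # 0.3, exactly the prior
```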
2) True, though in anthropic problems this requires more than usual caution, because of the commonness of non-barking dogs (that is, places where you gain information even though no flashing signs pop up to make sure everyone knows you gained information).
In fact, I wrote the above sentence before looking at the blog post. And lo and behold, it’s relevant! Allen Downey says:
This is not so! But the information gained is what we sometimes call ‘indexical’ information—information about where, when, or who you are. When you wake up, the thing you learn is that you are now inside the experiment. That seems like a pretty important new thing to know.
I really like Downey’s train analogy. The trick, and the way to get ordinary Bayesian reasoning to work here, is to give each distinct event its own probability: only when you treat the two local trains as two separate events (one way to do this is to set aside two different labels for them) do you get the right answer. If you just say that P(express train)=1 and P(local train)=1 and stop there, you fail to capture some of your knowledge about the world. You have to say something like P(EXPR)=1, P(LOC1)=1, P(LOC2)=1, P(local|LOC1)=1, P(local|LOC2)=1; you have to tell the math that being a local train is a property held by two different actual trains.
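To make the labeling point concrete, here is a small sketch under an assumed setup of my own (one express and two local trains, encountered uniformly at random; the numbers are illustrative, not Downey’s):

```python
# Each actual train gets its own label, even when two of them share
# the property "local". Collapsing LOC1 and LOC2 into a single event
# "local train" would wrongly suggest P(local) = 1/2 instead of 2/3.

trains = ["EXPR", "LOC1", "LOC2"]
is_local = {"EXPR": False, "LOC1": True, "LOC2": True}

p_local = sum(is_local[t] for t in trains) / len(trains)
print(p_local)  # 2/3 under uniform sampling over the three actual trains
```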
As for the claim about betting (let alone calling it a Fundamental Theorem), the entire point of the Sleeping Beauty problem is that the bet pays out to a different number of people than actually made it before the experiment. Depending on how this is expected to play out, different betting strategies can be right. If all actual transactions occur only at payoff time, though, it seems correct to consider only the situation then.
Exactly! To quote Bostrom
Incidentally, I had a question on that paper, and now seems as good a time as any to bring it up. To quote the second-to-last paragraph (this will make no sense unless you’ve read it)
I didn’t really get how this would work. If she doesn’t lose anything on the second bet, then that’s effectively not a bet. How can losing nothing be part of her expected loss calculations?
Isn’t this true by definition? “Prior” means “prior to the acquisition of new information.”
From the blog post:
The confusion is in dropping context. The context is of sampling the results of a coin flip, and in particular biased sampling based on the result of the coin flip.
Watching the flip on Sunday is a different sampling process than the Sleeping Beauty sampling process. That a biased sampling of a coin flip produces a bias in the observed outcomes should not be a shock to people.
Let’s parameterize the Sleeping Beauty process by the number of awakenings on each path: SB(Hnum, Tnum). The standard process is SB(1,2). Now consider the process SB(0,1): on a flip of heads, Sleeping Beauty is shot in the head and never wakes up, and on a flip of tails, she is woken up once.
P(Tails | awakening, SB(0,1)) = 1. Yes? Anyone not see that? This is not a cheat coin; this is Sleeping Beauty knowing she’ll only awaken on a flip of tails. Biased sampling. Just not complicated. A different process than the coin flip itself.
Similarly, P(Tails | awakening, SB(Hnum,Tnum)) = Tnum/(Hnum+Tnum). More biased sampling.
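A quick sanity check (my sketch, not the commenter’s): with a fair coin, a fraction 0.5 × Tnum of all awakening-instances lie on the tails path and 0.5 × Hnum on the heads path, which is where Tnum/(Hnum+Tnum) comes from. Counting awakenings in simulation reproduces it:

```python
import random

def p_tails_given_awakening(hnum, tnum, trials=1_000_000):
    """Fraction of awakening-instances on the tails path in SB(hnum, tnum)."""
    tails_awakenings = 0
    total_awakenings = 0
    for _ in range(trials):
        tails = random.random() < 0.5
        n = tnum if tails else hnum
        total_awakenings += n
        if tails:
            tails_awakenings += n
    return tails_awakenings / total_awakenings

print(p_tails_given_awakening(0, 1))  # 1.0 for SB(0,1)
print(p_tails_given_awakening(1, 2))  # ~2/3 for the standard SB(1,2)
```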
Note that the blogger did not identify the sampling process in the prior information he conditioned on in his equations, which left him free to conflate the two different sampling processes he was thinking of.
But just plug and chug the Jaynes way, conditioning on all your prior information, and voilà! The result is transparent.
Jaynes wins again!
I can’t see how this is a useful requirement in practice. Consider a trivial example of a misunderstanding.
As a more general observation, new information can change your set of possible outcomes.