Fundamentals of kicking anthropic butt
Introduction
An anthropic problem is one where the very fact of your existence tells you something. “I woke up this morning, therefore the earth did not get eaten by Galactus while I slumbered.” Applying your existence to certainties like that is simple—if an event would have stopped you from existing, your existence tells you that it hasn’t happened. If something would only kill you 99% of the time, though, you have to use probability instead of deductive logic. Usually, it’s pretty clear what to do. You simply apply Bayes’ rule: the probability of the world getting eaten by Galactus last night is equal to the prior probability of Galactus-consumption, times the probability of me waking up given that the world got eaten by Galactus, divided by the probability that I wake up at all. More exotic situations also show up under the umbrella of “anthropics,” such as getting duplicated or forgetting which person you are. Even if you’ve been duplicated, you can still assign probabilities. If there are a hundred copies of you in a hundred-room hotel and you don’t know which one you are, don’t bet too much that you’re in room number 68.
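Here is a minimal numeric sketch of that update in Python. The specific prior and survival numbers are invented for illustration; only the shape of the calculation comes from the example above.

```python
# Hypothetical numbers for the "only kills you 99% of the time" case.
prior_eaten = 1e-9            # assumed prior probability of Galactus-consumption
p_wake_given_eaten = 0.01     # the event only kills you 99% of the time
p_wake_given_safe = 0.99      # ordinary chance of waking up on a normal night

# P(wake up), by the law of total probability
p_wake = prior_eaten * p_wake_given_eaten + (1 - prior_eaten) * p_wake_given_safe

# Bayes' rule: P(eaten | woke up) = P(eaten) * P(woke up | eaten) / P(woke up)
posterior_eaten = prior_eaten * p_wake_given_eaten / p_wake
print(posterior_eaten)        # about 1e-11: waking up makes it ~100x less likely
```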
But this last sort of problem is harder, since it’s not just a straightforward application of Bayes’ rule. You have to determine the probability just from the information in the problem. Thinking in terms of information and symmetries is a useful problem-solving tool for getting probabilities in anthropic problems, which are simple enough to use it and confusing enough to need it. So first we’ll cover what I mean by thinking in terms of information, and then we’ll use this to solve a confusing-type anthropic problem.
Parable of the coin
Eliezer has already written about what probability is in Probability is in the Mind. I will revisit it anyhow, using a similar example from Probability Theory: The Logic of Science.
It is a truth universally acknowledged that when someone tosses a fair coin without cheating, there’s a 0.5 probability of heads and a 0.5 probability of tails. You draw the coin forth, flip it, and slap it down. What is the probability that when you take your hand away, you see heads?
Well, you performed a fair coin flip, so the chance of heads is 0.5. What’s the problem? Imagine the coin’s perspective. When you say “heads, 0.5,” that doesn’t mean the coin has half of heads up and half of tails up: the coin is already how it’s going to be, sitting pressed under your hand. And it’s already how it is with probability 1, not 0.5. If the coin is already tails, how can you be correct when you say that it’s heads with probability 0.5? If something is already determined, how can it still have the property of randomness?
The key idea is that the randomness isn’t in the coin, it’s in your map of the coin. The coin can be tails all it dang likes, but if you don’t know that, you shouldn’t be expected to take it into account. The probability isn’t a physical property of the coin, nor is it a property of flipping the coin—after all, your probability was still 0.5 when the truth was sitting right there under your hand. The probability is determined by the information you have about flipping the coin.
Assigning probabilities to things tells you about the map, not the territory. It’s like a machine that eats information and spits out probabilities, with those probabilities uniquely determined by the information that went in. Thinking about problems in terms of information, then, is about treating probabilities as the best possible answers for people with incomplete information. Probability isn’t in the coin, so don’t even bother thinking about the coin too much—think about the person and what they know.
When trying to get probabilities from information, you’re going to end up using symmetry a lot. Because information uniquely specifies probability, if you have identical information about two things, then you should assign them equal probability. For example, if someone switched the labels “heads” and “tails” in a fair coin flip, you couldn’t tell that it had been done—you never had any different information about heads as opposed to tails. This symmetry means you should give heads and tails equal probability. Because heads and tails are mutually exclusive (they don’t overlap) and exhaustive (there can’t be anything else), the probabilities have to add to 1 (which is all the probability there is), so you give each of them probability 0.5.
Brief note on useless information
Real-world problems, even when they have symmetry, often start you off with a lot more information than “it could be heads or tails.” If we’re flipping a real-world coin there’s the temperature to consider, and the humidity, and the time of day, and the flipper’s gender, and that sort of thing. If you’re an ordinary human, you are allowed to call this stuff extraneous junk. Sometimes, this extra information could theoretically be correlated with the outcome—maybe the humidity really matters somehow, or the time of day. But if you don’t know how it’s correlated, you have at least a de facto symmetry. Throwing away useless information is a key step in doing anything useful.
Sleeping Beauty
So thinking with information means assigning probabilities based on what people know, rather than treating probabilities as properties of objects. To actually apply this, we’ll use as our example the sleeping beauty problem:
- Suppose Sleeping Beauty volunteers to undergo the following experiment, which is described to her before it begins. On Sunday she is given a drug that sends her to sleep, and a coin is tossed. If the coin lands heads, Beauty is awakened and interviewed on Monday, and then the experiment ends. If the coin comes up tails, she is awakened and interviewed on Monday, given a second dose of the sleeping drug that makes her forget the events of Monday only, and awakened and interviewed again on Tuesday. The experiment then ends on Tuesday, without flipping the coin again.
- Beauty wakes up in the experiment and is asked, “With what subjective probability do you believe that the coin landed tails?”
If the coin lands heads, Sleeping Beauty is only asked for her guess once, while if the coin lands tails she is asked for her guess twice, but her memory is erased in between so she has the same memories each time.
When trying to answer for Sleeping Beauty, many people reason as follows: It is a truth universally acknowledged that when someone tosses a fair coin without cheating, there’s a 0.5 probability of heads and a 0.5 probability of tails. So since the probability of tails is 0.5, Beauty should say “0.5,” Q.E.D. Readers may notice that this argument is all about the coin, not about what Beauty knows. This violation of good practice may help explain why it is dead wrong.
Thinking with information: some warmups
To collect the ingredients of the solution, I’m going to first go through some similar-looking problems.
In the Sleeping Beauty problem, she has to choose between three options—let’s call them {H, Monday}, {T, Monday}, and {T, Tuesday}. So let’s start with a very simple problem involving three options: the three-sided die. Just like for the fair coin, you know that the sides of the die are mutually exclusive and exhaustive, and you don’t know anything else that would be correlated with one side showing up more than another. Sure, the sides have different labels, but the labels are extraneous junk as far as probability is concerned. Mutually exclusive and exhaustive means the probabilities have to add up to one, and the symmetry of your information about the sides means you should give them the same probabilities, so they each get probability 1⁄3.
Next, what should Sleeping Beauty believe before the experiment begins? Beforehand, her information looks like this: she signed up for this experiment where you get woken up on Monday if the coin lands heads and on Monday and Tuesday if it lands tails.
This way of stating her information is good enough most of the time, but what’s going on is clearer if we’re a little more formal. There are three exhaustive (but not mutually exclusive) options: {H, Monday}, {T, Monday}, and {T, Tuesday}. She knows that anything with heads is mutually exclusive with anything with tails, and that {T, Tuesday} happens if and only if {T, Monday} happened. One good way to think of this last piece of information is as a special “AND” structure containing {T, Monday} and {T, Tuesday}, like in the picture to the right. What it means is that since the things that are “AND” happen together, the other probabilities won’t change if we merge them into a single option, which I shall call {T, Both}. Now we have two options, {H, Monday} and {T, Both}, which are both exhaustive and mutually exclusive. This looks an awful lot like the fair coin, with probabilities of 0.5.
But can we leave it at that? Why shouldn’t two days be worth twice as much probability as one day, for instance? Well, it turns out we can leave it at that, because we have now run out of information from the original problem. We used that there were three options, we used that they were exhaustive, we used that two of them always happened together, and we used that the remaining two were mutually exclusive. That’s all, and so that’s where we should leave it—any more and we’d be making up information not in the problem, which is bad.
So to decompress, before the experiment begins Beauty assigns probability 0.5 to the coin landing heads and being woken up on Monday, probability 0.5 to the coin landing tails and being woken up on Monday, and probability 0.5 to the coin landing tails and being woken up on Tuesday. This adds up to 1.5, but that’s okay since these things aren’t all mutually exclusive.
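Here is a small sketch of that Sunday-evening bookkeeping; the dictionary below is just one illustrative way to encode the problem statement.

```python
# Sunday-evening view: two possible worlds, each with probability 0.5,
# listing the awakenings that happen in that world.
worlds = {
    "H": {"prob": 0.5, "wakings": ["Monday"]},
    "T": {"prob": 0.5, "wakings": ["Monday", "Tuesday"]},
}

def p(coin, day):
    """P(coin lands `coin` and Beauty is woken on `day`), judged on Sunday."""
    world = worlds[coin]
    return world["prob"] if day in world["wakings"] else 0.0

options = [("H", "Monday"), ("T", "Monday"), ("T", "Tuesday")]
for coin, day in options:
    print(coin, day, p(coin, day))                 # 0.5, 0.5, 0.5
print(sum(p(coin, day) for coin, day in options))  # 1.5, since the options overlap
```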
Okay, now for one last warmup. Suppose you have two coins. You flip the first one, and if it lands heads, you place the second coin on the table heads up. If the first coin lands tails, though, you flip the second coin. This new problem looks sort of familiar. You have three options, {H, H}, {T, H} and {T, T}, and these options are mutually exclusive and exhaustive. So does that mean it’s the same set of information as the three-sided die? Not quite. Similar to the “AND” previously, my drawing for this problem has an “OR” between {T, H} and {T, T}, representing additional information.
I’d like to add a note here about my jargon. “AND” makes total sense. One thing happens and another thing happens. “OR,” however, doesn’t make so much sense, because things that are mutually exclusive are already “or” by default—one thing happens or another thing happens. What it really means is that {H, H} has a symmetry with the sum of {T, H} and {T, T} (that is, {T, H} “OR” {T, T}). The “OR” can also be thought of as information about {H, H} instead—it contains what could have been both the {H, H} and {H, T} events, so there’s a four-way symmetry in the problem, it’s just been relabeled.
When we had the “AND” structure, we merged the two options together to get {T, Both}. For “OR,” we can do a slightly different operation and replace {T, H} “OR” {T, T} by their sum, {T, either}. Now the options become {H, H} and {T, either}, which are mutually exclusive and exhaustive, getting us back to the fair coin. Then, because {T, H} and {T, T} have a symmetry between them, you split the probability from {T, either} evenly to get probabilities of 0.5, 0.25, and 0.25.
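As a sanity check, here is a quick Monte Carlo sketch of the two-coin warmup; the 0.5, 0.25, 0.25 split is the same one derived above.

```python
import random

def trial():
    """One run of the two-coin procedure described above."""
    first = random.choice("HT")
    # If the first coin lands heads, the second is placed heads-up;
    # if it lands tails, the second coin is flipped.
    second = "H" if first == "H" else random.choice("HT")
    return (first, second)

n = 100_000
counts = {("H", "H"): 0, ("T", "H"): 0, ("T", "T"): 0}
for _ in range(n):
    counts[trial()] += 1

for outcome, count in counts.items():
    print(outcome, round(count / n, 3))   # roughly 0.5, 0.25, 0.25
```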
Okay, for real now
Okay, so now what do things look like once the experiment has started? In English, now she knows that she signed up for this experiment where you get woken up on Monday if the coin lands heads and on Monday and Tuesday if it lands tails, went to sleep, and now she’s been woken up.
This might not seem that different from before, but the “anthropic information” that Beauty is currently one of the people in the experiment changes the formal picture a lot. Before, the three options were not mutually exclusive, because she was thinking about the future. But now {H, Monday}, {T, Monday}, and {T, Tuesday} are both exhaustive and mutually exclusive, because only one can be the case in the present. From the coin flip, she still knows that anything with heads is mutually exclusive with anything with tails. But once two things are mutually exclusive you can’t make them any more mutually exclusive.
But the “AND” information! What happens to that? Well, that was based on things always happening together, and we just got information that those things are mutually exclusive, so there’s no more “AND.” It’s possible to slip up here and reason that since there used to be some structure there, and now they’re mutually exclusive, it’s one or the other, therefore there must be “OR” information. At least the confusion in my terminology reflects an easy confusion to have, but this “OR” relationship isn’t the same as mutual exclusivity. It’s a specific piece of information that wasn’t in the problem before the experiment, and wasn’t part of the anthropic information (that was just mutual exclusivity). So Monday and Tuesday are “or” (mutually exclusive), but not “OR” (can be added up to use another symmetry).
And so this anthropic requirement of mutual exclusivity turns out to make redundant or render null a big chunk of the previous information, which is strange. You end up left with three mutually exclusive, exhaustive options, with no particular asymmetry. This is the three-sided die information, and so each of {H, Monday}, {T, Monday}, and {T, Tuesday} should get probability 1⁄3. So when asked for P(tails), Beauty should answer 2⁄3.
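For readers who like to check with frequencies, here is a short simulation sketch, under the assumption that an awakened Beauty’s credence should match the long-run fraction of awakenings of each type over many repetitions of the experiment.

```python
import random

n_runs = 100_000
awakenings = {("H", "Monday"): 0, ("T", "Monday"): 0, ("T", "Tuesday"): 0}

for _ in range(n_runs):
    if random.choice("HT") == "H":
        awakenings[("H", "Monday")] += 1     # heads: one awakening
    else:
        awakenings[("T", "Monday")] += 1     # tails: two awakenings,
        awakenings[("T", "Tuesday")] += 1    # indistinguishable to Beauty

total = sum(awakenings.values())
for option, count in awakenings.items():
    print(option, round(count / total, 3))   # each roughly 1/3
tails = awakenings[("T", "Monday")] + awakenings[("T", "Tuesday")]
print("fraction of awakenings after tails:", round(tails / total, 3))  # ~2/3
```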
“SSA” and “SIA”
When assigning prior probabilities in anthropic problems, there are two main “easy” ways to do it, and these methods go by the acronyms “SSA” and “SIA.” “SSA” is stated like this[1]:
All other things equal, an observer should reason as if they are randomly selected from the set of all actually existent observers (past, present and future) in their reference class.
For example, if you wanted the prior probability that you lived in Sweden, you might ask “what proportion of human beings have lived in Sweden?”
On the other hand, “SIA” looks like this[2]:
All other things equal, an observer should reason as if they are randomly selected from the set of all possible observers.
Now the question becomes “what proportion of possible observers live in Sweden?” and suddenly it seems awfully improbable that anyone could live in Sweden.
The astute reader will notice that these two “assumptions” correspond to two different sets of starting information. If you want a quick exercise, figure out what those two sets of information are now. I’ll wait for you in the next paragraph.
Hi again. The information assumed for SSA is pretty straightforward. You are supposed to reason as if you know that you’re an actually existent observer, in some “reference class.” So an example set of information would be “I exist/existed/will exist and am a human.” Compared to that, SIA seems to barely assume any information at all—all you get to start with is “I am a possible observer.” Because “existent observers in a reference class” are a subset of possible observers, you can transform SIA into SSA by adding on more information, e.g. “I exist and am a human.” And then if you want to represent a more complicated problem, you have to add extra information on top of that, like “I live in 2012” or “I have two X chromosomes.”
Trouble only sneaks in if you start to see these acronyms as mysterious probability generators rather than sets of starting information to build on. So don’t do that.
Closing remarks
When faced with straightforward problems, you usually don’t need to use this knowledge of where probability comes from. It’s just rigorous and interesting, like knowing how to do integration as a Riemann sum. But whenever you run into foundational or even particularly confusing problems, it’s good to remember that probability is about making the best use you can of incomplete information. If not, you run the risk of a few silly failure modes, or even (gasp) frequentism.
I recently read an academic paper[3] that used the idea that in a multiverse, there will be some universe where a thrown coin comes up heads every time, and so the people in that universe will have very strange ideas about how coins work. Therefore, this actual academic paper argued, since reasoning with probability can lead people to be wrong, it cannot be applied to anything like a multiverse.
My response is: what have you got that works better? In this post we worked through assigning probabilities by using all of our information. If you deviate from that, you’re either throwing information away or making it up. Incomplete information lets you down sometimes, that’s why it’s called incomplete. But that doesn’t license you to throw away information or make it up, out of some sort of dissatisfaction with reality. The truth is out there. But the probabilities are in here.
Easy solution for the Sleeping Beauty problem: instead of merely asking her her subjective probability, we can ask her to bet. The question now becomes “at what odds would you be willing to bet?”. So here are the possibilities:
Heads. There will be one bet, Monday.
Tails. There will be two bets, Monday, and Tuesday.
Heads or tails comes up with equal probability (0.5). But when it comes up Tails, the stakes double (because she will bet twice). So, what will generate the correct bets is the assumption that Tails will subjectively come up 2⁄3 of the time.
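Here is a simulation sketch of that betting argument, assuming Beauty stakes the same amount on Tails at every awakening (the stake-to-payout ratio is the only free choice in this sketch).

```python
import random

def average_profit(stake, payout, n_runs=100_000):
    """Long-run profit per run when Beauty bets on Tails at every awakening,
    risking `stake` to win `payout` on each bet."""
    total = 0.0
    for _ in range(n_runs):
        if random.choice("HT") == "T":
            total += 2 * payout   # tails: two awakenings, two winning bets
        else:
            total -= stake        # heads: one awakening, one losing bet
    return total / n_runs

print(average_profit(stake=2, payout=1))  # ~0: 2:1 odds (a 2/3 credence) break even
print(average_profit(stake=1, payout=1))  # ~+0.5: even odds favor Beauty
```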
I know it looks cheap, because it doesn’t answer the question “But what really is the subjective probability?”. I don’t know, but I’ll find a way to make the correct decision anyway.
By asking, “At what odds would you be willing to bet?”, you’ve skewed the payout matrix, not the probabilities—even subjectively. If she offers the bet at 2:1 odds, it’s so that when her future/past twin makes the same bet, it corrects the payout matrix. She adjusts in this way because the probability is 1⁄2.
It’s just like if an online bookie discovers a bug in their software so that when someone takes the first option in a bet, and if they win, they get paid twice. He needs to lower the payout on option 1 by a factor of 2 (on the backend, at least—no need to embarrass everyone by mentioning it on the front end).
Sleeping Beauty can consistently say, “The probability of my waking up this time having been a waking-event following a tails coinflip is 2⁄3. The probability of the coinflip having come up tails is 1⁄2. On either of these, if you demand to bet on the issue, I’m offering 2:1 odds.”
Consistently? Sorry, I can’t even parse the sentence that follows. Trying to understand it:
Could you mean “the fact that I just woke up from drug-induced sleep”? But this event is not correlated with the coin flip to begin with. (Whether it ends up heads or tails, you will wake up seemingly for the first time.)
Whose probability?
Also, how could my solution lead Sleeping Beauty to be Dutch-booked? Could you provide an example, please?
My hunch is that any solution other than yours allows her to be Dutch-booked...
Not after correction for payout matrix, as described...
Hers, right then, as she says it.
Let’s go ahead and draw a clearer distinction.
SB is required, on Sunday, to lay odds on the coin flip; the coin will be shown to her on Wednesday, and the outcome judged. She is given the opportunity to change her mind about the odds she’s laying at any point during the experiment before it’s over. Should she change her odds? No.
About Dutch-booking—You must have gotten in there before I rewrote it, which I did before you finished posting. I realized I may have been misusing the term. Does the version up now make sense? Oh, heck, I’ll rewrite it again to make it even clearer.
Your new formulation is much better. Now I can identify the pain point.
I think I unconditionally agree with this one (I’m not certain, though).
This is when I get confused. See, if you ask the question before drugging SB, it feels obvious that she should answer “1/2”. As you say, she gains no information by merely waking up, because she knew she would in advance. Yet she should still bet 2:1 odds, whether it’s money or log-odds. In other words, how on Earth can the subjective probability be different from the correct betting odds?!
Currently, I see only two ways of solving this apparent contradiction. Either estimating 2:1 odds from the beginning, or admitting that waking up actually provided information. Both look crazy, and I can’t find any third alternative.
(Note that we assume she will be made to bet at each wake-up no matter what. For instance, if she knows she only has to bet on Monday, then when she wakes up and is told to bet, she gains information that tells her “1/2 probability, 1/2 betting odds”. Same thing if she only knows she will bet once.)
Because the number of bets she makes will be different in one outcome than in the other. It’s exactly like the bookie software bug example I gave. Normally you don’t need to think about this, but when you begin manipulating the multiplicity of the bettors, you do.
Let’s take it to extremes to clarify what the real dependencies are. Instead of waking Bea 2 times, we wake her 1000 times in the event of a tails flip (I didn’t say 3^^^3 so we wouldn’t get boggled by logistics).
Now, how surprised should she be in the event of a heads flip? Astonished? Not that astonished? Equanimous? I’m going with Equanimous.
I don’t think using betting is cheap at all, if you want to answer questions about decision-making. But I still wanted to answer the question about probability :D
Link nitpick: When linking to arXiv, please link to the abstract, not directly to the PDF.
I’ve always thought of SSA and SIA as assumptions that depend on what your goal is in trying to figure out the probability. Sleeping Beauty may want to maximize the probability that she guesses the coin correctly at least once, in which case she should use the probability 1⁄2. Or she may want to maximize the number of correct guesses, in which case she should use the probability 2⁄3.
In either case, asking “but what’s the probability, really?” isn’t helpful.
Edit: in the second situation, Sleeping Beauty should use the probability 2⁄3 to figure out how to maximize the number of correct guesses. This doesn’t mean she should guess T 2⁄3 of the time—her answer also depends on the payouts, and in the simplest case (she gets $1 for every correct guess) she should be guessing T 100% of the time.
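A quick arithmetic sketch of that edit, using the simple $1-per-correct-guess payout:

```python
# Expected total payout per run of the experiment, at $1 per correct guess.
p_heads = 0.5

# Heads: one interview; Tails: two interviews (two chances to be paid).
always_tails = p_heads * 0 + (1 - p_heads) * 2   # 1.0
always_heads = p_heads * 1 + (1 - p_heads) * 0   # 0.5

print(always_tails, always_heads)  # guessing T every time dominates
```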
Strongly agree. My paper here: http://arxiv.org/abs/1110.6437 takes the problem apart and considers the different components (utilities, probabilities, altruism towards other copies) that go into a decision, and shows you can reach the correct decision without worrying about the probabilities at all.
You’re wondering whether or not to donate to reduce existential risks. You won’t donate if you’re almost certain the world will end soon either way. You wake up as the 100 billionth person. Do you use this information to update on the probability that there will only be on the order of 100 billion people, and refrain from donating?
I really like your explanations in this thread.
However, I’ve always had the feeling that people raising “it just depends on the utility function / bet / payoff” were mostly trying to salve egos wounded by having wrongly analyzed the problem. It’s instructive to consider utility, but don’t pretend to be confused about whether Beauty should be surprised to learn that the toss was H and not T.
You’re right. For that reason, I think my explanations in the follow-up comments were better than this first attempt (not that this post is incorrect, it just doesn’t quite address the main point). I’ve previously tried to say the same thing here and here. The opinion I have hasn’t changed, but maybe my way of expressing it has.
Probabilities are unique. They’re a branch of math. They depend on your information, but your motivations are usually “extraneous junk information.” And math still works the same even if you ask what it really is (what’s 2+2, really? 4).
Now, you could invent something else for the letters “probability” to mean, and define that to be 1⁄2 in the sleeping beauty problem, that’s fine. But that wouldn’t be some “other probability.” That would be some other “probability.”
EDIT: It appears that I thoroughly misunderstood Misha to be saying two wrong things—first that probability can be defined by maximizing different things depending on what you want (not what was said), and second that asking “but what’s the probability really?” isn’t helpful because I’m totally wrong about probabilities being unique. So, whoops.
What I’m saying is that there are two probabilities there, and they are both the correct probabilities, but they are the correct probabilities of different things. These different things seem like answers to the same question because the English language isn’t meant to deal with Sleeping Beauty type problems. But there is a difference, which I’ve done my best to explain.
Given that, is there anything your nitpicking actually addresses?
By “two probabilities” you mean this? :
That looks like two “probabilities” to me. Could you explain what the probabilities would be of, using the usual Bayesian understanding of “probability”?
I can try to rephrase what I said, but I honestly have no clue what you mean by putting probabilities in quotes.
2⁄3 is the probability that this Sleeping Beauty is waking up in a world where the coin came up tails. 1⁄2 is the probability that some Sleeping Beauty will wake up in such a world. To the naive reader, both of these things sound like “The probability that the coin comes up tails”.
Ah, okay, that makes sense to me now. Thanks.
I put the word “probability” in quotes because I wanted to talk about the word itself, not the type of logic it refers to. The reason I thought you were talking about different types of logic using the same word was because probability already specifies what you’re supposed to be maximizing. For individual probabilities it could be one of many scoring rules, but if you want to add scores together you need to use the log scoring rule.
Right. One of them is the probability that the coin comes up tails given some starting information (as in a conditional probability, like P(T | S)), and the other is the probability that the coin comes up tails, given the starting information and some anthropic information: P(T | S A). So they’re both “P(T),” in a way.
Hah, so I think in your original comment you meant “asking “but what’s P(T), really?” isn’t helpful,” but I heard “asking “but what’s P(T | S A), really?” isn’t helpful” (in my defense, some people have actually said this).
If this is right I’ll edit it into my original reply so that people can be less confused. Lastly, in light of this there is only one thing I can link to.
Can you add a summary break (one of the options when you edit the post) for the convenience of readers scrolling through lists of posts?
Aside:
It also gives you “I woke up this morning, therefore it is more likely that the earth was eaten by something a few orders of magnitude larger than Galactus*”. This kind of consideration is frivolous until you manage to find a way to use anthropics to solve NP-complete problems (or otherwise encounter extreme anthropic circumstances).
* cf. Jonah.
The last time I had an anthropic principle discussion on Less Wrong I was pointed at the following paper: http://arxiv.org/abs/1110.6437 (See http://lesswrong.com/lw/9ma/selfindication_assumption_still_doomed/5sbv)
This struck me as interesting since it relates the Sleeping Beauty problem to a choice of utility function. Is Beauty a selfish utility maximizer with very high discount rate, or a selfish utility maximizer with low discount rate, or a total utility maximizer, or an average utility maximizer? The type of function affects what betting odds Beauty should accept.
Incidentally, one thing that is not usually spelled out in the story (but really should be) is whether there are other sentient people in the universe apart from Beauty, and how many of them there are. Also, does Beauty have any/many experiences outside the context of the coin-toss and awakening? These things make a difference to SSA (or to Bostrom’s SSSA).
While that work is interesting, knowing how to get probabilities means we can basically just ignore it :P Just assume Beauty is an ordinary utility-maximizer.
They make a difference if those things are considered as mysterious processes that output correct probabilities. But we already know how to get correct probabilities—you just follow the basic rules, or, in the equivalent formulation used in this post, follow the information. If SSA is used in any other way than as a set of starting information, it becomes an ad hoc method, not worth much consideration.
Not sure I follow that… what did you mean by an “ordinary” utility maximizer? Is it a selfish or a selfless utility function, and if selfish, what is the discount rate? The point about Armstrong’s paper is that it really does matter.
Most of the utility functions do give the 2⁄3 answer, though for the “average utilitarian” this is only true if there are lots of people outside the Sleeping Beauty story (or if Beauty herself has lots of experiences outside the story).
I’m a bit wary about using an indifference principle to get “the one true answer”, because in the limit it suffers from the Presumptuous Philosopher problem. Imagine that Beauty (or Beauty clones) is woken a trillion times after a Tails toss. Then the indifference principle means that Beauty will be very near certain that the coin fell Tails. Even if she is shown sworn affidavits and video recordings of the coin falling Heads, she’ll believe that they were faked.
So you have this utility function U, and it’s a function of different outcomes, which we can label by a bunch of different numbers “x”. And then you pick the option that maximizes the sum of U(x) * P(x | all your information).
There are two ways this can fail and need to be extended—either there’s an outcome you don’t have a utility for, or there’s an outcome you don’t have a probability for. Stuart’s paper is what you can do if you don’t have some probabilities. My post is how to get those probabilities.
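Here is a minimal sketch of that decision rule; the outcomes, probabilities, and utilities below are placeholders for illustration, not anything taken from the post.

```python
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

def best_action(actions):
    """actions: dict mapping an action name to its (probability, utility) pairs."""
    return max(actions, key=lambda name: expected_utility(actions[name]))

# Placeholder example: a bet that wins +1 with probability 2/3 and loses 1 otherwise.
actions = {
    "take the bet": [(2/3, +1), (1/3, -1)],
    "decline":      [(1.0, 0)],
}
print(best_action(actions))   # "take the bet" (expected utility 1/3 vs 0)
```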
If something is unintuitive, ask why it is unintuitive. Eventually either you’ll reach something wrong with the problem (does it neglect model uncertainty?), or you’ll reach something wrong with human intuitions (what is going on in peoples’ heads when they get the monty hall problem wrong?). In the meanwhile, I still think you should follow the math—unintuitiveness is a poor signal in situations that humans don’t usually find themselves in.
This looks like what Armstrong calls a “selfless” utility function, i.e. it has no explicit term for Beauty’s welfare here/now or at any other point in time. The important point here is that if Beauty bets tails, and the coin fell Tails, then there are two increments to U, whereas if the coin fell Heads then there is only one decrement to U. This leads to a 2⁄3 betting probability.
In the trillion Beauty case, the betting probability may depend on the shape of U and whether it is bounded (e.g. whether winning 1 trillion bets really is a trillion times better than winning one).
Stuart’s terms are a bit misleading because they’re about decision-making by counting utilities, which is not the same as decision-making by maximizing expected utility. His terms like “selfish” and “selfless” and so on are only names for counting rules for utilities, and have no direct counterpart in expected utility maximizers.
So U can contain terms like “I eat a candy bar. +1 utility.” Or it could only contain terms like “a sentient life-form eats a candy bar. +1 utility.” It doesn’t actually change what process Sleeping Beauty uses to make decisions in anthropic situations, because those ideas only applied to decision-making by counting utilities. Additionally, Sleeping Beauty makes identical decisions in anthropic and non-anthropic situations, if the utilities and the probabilities are the same.
OK, I think this is clearer. The main point is that whatever this “ordinary” U is scoring (and it could be more or less anything) then winning the tails bet scores +2 whereas losing the tails bet scores −1. This leads to 2⁄3 betting probability. If subjective probabilities are identical to betting probabilities (a common position for Bayesians) then the subjective probability of tails has to be 2⁄3.
The point about alternative utility functions though is that this property doesn’t always hold i.e. two Beauties winning doesn’t have to be twice as good as one Beauty winning. And that’s especially true for a trillion Beauties winning.
Finally, if you adopt a relative frequency interpretation (the coin-toss is repeated multiple times, and take limit to infinity) then there are obviously two relative frequencies of interest. Half the coins fall Tails, but two thirds of Beauty awakenings are after Tails. Either of these can be interpreted as a probability.
If we start with an expected utility maximizer, what does it do when deciding whether to take a bet on, say, a coin flip? Expected utility is the utility times the probability, so it checks whether P(heads) U(heads) > P(tails) U(tails). So betting can only tell you the probability if you know the utilities. And changing the utility function around is enough to get really interesting behavior, but it doesn’t mean you changed the probabilities.
What sort of questions, given what sorts of information, would give you these two probabilities? :D
For the first question: if I observe multiple coin-tosses and count what fraction of them are tails, then what should I expect that fraction to be? (Answer one half). Clearly “I” here is anyone other than Beauty herself, who never observes the coin-toss.
For the second question: if I interview Beauty on multiple days (as the story is repeated) and then ask her courtiers (who did see the toss) whether it was heads or tails, then what fraction of the time will they tell me tails? (Answer two thirds.)
What information is needed for this? None except what is defined in the original problem, though with the stipulation that the story is repeated often enough to get convergence.
Incidentally, these questions and answers aren’t framed as bets, though I could use them to decide whether to make side-bets.
I haven’t read the paper, but it seems like one could just invent payoff schemes customized for her utility function and give her arbitrary dilemmas that way, right?
There is the humanity of the observer to consider, but I don’t think that simply adding existence and humanity transforms SIA into SSA.
The example for the sleeping beauty problem shows this. Under SIA, she can reason about the bet by comparing herself to a set of 3 possible waking beauties. Under SSA this is impermissible because there is only a class of one or two existent waking beauties. Under SIA, she knows her existence and her humanity but this does not change the reasoning possible.
SSA is impossible for Sleeping Beauty to use, because using it properly requires knowing whether there are 1 or 2 waking Beauties, which requires knowing the answer to the problem under consideration. The same problem would come up in any anthropic problem. Since the answer to the question determines the SSA reference class, that reference class cannot be a tool used in calculating the probabilities of different answers.
Depends on what you mean by “use.” If you mean “use as a mysterious process that outputs probabilities,” then you’re right, it’s unusable. But if you mean “use as a set of starting information,” there is no problem.
I mean use as part of any process to determine probabilities of an anthropic problem. Mysterious or not. How can she use it as a set of starting information?
I may be misinterpreting, but to use either requires the identification of the set of items being considered. If I’m wrong, can you walk me through how sleeping beauty would consider her problem using SSA as her set of starting information?
Hm.
You’re sort of right, because remember the Sweden problem. When we asked “what is the probability that I live in Sweden,” using SSA, we didn’t consider alternate earths. And the reason we didn’t consider alternate earths is because we used the information that Sweden exists, and is a country in europe, etc. We made our reference class “humans on this earth.” But if you try to pull those same shenanigans with Sleeping Beauty (if we use the problem statement where there’s a copy of her) and make the reference class “humans who have my memories” you just get an “ERROR = DON’T HAVE COMPLETE INFORMATION ABOUT THIS REFERENCE CLASS.”
But what do you do when you have incomplete information? You use probabilities! So you get some sort of situation where you know that P(copy 1 | tails) = P(copy 2 | tails), but you don’t know about P(heads) and P(tails). And, hm, I think knowing that you’re an observer that exists includes some sneaky connotation about mutual exclusivity and exhaustiveness of all your options.
Personally, I think saying there’s “no particular asymmetry” is dangerous to the point of being flat out wrong. The three possibilities don’t look the least bit symmetric to me; they’re all qualitatively quite different. There’s no “relevant” asymmetry, but how exactly do we know what’s relevant and what’s not? Applying symmetry in places it shouldn’t be applied is the key way in which people get these things wrong. The fact that it gives the right answer this time is no excuse.
So my challenge to you is, explain why the answer is 2⁄3 without using the word “symmetry”.
Here’s my attempt: Start with a genuinely symmetric (prior) problem, then add the information. In this case, the genuinely symmetric problem is “It’s morning. What day is it and will/did the coin come up heads?”, while the information is “She just woke up, and the last thing she remembers is starting this particular bizarre coin/sleep game”. In the genuinely symmetric initial problem all days are equally likely and so are both coin flips. The process for applying this sort of additional information is to eliminate all scenarios that it’s inconsistent with, and renormalise what’s left. The information eliminates all possibilities except (Monday, heads), (Monday, tails), (Tuesday, tails) - and some more obscure possibilities of (for the sake of argument) negligible weight. These main three had equal weight before and are equally consistent with the new information so they have equal weight now.
Ok, I did use the word symmetry in there but only describing a different problem where it was safe. It’s still not the best construction because my initial problem isn’t all that well framed, but you get the idea.
Note that more generally you should ask for p(new information | scenario) and apply Bayes’ rule, but anthropic-style information is a special case where the value of this is always either 0 or 1. Either it’s completely inconsistent with the scenario or guaranteed by it. That’s what leads to the simpler process I describe above of eliminating the impossible and simply renormalising what remains.
The good thing about doing it this way is that you can also get the exact answer for the case where she knows the coin is biased to land heads 52% of the time, where any idea that the scenario is symmetric is out the window.
:D
But it’s not entirely special, which is interesting. For example, say it’s 8:00 and you have two buckets and there’s one ball in one of the buckets. You have a 1⁄2 chance of getting the ball if you pick a bucket. Then, at exactly 8:05, you add another bucket and mix up the ball. Now you have a 1⁄3 chance of getting the ball if you pick a bucket.
But what does Bayes’ rule say? Well, P(get the ball | you add a third bucket) = P(get the ball) * P(you add a third bucket | get the ball) / P(you add a third bucket). Since you always add a third bucket whether you get the ball or not, it seems the update is just 1/1=1, so adding a third bucket doesn’t change anything. I would claim that this apparent failure of Bayes’ rule (failure of interpreting it, more likely) is analogous to the apparent failure of Bayes’ rule in the sleeping beauty problem. But I’m not sure why either happens, or how you’d go about fixing the problem.
I’m yet to see how either the SSA or the SIA thinking can be instrumentally useful without reframing the SB problem in a way that lets her achieve a goal other than spitting out a useless number. Once you reformulate the problem in a way that the calculated number affects her actual survival odds, the SSA vs SIA musings quickly disappear.
Does anyone know of any more detailed discussions of the Adrian Kent paper?
I suppose my version is somewhere between SSA and SIA. It’s “All other things equal, an observer should reason as if they are randomly selected from the set of all actually existent observers (past, present and future)”.
I accept timeless physics, so I guess there’d be observers sideways in time too, but that’s not the point.
What is a “reference class” anyway?
The reference class is the collection of things you pretend you could be any one of with equal probability. To specify a reference class (e.g., “humans”), you just need a piece of information (“I am a human”).
But then it depends on the reference class you choose. For example, if you choose “animals” and then update on being a human, you will conclude that a higher proportion of animals are humans than if you choose “humans” to begin with. If you get different results from processing the same information two different ways, at least one of them must be wrong.
Right. The trick is that choosing “animals” should be equivalent to having a certain piece of information. To get different reference classes, there has to be something you know that gives you “I’m a human” instead of “I’m a dog! Woof!”. If you neglect this, you can (and did) derive contradictory stuff.
I don’t understand. I have the information “I am an animal” and “I am a human”. If I start with “I am an animal” and update with “I am a human”, I get something different than if I start with “I am a human” and update with “I am an animal”. How do I get the correct answer?
It seems to me that you’d have to start with “I am conscious”, and then update with everything.
Why do you end up with something different if you update in a different order? If you want a way to get the correct answer, work out why you do that and stop doing it!
I’d say it’s because I should be updating in both cases, rather than starting with “I am an animal” or “I am a human”. I should start with “I am conscious”, because I can’t not be, and then update from there.
I’m trying to show that picking reference classes arbitrarily leads to a contradiction, so SSA, as currently stated, doesn’t work. If it does, what other solution is there to that paradox?
Manfred, thanks for this post, and for the clarifications below.
I wonder how your approach works if the coin is potentially biased, but the bias is unknown? Let’s say it has probability p of Tails, using the relative frequency sense that p is the frequency of Tails if tossed multiple times. (This also means that in multiple repetitions, a fraction 2p / (1 + p) of Beauty awakenings are after Tails, and a fraction 1 / (1 + p) of Beauty awakenings are on Mondays.)
Beauty has to estimate the parameter p before betting, which means in Bayesian terms she has to construct a subjective distribution over possible values of p.
Before going to sleep, what should her distribution look like? One application of the indifference principle is that she has no idea about p except that it is somewhere between 0 and 1, so her subjective distribution of p should be uniform on [0, 1].
When she wakes up, should she adjust her distribution of p at all, or is it still the same as at step 1?
Suppose she’s told that it is Monday before betting. Should she update her distribution towards lower values of p, because these would give her higher likelihood of finding out it’s Monday?
If the answer to 3 is “yes” then won’t that have implications for the Doomsday Argument as well? (Consider the trillion Beauty limit, where there will be a trillion awakenings if the coin fell Tails. In that case, the fraction of awakenings which are “first” awakenings—on the Monday right after the coin-toss—is about 1/(1 + 10^12 x p). Now suppose that Beauty has just discovered she’s in the first awakening… doesn’t that force a big shift in her distribution towards p close to zero?)
The way I formulated the problem, this is how it is already :) If you wanted a “known fair” coin, you’d need some information like “I watched this coin come up infinity times and it had a heads:tails ratio of 1:1.” Instead, all Beauty gets is the information “the coin has two mutually exclusive and exhaustive sides.”
This is slightly unrealistic, because in reality coins are known to be pretty fair (if the flipper cooperates) from things like physics and the physiology of flipping. But I think a known fair coin would make the problem more confusing, because it would make it more intuitive to pretend that the probability is a property of the coin, which would give you the wrong answer.
Anyhow, you’ve got it pretty much right. Uniform distribution, updated by P(result | coin’s bias), can give you a picture of a biased coin, unlike if the coin was known fair. However, if “result” is that you’re the first awakening, the update is proportional to P(Monday | coin’s bias), since being the first awakening is equivalent to saying you woke up on Monday. But notice that you always wake up on Monday, so it’s a constant, so it doesn’t change the average bias of the coin.
This is interesting, and I’d like to understand exactly how the updating goes at each step. I’m not totally sure myself, which is why I’m asking the question about what your approach implies.
Remember Beauty now has to update on two things: the bias of the coin (the fraction p of times it would fall Tails in many throws) and whether it actually fell Tails in the particular throw. So she has to maintain a subjective distribution over the pair of parameters (p, Heads|Tails).
Step 1: Assuming an “ignorant” prior (no information about p except that is between 0 and 1) she has a distribution P[p = r & Tails] = r, P[p = r & Heads] = 1 - r for all values of r between 0 and 1. This gives P[Tails] = 1⁄2 by integration.
Step 2: On awakening, does she update her distribution of p, or of the probability of Tails given that p=r? Or does she do both?
It seems paradoxical that the mere fact of waking up would cause her to update either of these. But she has to update something to allow her to now set P[Tails] = 2⁄3. I’m not sure exactly how she should do it, so your views on that would be helpful.
One approach is to use relative frequency again. Assume the experiment is now run multiple times, but with different coins each time, and the coins are chosen from a huge pile of coins having all biases between zero and one in “equal numbers”. (I’m not sure this makes sense, partly because p is a continuous variable, and we’ll need to approximate it by a discrete variable to get the pile to have equal numbers; but mainly because the whole approach seems contrived. However, I will close my eyes and calculate!)
The fraction of awakenings after throwing a coin with bias p becomes proportional to 1 + p. So after normalization, the distribution of p on awakening should shift to (2/3)(1 + p). Then, given that a coin with bias p is thrown, the fraction of awakenings after Tails is 2p / (1 + p), so the joint distribution after awakening is P[p = r & Tails] = (4/3)r, and P[p = r & Heads] = (2/3)(1 - r), which when integrating again gives P[Tails] = 2⁄3.
Step 3: When Beauty learns it is Monday what happens then? Well her evidence (call it “E”) is that”I have been told that it is Monday today” (or “This awakening of Beauty is on Monday” if you want to ignore the possible complication of untruthful reports). Notice the indexical terms.
Continuing with the relative frequency approach (shut up and calculate again!) Beauty should set P[E|p = r] = 1/(1+r) since if a coin with bias r is thrown repeatedly, that becomes the fraction of all Beauty awakenings which will learn that “today is Monday”. So the evidence E should indeed shift Beauty’s distribution on p towards lower values of p (since they assign higher probability to the evidence E). However, all the shift is doing here is to reverse the previous upward shift at Step 2.
More formally, we have P[E & p = r] proportional to 1/(1 + r) x (1 + r) and the factors cancel out, so that P[E & p = r] is a constant in r. Hence P[p = r | E] is also a constant in r, and we are back to the uniform distribution over p. Filling in the distribution in the other variable, we get P[Tails | E & p = r] = r. Again look at relative frequencies: if a coin with bias r is thrown repeatedly, then among the Monday-woken Beauties, a fraction r of them will be woken after Tails. So we are back to the original joint distribution P[p = r & Tails] = r, P[p = r & Heads] = 1 - r, and again P[Tails] = 1⁄2 by integration.
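For what it’s worth, here is a crude Riemann-sum check of those integrals, using the densities derived above (a sketch only; the grid size is arbitrary).

```python
# Riemann-sum (midpoint) check of the step 1-3 integrals over the bias r in [0, 1].
N = 100_000
dr = 1.0 / N
rs = [(i + 0.5) * dr for i in range(N)]

def integrate(f):
    return sum(f(r) for r in rs) * dr

# Step 1 (Sunday): joint density f(r, Tails) = r, so P[Tails] = 1/2.
print(integrate(lambda r: r))                   # ~0.5

# Step 2 (on awakening): density over r becomes (2/3)(1 + r), and the joint
# density for Tails becomes (4/3) r, so P[Tails] = 2/3.
print(integrate(lambda r: (2/3) * (1 + r)))     # ~1.0 (check: properly normalized)
print(integrate(lambda r: (4/3) * r))           # ~2/3

# Step 3 (told "it's Monday"): reweighting by 1/(1 + r) cancels step 2, giving
# back the uniform density over r, and P[Tails | Monday] = integral of r dr = 1/2.
def monday_density(r):
    return (2/3) * (1 + r) / (1 + r)            # constant 2/3 before normalizing

norm = integrate(monday_density)                # 2/3
print(integrate(lambda r: monday_density(r) / norm * r))   # ~0.5
```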
After all that work, the effect of Step 2 is very like applying an SIA shift (Bias to Tails is deemed more likely, because that results in more Beautiful experiences) and the effect of Step 3 is then like applying an SSA shift (Heads-bias is more likely, because that makes it more probable that a randomly-selected Beautiful experience is a Monday-experience). The results cancel out. Churning through the trillion-Beauty case will give the same effect, but with bigger shifts in each direction; however they still cancel out.
The application to the Doomsday Argument is that (as is usual given the application of SIA and SSA together) there is no net shift towards “Doom” (low probability of expanding, colonizing the Galaxy with a trillion trillion people and so on). This is how I think it should go.
However, as I noted in my previous comments, there is still a “Presumptuous Philosopher” effect when Beauty wakes up, and it is really hard to justify this if the relative frequencies of different coin weights don’t actually exist. You could consider for instance that Beauty has different physical theories about p: one of those theories implies that p = 1⁄2 while another implies that p = 9⁄10. (This sounds pretty implausible if a coin, but if the coin-flip is replaced by some poorly-understood randomization source like a decaying Higgs Boson, then this seems more plausible). Also, for the sake of argument, both theories imply infinite multiverses, so that there are just as many Beautiful awakenings—infinitely many—in each case.
How can Beauty justify believing the second theory more, simply because she has just woken up, when she didn’t believe it before going to sleep? That does sound really Presumptuous!
A final point is that SIA tends to cause problems when there is a possibility of an infinite multiverse, and—as I’ve posted elsewhere—it doesn’t actually counter SSA in those cases, so we are still left with the Doomsday Argument. It’s a bit like refusing to shift towards “Tails” at Step 2 (there will be infinitely many Beauty awakenings for any value of p, so why shift? SIA doesn’t tell us to), but then shifting to “Heads” after Step 3 (if there is a coin bias towards Heads then most of the Beauty-awakenings are on Monday, so SSA cares, and let’s shift). In the trillion-Beauty case, there’s a very big “Heads” shift but without the compensating “Tails” shift.
If your approach can recover the sorts of shift that happen under SIA+SSA, but without postulating either, that is a bonus, since it means we don’t have to worry about how to apply SIA in the infinite case.
So what does Bayes’ theorem tell us about the Sleeping Beauty case?
It says that P(B|AC) = P(B|C) * P(A|BC)/P(A|C). In this case C is sleeping beauty’s information before she wakes up, which is there for all the probabilities of course. A is the “anthropic information” of waking up and learning that what used to be “AND” things are now mutually exclusive things. B is the coin landing tails.
Bayes’ theorem actually appears to break down here, if we use the simple interpretation of P(A) as “the probability she wakes up.” Because Sleeping Beauty wakes up in all the worlds, this interpretation says P(A|C) = 1, and P(A|BC) = 1, and so learning A can’t change anything.
This is very odd, and is an interesting problem with anthropics (see Eliezer’s post “The Anthropic Trilemma”). The practical but difficult-to-justify way to fix it is to use frequencies, not probabilities—because she can have an average frequency of waking up of 2 or 3⁄2, while probabilities can’t go above 1.
But the major lesson is that you have to be careful about applying Bayes’ rule in this sort of situation—if you use P(A) in the calculation, you’ll get this problem.
Anyhow, only some of this a response to anything you wrote, I just felt like finishing my line of thought :P Maybe I should solve this...
Thanks… whatever the correct resolution is, violating Bayes’s Theorem seems a bit drastic!
My suspicion is that A contains indexical evidence (summarized as something like “I have just woken up as Beauty, and remember going to sleep on Sunday and the story about the coin-toss”). The indexical term likely means that P[A] is not equal to 1 though exactly what it is equal to is an interesting question.
I don’t personally have a worked-out theory about indexical probabilities, though my latest WAG is a combination of SIA and SSA, with the caveat I mentioned on infinite cases not working properly under SIA. Basically I’ll try to map it to a relative frequency problem, where all the possibilities are realised a large but finite number of times, and count P[E] as the relative frequency of observations which contain evidence E (including any indexical evidence), taking the limit where the number of observations increases to infinity. I’m not totally satisfied with that approach, but it seems to work as a calculational tool.
I may be confused, but it seems like Beauty would have to ask “Under what conditions am I told ‘It’s Monday’?” to answer question 3.
In other problems, when someone offers you information followed by a chance to make a decision, and you have access to the conditions under which they decided to offer you that information, those conditions should themselves be used as information to influence your decision. As an example, discussions of the other possible host behaviors in the Monty Hall problem mention that point, and it seems likely they would in this case as well.
If you have absolutely no idea under what circumstances they decided to offer that information, then I have no idea how you would aggregate meaning out of the information, because there appear to be a very large number of alternate theories. For instance:
1: If Beauty is connected to a random text-to-speech generator, which happens to randomly output “Smundy”, Beauty may have misheard nonsensical gibberish as “It’s Monday.”
2: Or perhaps it was intentional and trying to be helpful, but actually said “Es Martes” because it assumed you were a Spanish-speaking rationalist, and Beauty just heard it as “It’s Monday.” when Beauty should have processed “It’s Tuesday.”, which would cause Beauty to update the wrong way.
3: Or perhaps it always tells Beauty the day of the week, but only on the first Monday.
4: Or perhaps it always tells Beauty the day of the week, but only if Beauty flips tails.
5: Or perhaps it always tells Beauty the day of the week, but only if Beauty flips heads.
6: Or perhaps it always tells Beauty the day of the week on every day of the puzzle, but doesn’t tell Beauty whether it is the “first” Monday on Monday.
7: It didn’t tell Beauty anything directly. Beauty happened to see a calendar when it opened the door and it appears to have been entirely unintentional.
Not all of these would cause Beauty to adjust the distribution of p in the same way. And they aren’t exhaustive, since there are far more than these 7. Some may be more likely than others, but if Beauty doesn’t have any understanding about which would be happening when, Beauty wouldn’t know which way to update p, and if Beauty did have an understanding, Beauty would presumably have to use that understanding.
I’m not sure whether this is insightful, or making it more confused than it needs to be.
OK, fair enough—I didn’t specify how she acquired that knowledge, and I wasn’t assuming a clever method. I was just considering a variant of the story (often discussed in the literature) where Beauty is always truthfully told the day of the week after choosing her betting odds, to see if she then adjusts her betting odds. (And to be explicit, in the trillion Beauty story, she’s always told truthfully whether she’s the first awakening or not, again to see if she changes her odds). Is that clearer?
Yes, I wasn’t aware “Truthfully tell on all days” was a standard assumption for receiving that information, thank you for the clarification.
It’s OK.
The usual way this applies is in the standard problem where the coin is known to be unbiased. Typically, a person arguing for the 2⁄3 case says that Beauty should shift to 1⁄2 on learning it is Monday, whereas a critic originally arguing for the 1⁄2 case says that Beauty should shift to 1⁄3 for Tails (2⁄3 for Heads) on learning it is Monday.
The difficulty is that both those answers give something very presumptuous in the trillion Beauty limit (near certainty of Tails before the shift, or near certainty of Heads after the shift).
Nick Bostrom has argued for a “hybrid” solution which avoids the shift, but on the face of things looks inconsistent with Bayesian updating. But the idea is that Beauty might be in a different “reference class” before and after learning the day.
See http://www.fhi.ox.ac.uk/__data/assets/pdf_file/0011/5132/sleeping_beauty.pdf or http://www.nickbostrom.com/ (Right hand column, about halfway down the page).
It looks like paragraphs 3--5 of “Thinking with Information” (starting with “Next, what should Sleeping Beauty”) are in the wrong place.
The problem with the Sleeping Beauty Problem (irony intended), is that it belongs more in the realm of philosophy and/or logic, than mathematics. The irony in that (double-irony intended), is that the supposed paradox is based on a fallacy of logic. So the people who perpetuate it should be best equipped to resolve it. Why they don’t, or can’t, I won’t speculate about.
Mathematicians, Philosophers, and Logicians all recognize how information introduced into a probability problem allows one to update the probabilities based on that information. The controversy in the Sleeping Beauty Problem is based on the fallacious conclusion that such “new” information is required to update probabilities this way. This is an example of the logical fallacy called affirming the consequent: concluding that “If A Then B” means “A is required to be true for B to be true” (an equivalent statement is “If B then A”).
All that is really needed for updating, is a change in the information. It almost always is an addition, but in the Sleeping Beauty Problem it is a removal. Sunday Sleeping Beauty (SSB) can recognize that “Tails & Awake on Monday” and “Tails & Awake on Tuesday” represent the same future (Manfred’s “AND”), both with prior probability 1⁄2. But Awakened Sleeping Beauty (ASB), who recognizes only the present, must distinguish these two outcomes as being distinct (Manfred’s “OR”). This change in information allows Bayes’ Rule to be applied in a seemingly unorthodox way: P(H&AonMO|A) = P(H&AonMO)/[P(H&AonMO) + P(T&AonMO) + P(T&AonTU)] = (1/2)/(1/2+1/2+1/2) = 1⁄3. The denominator in this expression is greater than 1 because the change (not addition) of information separates non-disjoint events into disjoint events.
The philosophical issue about SSA v. SIA (or whatever these people call them; I haven’t seen any two who define them agree), can be demonstrated by the “Cloned SB” variation. That’s where, if Tails is flipped, an independent copy of SB is created instead of two awakenings happening. Each instance of SB will experience only one “awakening,” so the separation of one prior event into two disjoint posterior events, as represented by “OR,” does not occur. But neither does “AND.” We need a new one called “ONE OF.” This way, Bayes’ Rule says P(H&Me on Mo) = P(H&Me on MO)/[P(H&Me on MO) + (ONE OF P(T&Me on MO), P(T&Me on TU))] = (1/2)/(1/2+1/2) = 1⁄2.
The only plausible controversy here is how SB should interpret herself: as one individual who might be awakened twice during the experiment, or as one of the two who might exist in it. The former leads to a credence of 1⁄3, and the latter leads to a credence of 1⁄2. But the latter does not follow from the usual problem statement.
Thank you, great post!
Is this actually true? I always understood coins to be unaffected by quantum fluctuations.