Doesn’t this depend entirely on what decisions are good?
Like, let’s say that you decide to incentivize Beauty to guess correctly. The way you’re going to do this is as follows: each time Beauty is woken, you ask her how the coin landed. If she guessed right, you give her a prize immediately (the same prize each time; let’s say, $1).
Now let’s leave probabilities out of it and consider only possible scenarios:
Beauty guesses heads, always.
Coin landed heads (and it’s a Monday); Beauty wins $1.
Coin landed tails (and it’s a Monday); Beauty wins $0.
Coin landed tails (and it’s a Tuesday); Beauty wins $0.
Total winnings across all scenarios: $1. Average winnings per experiment iteration, if repeated: $0.50.
Beauty guesses tails, always.
Coin landed heads (and it’s a Monday); Beauty wins $0.
Coin landed tails (and it’s a Monday); Beauty wins $1.
Coin landed tails (and it’s a Tuesday); Beauty wins $1.
Total winnings across all scenarios: $2. Average winnings per experiment iteration, if repeated: $1.
Aha! Beauty wins twice as much money by guessing tails as by guessing heads. Clearly, the thirder position is correct.
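For anyone who wants to check the arithmetic, here is a minimal Python sketch of the per-awakening reward structure. The names (`AWAKENINGS`, `average_winnings`) are made up for illustration, and it assumes a fair coin and a $1 prize for each correct guess, as in the scenario above.

```python
from fractions import Fraction

# Awakenings per coin outcome: heads -> Monday only; tails -> Monday and Tuesday.
AWAKENINGS = {"heads": 1, "tails": 2}

def average_winnings(guess, prize=Fraction(1)):
    """Average winnings per experiment when Beauty always answers `guess`
    and is paid `prize` at every awakening where she is right."""
    total = Fraction(0)
    for coin, n_awakenings in AWAKENINGS.items():
        if coin == guess:
            # Fair coin: each outcome has probability 1/2.
            total += Fraction(1, 2) * n_awakenings * prize
    return total

print(average_winnings("heads"))  # 1/2
print(average_winnings("tails"))  # 1
```

Nothing here involves a credence; it only counts how often each constant answer gets paid.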
But wait! What if we change the incentive structure? Now, instead of rewarding Beauty on each day, we instead reward her at the end of the experiment, if and only if she guessed right each time she was woken. Let’s consider the scenarios:
Beauty guesses heads, always.
Coin landed heads (and it’s a Monday); Beauty guesses heads (1/1 correct answers), and wins $1 at the end.
Coin landed tails (and it’s a Monday); Beauty guesses heads (0/2 correct answers), and so will now win $0 regardless of what she guesses on Tuesday.
Coin landed tails (and it’s a Tuesday); Beauty guesses heads (but this is now irrelevant).
Total winnings across all scenarios: $1. Average winnings per experiment iteration, if repeated: $0.50.
Beauty guesses tails, always.
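Coin landed heads (and it’s a Monday); Beauty guesses tails (0/1 correct answers), and wins nothing.
Coin landed tails (and it’s a Monday); Beauty guesses tails (1/2 correct answers).
Coin landed tails (and it’s a Tuesday); Beauty guesses tails (2/2 correct answers), and wins $1.
Total winnings across all scenarios: $1. Average winnings per experiment iteration, if repeated: $0.50.
Now Beauty wins the same amount of money by guessing heads as by guessing tails. Clearly, the halfer position is correct…?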
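The end-of-experiment reward structure can be checked by enumeration as well. This is a rough sketch under the stated assumptions (fair coin, a single $1 prize paid only if every answer in the experiment was right); the function name is hypothetical.

```python
from fractions import Fraction

COIN_OUTCOMES = ("heads", "tails")

def average_winnings_all_correct(guess, prize=Fraction(1)):
    """Average winnings per experiment when Beauty always answers `guess`
    and is paid `prize` once, at the end, only if every answer was right."""
    total = Fraction(0)
    for coin in COIN_OUTCOMES:
        # A constant guess is right at every awakening or at none of them,
        # so the number of awakenings no longer changes the payout.
        if coin == guess:
            total += Fraction(1, 2) * prize
    return total

print(average_winnings_all_correct("heads"))  # 1/2
print(average_winnings_all_correct("tails"))  # 1/2
```

With this structure both constant answers average $0.50 per experiment.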
To put it another way: the reason that Beauty should guess tails in my first scenario above, is that we’re rewarding her twice as much for correctly guessing tails as for correctly guessing heads! It’s just exactly the same thing as if I offered you a reward for guessing the outcome of a simple coin flip, and paid you $2 if the coin landed tails (and you guessed right), but only $1 if it landed heads (and you guessed right). Of course you should guess tails!
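Your second scenario introduces a coordination issue, since Beauty gets nothing if she guesses differently on Monday and Tuesday. I’m still thinking about that.
If you eliminate that issue by saying that only Monday guesses count, or that only the last guess counts, you’ll find that Beauty has to assign probability 1⁄3 to Heads in order to do the right thing by using standard decision theory. The details are in my comment on the post at https://www.lesswrong.com/posts/u7kSTyiWFHxDXrmQT/sleeping-beauty-resolved#aG739iiBci9bChh5D
Or you can say that the payoff for guessing Tails correctly is $0.50 while guessing Heads correctly gives $1.00, so the total payoff is the same from always guessing Heads as from always guessing Tails. In that case, you can see that you get indifference between Heads and Tails when the probability of Heads is 1⁄3, by computing the expected return for guessing Heads at one particular time as (1/3) × $1.00 versus the expected return for guessing Tails at one particular time of (2/3) × $0.50. Clearly you don’t get indifference if Beauty thinks the probability of Heads is 1⁄2.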
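The indifference calculation in the comment above can be written out as a small sketch. The function names are invented for illustration; the payoffs ($1.00 for a correct Heads, $0.50 for a correct Tails) are the ones given in the comment, and `p_heads` stands for whatever probability Beauty assigns to Heads at a single awakening.

```python
from fractions import Fraction

def expected_return(guess, p_heads,
                    payoff_heads=Fraction(1), payoff_tails=Fraction(1, 2)):
    """Expected return of one guess at one awakening, given credence p_heads."""
    if guess == "heads":
        return p_heads * payoff_heads
    return (1 - p_heads) * payoff_tails

def indifference_credence(payoff_heads=Fraction(1), payoff_tails=Fraction(1, 2)):
    """Solve p * payoff_heads == (1 - p) * payoff_tails for p."""
    return payoff_tails / (payoff_heads + payoff_tails)

print(indifference_credence())                    # 1/3
print(expected_return("heads", Fraction(1, 3)))   # 1/3
print(expected_return("tails", Fraction(1, 3)))   # 1/3
print(expected_return("heads", Fraction(1, 2)))   # 1/2
print(expected_return("tails", Fraction(1, 2)))   # 1/4
```

At credence 1⁄3 the two guesses tie; at credence 1⁄2 the same formula prefers Heads two to one.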
Your second scenario introduces a coordination issue, since Beauty gets nothing if she guesses differently on Monday and Tuesday.
I guess I’m not sure why this is an “issue”, or what exactly it means to say that it’s a “coordination issue”. The point I am making is that we can say: “if the researchers offer reward structure X, then the answer that makes sense is Y”. For the first reward structure I described, “tails” is the more profitable answer. For the second reward structure I described, either answer is equally profitable for Beauty.
Saying “let’s instead make the reward structure different in this-and-such way” misses the point. For any given reward structure, there is some way for Beauty to answer that maximizes her reward. For a different reward structure, the best answer might be something else. That’s all.
(All of this is described, and even better than I’m doing, in this old post that is linked from the OP.)
If you eliminate that issue by saying that only Monday guesses count, or that only the last guess counts, you’ll find that Beauty has to assign probability 1⁄3 to Heads in order to do the right thing by using standard decision theory.
Right, this is an example of what I’m saying: change the reward structure, and the profit-maximizing answer may well change. None of this seems to motivate the idea that there’s some answer which is simply “correct”, in a way that’s divorced from some goal you’re trying to accomplish, some reason why you care about the answer, etc. (Decision theory without any values/goals/rewards is nonsense, after all!)
Here’s another reframing of my point, borrowing from one of my favorite essays in the Sequences, “Newcomb’s Problem and Regret of Rationality”, where Eliezer says:
“But,” says the causal decision theorist, “to take only one box, you must somehow believe that your choice can affect whether box B is empty or full—and that’s unreasonable! Omega has already left! It’s physically impossible!”
Unreasonable? I am a rationalist: what do I care about being unreasonable? I don’t have to conform to a particular ritual of cognition. I don’t have to take only box B because I believe my choice affects the box, even though Omega has already left. I can just… take only box B.
Similarly, Sleeping Beauty doesn’t have to answer “tails” because she believes that the probability of tails is 1⁄3. She can just… answer “tails”. Beauty is not obligated to believe anything whatsoever.
And to quote the post linked in the parent comment:
But in the original problem, when she is asked “What is your credence now for the proposition that our coin landed heads?”, a much better answer than “.5” is “Why do you want to know?”. If she knows how she’s being graded, then there’s an easy correct answer, which isn’t always .5; if not, she will have to do her best to guess what type of answer the experimenters are looking for; and if she’s not being graded at all, then she can say whatever the hell she wants (acceptable answers would include “0.0001,” “3⁄2,” and “purple”).
The linked post by ata is simply wrong. It presents the scenario where
Each interview consists of Sleeping Beauty guessing whether the coin came up heads or tails. After the experiment, she will be given a dollar if she was correct on Monday.
In this case, she should clearly be indifferent (which you can call “.5 credence” if you’d like, but it seems a bit unnecessary).
But this is not correct. If you work out the result with standard decision theory, you get indifference between guessing Heads or Tails only if Beauty’s subjective probability of Heads is 1⁄3, not 1⁄2.
You are of course right that anyone can just decide to act, without thinking about probabilities, or decision theory, or moral philosophy, or anything else. But probability and decision theory have proven to be useful in numerous applications, and the Sleeping Beauty problem is about probability, presumably with the goal of clarifying how probability works, so that we can use it in practice with even more confidence. Saying that she could just make a decision without considering probabilities rather misses the point.
If you work out the result with standard decision theory, you get indifference between guessing Heads or Tails only if Beauty’s subjective probability of Heads is 1⁄3, not 1⁄2.
I don’t know about “standard decision theory”, but it seems to me that—in the described case (only Monday’s answer matters)—guessing Heads yields an average of 50¢ at the end, and guessing Tails also yields an average of 50¢ at the end. I don’t see that Beauty has to assign any credences or subjective probabilities to anything in order to deduce this.
As the linked post says, you can call this “0.5 credence”. But, if you don’t want to, you can also not call it “0.5 credence”. You don’t have to call it anything. You can just be indifferent.
You are of course right that anyone can just decide to act, without thinking about probabilities, or decision theory, or moral philosophy, or anything else.
My point is that “just deciding to act” in this case actually gets us the result that we want. Saying that probability and decision theory are “useful” is beside the point, since we already have the answer we actually care about, which is: “what do I [Sleeping Beauty] say to the experimenters in order to maximize my profit?”
But the thing is you can’t call it “0.5 credence” and have your credence be anything like a normal probability. The Halfer will assign probability 1⁄2 for Heads and Monday, 1⁄4 for Tails and Monday, and 1⁄4 for Tails and Tuesday. Since only the guess on Monday is relevant to the payoff, we can ignore the Tuesday possibility (in which the action taken has no effect on the payoff), and see that a halfer would have a 2:1 preference for Heads. In contrast, a Thirder would give 1⁄3 probability to Heads and Monday, 1⁄3 to Tails and Monday, and 1⁄3 to Tails and Tuesday. Ignoring Tuesday, they’re indifferent between guessing Heads or Tails.
With a slight tweak to payoffs so that Tails are slightly more rewarding, the Halfer will make a definitely wrong decision, while the Thirder will make the right decision.
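Here is a sketch of that comparison, following the accounting used in this comment (weight each guess by the credence in the matching Monday state and drop Tuesday), which is itself part of what is in dispute in this thread. The credence tables are the ones stated above; the 10-cent tweak and all function names are hypothetical.

```python
from fractions import Fraction

HALFER = {
    ("heads", "Mon"): Fraction(1, 2),
    ("tails", "Mon"): Fraction(1, 4),
    ("tails", "Tue"): Fraction(1, 4),
}
THIRDER = {state: Fraction(1, 3) for state in HALFER}

def expected_returns(credence, payoff):
    """Per-awakening expected return of each guess when only Monday's answer is paid."""
    returns = {}
    for guess in ("heads", "tails"):
        returns[guess] = sum(
            p * payoff[guess]
            for (coin, day), p in credence.items()
            if day == "Mon" and coin == guess
        )
    return returns

def preferred_guess(credence, payoff):
    returns = expected_returns(credence, payoff)
    return max(returns, key=returns.get)

def best_constant_guess(payoff):
    """The guess with the higher average payout per experiment (fair coin)."""
    average = {guess: Fraction(1, 2) * payoff[guess] for guess in ("heads", "tails")}
    return max(average, key=average.get)

# Hypothetical tweak: a correct Tails pays 10 cents more than a correct Heads.
tweaked = {"heads": Fraction(1), "tails": Fraction(11, 10)}

print(preferred_guess(HALFER, tweaked))   # heads
print(preferred_guess(THIRDER, tweaked))  # tails
print(best_constant_guess(tweaked))       # tails
```

Under these assumptions the thirder table picks the answer that actually pays more per experiment, and the halfer table does not.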
We agree about what the right actions are for the various reward structures. We can then try to work backwards from what the right action is to what probability Beauty should assign to the coin landing Heads after being wakened, in order that this probability will lead (by standard decision theory) to her taking the action we’ve decided is the correct one.
For your second scenario, Beauty really has to commit to what to do before the experiment, which means this scheme of working backwards from correct decision to probability of Heads after wakening doesn’t seem to work. Guessing either Heads or Tails is equally good, but only if done consistently. Deciding after each wakening without having thought about it beforehand doesn’t work well, since with the two possibilities being equally good, Beauty might choose differently on Monday and Tuesday, with bad results. Now, if the problem is tweaked with slightly different rewards for guessing Heads correctly than Tails correctly, we can avoid the situation of both guesses being equally good. But the coordination problem still seems to confuse the issue of how to work backwards to the appropriate probabilities (for me at least).
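To make the coordination problem concrete, here is a small sketch of what happens under the all-or-nothing reward if Beauty does not commit in advance. It assumes she cannot tell the days apart and so, at each awakening, independently says Heads with some probability `q_heads`; that modelling choice and the function name are mine, not something specified in the thread.

```python
from fractions import Fraction

def average_winnings_uncoordinated(q_heads, prize=Fraction(1)):
    """Average winnings per experiment under the all-or-nothing reward when
    Beauty independently says Heads with probability q_heads at each awakening."""
    win_if_heads = q_heads              # one awakening: she must say Heads
    win_if_tails = (1 - q_heads) ** 2   # two awakenings: she must say Tails both times
    return Fraction(1, 2) * prize * (win_if_heads + win_if_tails)

print(average_winnings_uncoordinated(Fraction(0)))     # 1/2  (always Tails)
print(average_winnings_uncoordinated(Fraction(1)))     # 1/2  (always Heads)
print(average_winnings_uncoordinated(Fraction(1, 2)))  # 3/8  (coin-flip each time)
```

Either committed answer averages $0.50, while deciding afresh by a 50/50 choice at each awakening averages $0.375, which is the "bad results" case described above.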
I think it ought to be the case that, regardless of the reward structure, if you work backwards from correct action to probabilities, you get that Beauty after wakening should give probability 1⁄3 to Heads. That seems to be what happens for all the reward structures where Beauty can decide what to do each day without having to know what she might do or have done the other day.
We can then try to work backwards from what the right action is to what probability Beauty should assign to the coin landing Heads after being wakened, in order that this probability will lead (by standard decision theory) to her taking the action we’ve decided is the correct one.
Is there some reason why you’re committed to standard (by which you presumably mean, causal—or what?) decision theory, when approaching this question? After all:
For your second scenario, Beauty really has to commit to what to do before the experiment
As I understand it, UDT (or some similar decision theory) is the now-standard solution for such dilemmas.
I think it ought to be the case that, regardless of the reward structure, if you work backwards from correct action to probabilities, you get that Beauty after wakening should give probability 1⁄3 to Heads.
Why, though? More importantly, why does it matter?
It seems to me that all that Beauty needs to know, given that the scenario (one or two awakenings) is chosen by the flip of a fair coin, is that fair coins land heads half the time, and tails the other half. I really don’t see any reason why we should insist on there being some single, “objectively correct”, subjective probability assignment over the outcomes, that has to hold true for all formulations of this thought experiment, and/or all other Sleeping-Beauty-esque scenarios, etc.
In other words:
We agree about what the right actions are for the various reward structures.
I am struggling to see why there should be anything more to the matter than this. We all agree what the right actions are and we are all equally quite capable of determining what those right actions are. It seems to me that we’re done.
A big reason why probability (and belief in general) is useful is that it separates our observations of the world from our decisions. Rather than somehow relating every observation to every decision we might sometime need to make, we instead relate observations to our beliefs, and then use our beliefs when deciding on actions. That’s the cognitive architecture that evolution has selected for (excepting some more ancient reflexes), and it seems like a good one.
I don’t really disagree, per se, with this general point, but it seems strange to insist on rejecting an answer we already have, and already know is right, in the service of this broad point. If you want to undertake the project of generalizing and formalizing the cognitive algorithms that led us to the right answer, fine and well, but in no event should that get in the way of clarity w.r.t. the original question.
Again: we know the correct answer (i.e. the correct action for Beauty to take); and we know it differs depending on what reward structure is on offer. The question of whether there is, in some sense, a “right answer” even if there are no rewards at all, seems to me to be even potentially useful or interesting only in the case that said “right answer” does in fact generate all the practical correct answers that we already have. (And then we can ask whether it’s an improvement on whatever algorithm we had used to generate said right answers, etc.)
Well of course. If we know the right action from other reasoning, then the correct probabilities had better lead us to the same action. That was my point about working backwards from actions to see what the correct probabilities are. One of the nice features of probabilities in “normal” situations is that the probabilities do not depend on the reward structure. Instead we have a decision theory that takes the reward structure and probabilities as input and produces actions. It would be nice if the same property held in SB-type problems, and so far it seems to me that it does.
I don’t think there has ever been much dispute about the right actions for Beauty to take in the SB problem (i.e., everyone agrees about the right bets for Beauty to make, for whatever payoff structure is defined). So if just getting the right answer for the actions was the goal, SB would never have been considered of much interest.