I have read and participated in many of these debates, and it continually frustrates me that people use the word “probability” AS IF it were objective and a property of the territory, when your Bayesian tenet, “Probability is a property of the map (agent’s beliefs), not the territory (environment)”, is binding in every case I can think of. I’m actually agnostic on whether some aspects of the universe are truly unknowable by any agent in the universe, and even more so on whether that means “randomness is inherent” or “randomness is a modeling tool”. Yes, this means I’m agnostic on MWI vs Copenhagen, as I can’t define “true” on that level (though I generally use MWI for reasoning, as I find it easier; that framing helps me remember that it’s a modelling choice, not a fact about the universe(s)).
In practice, probability is a modeling and prediction tool, and works pretty much the same for all kinds of uncertainty: contingent (which logically-allowed way does this universe behave), indexical (which set of possible experiences in this universe am I having), and logical (things that must be so but I don’t know which way). There are probably edge cases where the differences between these matter, but I don’t know of any that I expect to be resolved by foreseeable humans or our creations.
My pretty strong belief is that 1⁄2 is easier to explain and work with: the coin is fair and Beauty has no new information. And that 1⁄3 is justified if you are predicting the “weight” of experience, given that tails will be experienced twice as often. But mostly I’m rather sure that anyone who believes that their preference is the right model is in the wrong (on that part of the question).
They’re “doing epistemology wrong” no more than you. Thinking either choice is best is justified. Thinking the other choice is wrong is itself wrong.
So how do you actually use probability to make decisions? There’s a well-established decision theory that takes probabilities as inputs, and produces a decision in some situation (e.g., a bet). It will (often) produce different decisions when given 1⁄2 versus 1⁄3 as the probability of Heads. Which of these two decisions should you act on?
I think about what model fits the needs, roughly multiply payouts by probability estimates, then do whatever feels right in the moment.
I’m not sure that resolves any of these questions, since choice of model for different purposes is the main crux.
But the whole point of using probability to express uncertainty about the world is that the probabilities do not depend on the purpose.
If there are N possible observations, and M binary choices that you need to make, then a direct strategy for how to respond to an observation requires a table of size N×M, giving the action to take for each possible observation and decision. And you somehow have to learn this table.
In contrast, if the M choices all depend on one binary state of the world, you just need a table of probabilities of that state for each of the N observations, and a table of the utilities of the four action/state combinations for each of the M decisions. These have size proportional to N+M, much smaller than N×M for large N and M. You only need to learn the N probabilities (perhaps the utilities are givens).
And in reality, trying to make decisions without probabilities is even worse than it seems from this, since the set of decisions you may need to make is indefinitely large, and the number of possible observations is enormous. But avoiding having to make decisions by a direct observation->action table requires that probabilities have meaning independent of what decision you’re considering at the moment. You can’t just say that it could be 1⁄2, or could be 1⁄3...
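As a concrete illustration of this size argument, here is a minimal sketch in Python (the probabilities and utilities are made-up numbers, since none are given in the thread): one shared table of N state probabilities plus a small utility table per decision replaces an N×M observation-to-action table.

```python
import random

N, M = 5, 3  # hypothetical counts of observations and binary decisions

# Direct approach: one action per (observation, decision) pair -- N*M entries to learn.
direct_table = [[random.choice([0, 1]) for _ in range(M)] for _ in range(N)]

# Factored approach: N probabilities of the binary world-state, shared by every
# decision, plus a 2x2 utility table per decision -- roughly N + 4M numbers.
p_state = [0.1, 0.3, 0.5, 0.7, 0.9]  # P(state = 1 | observation i), made up
utility = [                          # utility[j][action][state] for decision j, made up
    [[1.0, 0.0], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
    [[0.9, 0.1], [0.3, 0.7]],
]

def act(obs: int, j: int) -> int:
    """Choose the action with the higher expected utility for decision j."""
    p = p_state[obs]
    return max((0, 1), key=lambda a: (1 - p) * utility[j][a][0] + p * utility[j][a][1])
```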
I think this is a restatement of the crux. OF COURSE the model chosen depends on the purpose of the model. For probabilities, the choice of reference class for a given prediction/measurement is key. For Sleeping Beauty specifically, the choice of whether an experientially-irrelevant wakening (which is immediately erased and has no impact) is distinct from another is a modeling choice.
Either choice for probability modeling can answer either wagering question, simply by applying the weights to the payoffs if they’re not already part of the probability.
Sure. By tweaking your “weights” or other fudge factors, you can get the right answer using any probability you please. But you’re not using a generally applicable method that actually tells you what the right answer is. So it’s a pointless exercise that sheds no light on how to correctly use probability in real problems.
To see that the probability of Heads is not “either 1⁄2 or 1⁄3, depending on what reference class you choose, or how you happen to feel about the problem today”, but is instead definitely, no doubt about it, 1⁄3, consider the following possibility:
Upon wakening, Beauty sees that there is a plate of fresh muffins beside her bed. She recognizes them as coming from a nearby cafe. She knows that they are quite delicious. She also knows that, unfortunately, the person who makes them on Mondays puts in an ingredient that she is allergic to, which causes a bad tummy ache. Muffins made on Tuesday taste the same, but don’t cause a tummy ache. She needs to decide whether to eat a muffin, weighing the pleasure of their taste against the possibility of a subsequent tummy ache.
If Beauty thinks the probability of Heads is 1⁄2, she presumably thinks the probability that it is Monday is (1/2)+(1/2)*(1/2)=3/4, whereas if she thinks the probability of Heads is 1⁄3, she will think the probability that it is Monday is (1/3)+(1/2)*(2/3)=2/3. Since 3⁄4 is not equal to 2⁄3, she may come to a different decision about whether to eat a muffin if she thinks the probability of Heads is 1⁄2 than if she thinks it is 1⁄3 (depending on how she weighs the pleasure versus the pain). Her decision should not depend on some arbitrary “reference class”, or on what bets she happens to be deciding whether to make at the same time. She needs a real probability. And on various grounds, that probability is 1⁄3.
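For concreteness, a sketch of the muffin decision as an expected-utility comparison (the taste and ache utilities are made-up numbers; only the two values of P(Monday) come from the paragraph above). With these particular utilities, the two probabilities lead to opposite decisions, which is the point of the example.

```python
def eat_muffin(p_monday: float, taste: float = 0.7, ache: float = 1.0) -> bool:
    """Eat iff the sure pleasure outweighs the expected tummy ache.
    taste and ache are hypothetical utilities."""
    return taste > p_monday * ache

print(eat_muffin(3 / 4))  # halfer: P(Monday) = 3/4, 0.7 < 0.75 -> False, skip the muffin
print(eat_muffin(2 / 3))  # thirder: P(Monday) = 2/3, 0.7 > 0.667 -> True, eat it
```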
Sure. By tweaking your “weights” or other fudge factors, you can get the right answer using any probability you please. But you’re not using a generally applicable method that actually tells you what the right answer is. So it’s a pointless exercise that sheds no light on how to correctly use probability in real problems.
Completely agree. The generally applicable method is:
Understand what probability experiment is going on, based on the description of the problem.
Construct the sample space from the mutually exclusive outcomes of this experiment.
Construct the event space based on the sample space, such that it is minimal and sufficient to capture all the events that the participant of the experiment can observe.
Define probability as a measure function over the event space, such that:
the sum of probabilities of events consisting of only individual mutually exclusive and collectively exhaustive outcomes is equal to 1, and
if an event has probability 1/a, then this event happens on average N/a times when the probability experiment is repeated N times, for any large N.
Naturally, this produces the answer 1⁄2 for the Sleeping Beauty problem.
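To connect that frequency condition to the numbers in this thread, here is a minimal simulation sketch (in Python; it is neutral on which frequency deserves the name “probability”, and simply reports both the per-experiment and per-awakening frequencies of Heads):

```python
import random

runs = 100_000
heads_runs = 0        # experiments in which the coin lands Heads
awakenings = 0        # total awakenings across all experiments
heads_awakenings = 0  # awakenings that follow a Heads flip

for _ in range(runs):
    heads = random.random() < 0.5
    heads_runs += heads
    awakenings += 1 if heads else 2  # Heads: Monday only; Tails: Monday and Tuesday
    heads_awakenings += heads        # a Heads run contributes exactly one awakening

print(heads_runs / runs)              # ~0.5: Heads frequency per experiment
print(heads_awakenings / awakenings)  # ~0.333: Heads frequency per awakening
```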
If Beauty thinks the probability of Heads is 1⁄2, she presumably thinks the probability that it is Monday is (1/2)+(1/2)*(1/2)=3/4
This is a description of Lewisian Halfism reasoning, which is incorrect for the Sleeping Beauty problem.
I describe the way Beauty is actually supposed to reason about a betting scheme on a particular day here.
She needs a real probability.
Indeed. And the domain of a real probability function is the event space, consisting of properly defined events for the probability experiment. “Today is Monday” is ill-defined in the Sleeping Beauty setting; therefore it can’t have a probability.
[ bowing out after this—I’ll read responses and perhaps update on them, but probably won’t respond (until next time) ]
To see that the probability of Heads is not “either 1⁄2 or 1⁄3, depending on what reference class you choose
I disagree. Very specifically, it’s 1⁄2 if your reference class is “fair coin flips” and 1⁄3 if your reference class is “temporary, to-be-erased experience of victims with adversarial memory problems”.
If your reference class is “wakenings who are predicting what day it is”, as in the muffin variant, then 1⁄3 is a bit easier to work with (though you’d need to specify payoffs to explain why she’d EVER eat the muffin, and then 1⁄2 becomes pretty easy too). This is roughly equivalent to the non-memory-wiping wager: I’ll flip a fair coin, and you predict heads or tails. If it’s heads, the wager is $1; if it’s tails, the wager is $2. The probability of tails is not 2⁄3, but you’d pay up to $0.50 to play, right?
OK, I’ll end by just summarizing that my position is that we have probability theory, and we have decision theory, and together they let us decide what to do. They work together. So for the wager you describe above, I get probability 1⁄2 for Heads (since it’s a fair coin), and because of that, I decide to pay anything less than $0.50 to play. If I thought that the probability of heads was 0.4, I would pay anything less than $0.80, since predicting tails would then be even more favorable. You make the right decision if you correctly assign probabilities and then correctly apply decision theory. You might also make the right decision if you do both of these things incorrectly (your mistakes might cancel out), but that’s not a reliable method. And you might also make the right decision by just intuiting what it is. That’s fine if you happen to have good intuition, but since we often don’t, we have probability theory and decision theory to help us out.
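A quick check of those thresholds (a sketch that assumes, as the numbers above suggest, that you win the stated wager when your prediction is right and lose it when wrong):

```python
def max_fee(p_heads: float) -> float:
    """Value of the wager under the better prediction: win the stake if right, lose it if wrong."""
    ev_tails = (1 - p_heads) * 2 - p_heads * 1   # predict tails: +$2 on tails, -$1 on heads
    ev_heads = p_heads * 1 - (1 - p_heads) * 2   # predict heads: +$1 on heads, -$2 on tails
    return max(ev_tails, ev_heads)

print(max_fee(0.5))  # 0.5 -> pay up to $0.50
print(max_fee(0.4))  # 0.8 -> pay up to $0.80
```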
One of the big ways probability and decision theory help is by separating the estimation of probabilities from their use to make decisions. We can use the same probabilities for many decisions, and indeed we can think about probabilities before we have any decision to make that they will be useful for. But if you entirely decouple probability from decision-making, then there is no longer any basis for saying that one probability is right and another is wrong—the exercise becomes pointless. The meaningful justification for a probability assignment is that it gives the right answer to all decision problems when decision theory is correctly applied.
As your example illustrates, correct application of decision theory does not always lead to you betting at odds that are naively obtained from probabilities. For the Sleeping Beauty problem, correctly applying decision theory leads to the right decisions in all betting scenarios when Beauty thinks the probability of Heads is 1⁄3, but not when she thinks it is 1⁄2.
[ Note that, as I explain in my top-level answer in this post, Beauty is an actual person. Actual people do not have identical experiences on different days, regardless of whether their memory has been erased. I suspect that the contrary assumption is lurking in the background of your thinking that somehow a “reference class” is of relevance. ]