Wait, I didn’t catch this the first time:

“using the 1⁄3 answer and working back to try to find P(W) yields P(W) = 3⁄2, which is a strong indication that it is not the probability that matters”
No. It’s proof that your solution is wrong.
And I know exactly why your solution is wrong. You came up with P(Monday|W) using a ratio of expected counts, but you relied on an assumption that trials are independent. Here, the coin flips are independent but the counts are not. Even though you are using three counts, there is just one degree of freedom. Vladimir Nesov got it right, I think, when he said “(Tuesday, tails) is the same event as (Monday, tails)”.
The last update in my sleeping beauty post explains the problem in more detail.
Of course P(W) isn’t bound within [0,1]; W is one of any number of events, in this case 2: P(You will be woken for the first time) = 1; P(You will be woken a second time) = 1⁄2. The fact that natural language and the phrasing of the problem attempt to hide this as “you wake up” is not important. That is why P(W) is apparently broken; it double-counts some futures, because it is the expected number of wakings, not a probability. This is why I split into conditioning on waking on Monday or Tuesday.
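(Spelled out: E[number of wakings] = 1·P(woken a first time) + 1·P(woken a second time) = 1 + 1⁄2 = 3⁄2, which is precisely the “P(W) = 3⁄2” recovered by working back from the 1⁄3 answer.)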
(Tuesday, tails) is not the same event as (Monday, tails). They are distinct queries to whatever decision algorithm you implement; there are any number of trivial means to distinguish them without altering the experiment (say, “we will keep you in a red room on one day and a blue one on the other, with the order to be determined by a random coin flip”).
They are strongly correlated events, granted. If either occurs, so will the other. That does not make them the same event. On your argumentation, you would assert confidently that the coin is fair beforehand, yet also assert that the conditional probability that you wake on Monday depends on the coin flip, when in either branch you are woken then with probability 1.
If P(H) and P(H|W) are probabilities, then it must be true that:

P(H) = P(H|W)P(W) + P(H|~W)P(~W), where ~W means not-W (any other event), by the law of total probability.

If P(H)=1/2 and P(H|W)=1/3, as you claim, then we have

1/2 = (1/3)P(W) + P(H|~W)(1 - P(W))

P(H|~W) should be 0, since we know she will be awakened if heads. But that leads to P(W)=3/2.

P(W) should be 1, but that leads to the equation 1/2 = 1/3.

So, this is a big mess.

The reason it is a big mess is that the 1⁄3 solution was derived by treating one random variable as two.
I already addressed this elsewhere. The problem is that W is not a boolean, it’s a probability distribution over observer moments, so P(W) and P(~W) are undefined (type errors).
At one point in your post you said “For convenience let us say that the event W is being woken” and then later on you suggest W is something else, but I don’t see where you really defined it.
You’re saying W itself is a probability distribution. What probability distribution? Can you be specific?
P(H) and P(H|W) are probabilities. It’s unclear to me how those can be well defined and yet the law of total probability not apply.
Suppose we write out SB as a world-program:

SleepingBeauty(S(I)) =
{
    coin = rnd({"H","T"})
    S("starting the experiment now")
    if (coin == "H"):
        S("you just woke up")      # heads: woken once
    else:
        S("you just woke up")      # tails: woken twice
        S("you just woke up")
    S("the experiment's over now")
    return 0
}
This notation is from decision theory; S is Sleeping Beauty’s chosen strategy, a function which takes as arguments all the observations, including memories, which Sleeping Beauty has access to at that point, and returns the value of any decision SB makes. (In this case, the scenario doesn’t actually do anything with SB’s answers, so the program ignores them.)

An observer-moment is a complete state of the program at a point where S is executed, including the arguments to S. Now, take all the possible observer-moments, weighted by the expected number of times that a given run of SleepingBeauty contains that observer-moment. To condition on some information, take the subset of those observer-moments which match that information. So, P(coin=heads|I="you just woke up") means: of all the calls to S where I="you just woke up", weighted by probability of occurrence, what fraction of them are on the heads branch? This is 1⁄3. On the other hand, P(coin=heads|I="the experiment's over now") = 1/2.
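A minimal sketch of that counting procedure, in Python (the branch contents below just transcribe the world-program above, with each coin outcome weighted 1/2; this is an illustration, not a unique formalization):

import fractions

# Observer-moments of each branch, weighted by the probability of the
# run that contains them. The tails branch contains the observation
# "you just woke up" twice, so it is counted twice within that run.
branches = {
    "H": ["starting the experiment now",
          "you just woke up",
          "the experiment's over now"],
    "T": ["starting the experiment now",
          "you just woke up",
          "you just woke up",
          "the experiment's over now"],
}

def p_heads_given(observation):
    # Of all calls to S with this argument, weighted by run probability,
    # what fraction lie on the heads branch?
    total = heads = fractions.Fraction(0)
    for coin, calls in branches.items():
        for obs in calls:
            if obs == observation:
                total += fractions.Fraction(1, 2)
                if coin == "H":
                    heads += fractions.Fraction(1, 2)
    return heads / total

print(p_heads_given("you just woke up"))           # 1/3
print(p_heads_given("the experiment's over now"))  # 1/2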
“Of course P(W) isn’t bound within [0,1]”

Of course! (?) You derived P(W) using probability laws, i.e., solving for it in this equation: P(H)=P(H|W)P(W), where P(H)=1/2 and P(H|W)=1/3. These are probabilities. And your 1⁄3 solution proves there is an error.
If two variables have correlation of 1, I think you could argue that they are the same (they contain the same quantitative information, at least).
“On your argumentation, you would assert confidently that the coin is fair beforehand, yet also assert that the conditional probability that you wake on Monday depends on the coin flip, when in either branch you are woken then with probability 1.”
No. You will wake on Monday with probability one. But, on a randomly selected awakening, it is more likely that it’s Monday&Heads than Monday&Tails, because you are on the Heads path in 50% of experiments.
What is this random selection procedure you use in the last para?
(“I select an awakening, but I can’t tell which” is the same statement as “Each awakening has probability 1/3” and describes SB’s epistemic situation.)
Random doesn’t necessarily mean uniform. When Beauty wakes up, she knows she is somewhere on the heads path with probability .5, and somewhere on the tails path with probability .5. If tails, she also knows it’s either Monday or Tuesday, and from her perspective, she should treat those days as equally likely (since she has no way of distinguishing). Thus, the distribution from which we would select an awakening at random has probabilities 0.5 (Monday&Heads), 0.25 (Monday&Tails) and 0.25 (Tuesday&Tails).
This appears to be where you are getting confused. Your probability tree in your post was incorrect. It should look like this: flip the coin; on Heads (probability 1/2) there is a single awakening, on Monday; on Tails (probability 1/2) there are two awakenings, Monday and then Tuesday. If you think about writing a program to simulate the experiment this should be obvious.
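(A minimal sketch of such a simulation, in Python, assuming the standard setup of one awakening on heads and two on tails:)

import random

# Count awakenings of each type over many simulated runs.
counts = {"Monday&Heads": 0, "Monday&Tails": 0, "Tuesday&Tails": 0}
runs = 100000
for _ in range(runs):
    if random.random() < 0.5:          # Heads: one awakening
        counts["Monday&Heads"] += 1
    else:                              # Tails: two awakenings
        counts["Monday&Tails"] += 1
        counts["Tuesday&Tails"] += 1

total = sum(counts.values())           # roughly 1.5 awakenings per run
for kind, n in counts.items():
    print(kind, n / total)             # each roughly 1/3
print("fraction of runs that were heads:", counts["Monday&Heads"] / runs)  # roughly 1/2

Each awakening type comes out to roughly a third of all awakenings, even though heads runs are half of all runs.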
No, because my probability tree was meant to reflect how Beauty should view the probabilities at the time of an awakening. From that perspective, your tree would be incorrect (as two awakenings cannot happen at one time).
After the 1000 experiments, you divided 500 by 2 - getting 250. You should have multiplied 500 by 2 - getting 1000 tails observations in total. It seems like a simple-enough math mistake.
No, that’s not what I did. I’ll assume that you are smart enough to understand what I did, and I just did a poor job of explaining it. So I don’t know if it’s worth trying again. But basically, my probability tree was meant to reflect how Beauty should view the state of the world on an awakening. It was not meant to reflect how data would be generated if we saw the experiment through to the end. I thought it would be useful. But you can scrap that whole thing and my other arguments hold.
Well you did divide 500 by 2 - getting 250. And you should have multiplied the 500 tails events by 2 (the number of interviews that were conducted after each “tails” event) - getting 1000 “tails” interviews in total. 250 has nothing to do with this problem.
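(Spelling out those counts: in 1000 experiments, about 500 heads runs produce 500 interviews, and about 500 tails runs produce 500 × 2 = 1000 interviews; so 500 of the 1500 total interviews, i.e. one third, follow heads.)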
No, P(H)=P(H|W)P(W) is incorrect because the W in P(H|W) is different than the W in P(W): the former is a probability distribution over a set of three events, while the latter is a boolean. Using the former definition, as a probability distribution, P(W) isn’t meaningful at all, it’s just a type error.
It isn’t a probability; the only use of it was to note the method leading to a 1⁄2 solution and where I consider it to fail, specifically because the number of times you are woken is not bound in [0,1] and thus “P(W)” as used in the 1⁄2 conditioning is malformed, as it doesn’t keep track of when you’re actually woken up. Inasmuch as it is anything, using the 1⁄2 argumentation, “P(W)” is the expected number of wakings.
“No. You will wake on Monday with probability one. But, on a randomly selected awakening, it is more likely that it’s Monday&Heads than Monday&Tails, because you are on the Heads path in 50% of experiments.”
Sorry, but if we’re randomly selecting a waking then it is not true that you’re on the heads path 50% of the time. In a pair of runs, one head, one tail, you are woken 3 times, twice on the tails path.
On a randomly selected run of the experiment, there is a 1⁄2 chance of being in either branch, but:
Choose a uniformly random waking in a uniformly chosen random run
is not the same as
Choose a uniformly random waking.
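(Concretely, with the pair of runs above, one heads and one tails, three wakings in all: choosing a run uniformly and then a waking within it gives Monday&Heads probability 1/2 and Monday&Tails and Tuesday&Tails probability 1/4 each, exactly the 0.5/0.25/0.25 split described earlier; choosing one of the three wakings uniformly gives each probability 1/3.)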
Why are you using the notation P(W) when you mean E(W)? And if you can get an expectation for it, you must know the probability of it.
“Sorry, but if we’re randomly selecting a waking then it is not true that you’re on the heads path 50% of the time. In a pair of runs, one head, one tail, you are woken 3 times, twice on the tails path.”
Randomly selecting a waking does not imply a uniform distribution. On the contrary, we know the distribution is not uniform.