I think the question we now have to ask to resolve the remaining confusion is—what, exactly, is it that Beauty is uncertain about, and at what time?
“At what time” doesn’t matter in this formalism. You can be uncertain about future events or about past events; all that matters is how you update your uncertainty upon receiving new information.
So a triplet (x,y,z) represents, in the abstract, a conceivable configuration of the component uncertainties in the experimental setup. The coin could have come up heads or tails; it could be Monday or Tuesday; Beauty can be woken up on that day, or left asleep.
The joint probability P(x,y,z) is the plausibility we assign—in a timeless manner—to the corresponding propositions. Strictly speaking, it should be P(x,y,z|B) where B is our background information about the experiment: the rules, the fact that the coin is unbiased (or not known to be biased), and so on.
Our background information directs how we allocate probability mass to the various points in the sample space: P(T,T,S) corresponds to “the coin comes up tails, the day is Tuesday, Beauty is asleep”. The rules of the experiment require that this be zero.
On the other hand, P(H,T,S) corresponds to “the coin comes up heads, the day is Tuesday, Beauty is asleep”, and this can be non-zero.
When you learn (“condition on”) some new information, the probability distribution is altered: you keep only the points at which the relevant variable has the value(s) you learned, and you renormalize so that the total probability is 1. So, on learning “heads” you keep only the points having x=H. On learning what day it is, you keep only the points having that value for y.
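To make the keep-and-renormalize step concrete, here is a minimal sketch in Python. The particular allocation of probability mass is an assumption made for illustration only (1/4 on each coin/day pair, with z fixed by the rules of the experiment); the names joint and condition are likewise hypothetical. Days are written "Mon" and "Tue" so they are not confused with T for tails.

```python
from fractions import Fraction

# ASSUMED allocation, for illustration only: 1/4 on each (coin, day) pair,
# with z (W = awake, S = asleep) fixed by the rules of the experiment.
# Points with zero mass, such as ("T", "Tue", "S"), are simply left out.
quarter = Fraction(1, 4)
joint = {
    ("H", "Mon", "W"): quarter,
    ("H", "Tue", "S"): quarter,
    ("T", "Mon", "W"): quarter,
    ("T", "Tue", "W"): quarter,
}

def condition(dist, index, value):
    """Keep the points whose coordinate `index` equals `value`, then renormalize."""
    kept = {point: p for point, p in dist.items() if point[index] == value}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("cannot condition on an event of probability zero")
    return {point: p / total for point, p in kept.items()}

# Learning "heads": keep only the points having x = H.
given_heads = condition(joint, 0, "H")

# Learning that it is Monday: keep only the points having y = Mon.
given_monday = condition(joint, 1, "Mon")

for point, p in given_monday.items():
    print(point, p)
```

Exact fractions are used rather than floats so that the renormalized masses sum to exactly 1.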
When Beauty wakes up, she learns the value of z, so she can condition on z. That means she throws away the part of the joint distribution where she was supposed to be asleep. If that part of the joint distribution did contain some probability mass (as I’ve argued above it can), then that can make P(x|z=W) something other than 1⁄2.
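As a worked illustration, suppose (this is an assumption, not something the rules alone dictate) that the background information B leads us to put mass 1/4 on each coin/day pair, with z then fixed by the rules. Writing Mon and Tue for the days to avoid confusion with T for tails, conditioning on z=W gives:

```latex
\begin{align*}
P(x{=}H \mid z{=}W)
  &= \frac{P(H,\mathrm{Mon},W) + P(H,\mathrm{Tue},W)}
          {P(H,\mathrm{Mon},W) + P(H,\mathrm{Tue},W)
           + P(T,\mathrm{Mon},W) + P(T,\mathrm{Tue},W)} \\
  &= \frac{\tfrac14 + 0}{\tfrac14 + 0 + \tfrac14 + \tfrac14}
   = \frac13 .
\end{align*}
```

The mass 1/4 that sat on (H, Tue, S) is exactly what gets thrown away, which is why the result moves off 1⁄2 under this allocation. A different allocation of the prior mass would give a different number.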