Please elaborate.
Perhaps a better title would be “Bayes’ Theorem Illustrated (My Ways)”
In the first example you use colored shapes of various sizes to illustrate the ideas visually. In the second example, you use plain rectangles of approximately the same size. If I were a visual learner, I don’t know if your post would help me much.
I think you’re on the right track in example one. You might want to use shapes whose relative areas are easier to estimate. It’s hard to tell if one triangle is twice as big as another (as measured by area), but it’s easier with rectangles of the same height (where you just vary the width). More importantly, I think it would help to show the math with shapes. For example, I would suggest that figure 18 show P(door 2) as the orange triangle from figure 17 divided by the sum of the orange and blue triangles from figure 17 (with the division itself shown using the shapes). When I teach, I sometimes do this with Venn diagrams (showing division of chunks of circles and rectangles to illustrate conditional probability).
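To make the arithmetic concrete, here is a tiny sketch of the ratio-of-areas idea (the area values below are placeholders I made up; the real ones would come from figure 17):

```python
# Placeholder areas standing in for the shapes in figure 17.
orange_triangle = 1/3   # probability mass represented by the orange triangle
blue_triangle = 1/6     # probability mass represented by the blue triangle

# Figure 18's conditional probability as a ratio of areas:
p_door_2 = orange_triangle / (orange_triangle + blue_triangle)
print(p_door_2)  # 2/3 with these placeholder areas
```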
It seems to me that the standard solutions don’t account for the fact that a non-trivial number of families are more likely to have a third child if the first two children are of the same sex. Some people have a sex-dependent stopping rule.
P(first two children different sexes | you have exactly two children) > P(first two children different sexes | you have more than two children)
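A quick simulation of that effect (the 30% stopper fraction and the two-versus-three-child choice are made-up assumptions, just to show the direction of the inequality):

```python
import random

def simulate(n=100_000, p_stopper=0.3):
    # Tally families by whether the first two kids are the same sex,
    # split by final family size (exactly two vs. more than two).
    counts = {('diff', 2): 0, ('same', 2): 0, ('diff', 3): 0, ('same', 3): 0}
    for _ in range(n):
        first, second = random.choice('BG'), random.choice('BG')
        key = 'diff' if first != second else 'same'
        if random.random() < p_stopper:
            # Sex-dependent stopping rule: try for a third child
            # only if the first two are the same sex.
            size = 3 if key == 'same' else 2
        else:
            # Family size chosen independently of the children's sexes.
            size = random.choice((2, 3))
        counts[(key, size)] += 1
    p2 = counts[('diff', 2)] / (counts[('diff', 2)] + counts[('same', 2)])
    p3 = counts[('diff', 3)] / (counts[('diff', 3)] + counts[('same', 3)])
    print(f"P(first two differ | exactly two children)  = {p2:.3f}")
    print(f"P(first two differ | more than two children) = {p3:.3f}")

simulate()
```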
The other issue with this kind of problem is the ambiguity. What was the disclosure algorithm? How did you decide which child to give me information about? Without that knowledge, we are left to speculate.
We should blame and stigmatize people for conditions where blame and stigma are the most useful methods for curing or preventing the condition, and we should allow patients to seek treatment whenever it is available and effective.
I think you said it better earlier when you talked about whether the reduction in incidence outweighs the pain caused by the tactic. For some conditions, if it wasn’t for the stigma there would be little-to-nothing unpleasant about it (and we wouldn’t need to talk about reducing incidence).
I agree with your general principle, but think it’s unlikely that blame and stigma are ever the most useful methods. We should be careful to avoid the false dichotomy between the “stop eating like a pig” tactic and fat acceptance.
Sandy’s husband is an asshole, who probably defends his asshole behavior by rationalizing that he’s trying to help her. He’s not really trying to help her (or if he is, he knows little about psychology (or women)).
Blame and judgment are such strong signaling devices that I think people rarely use them for the benefit of the one being judged. If they happen to be the best tactic for dealing with the problem, well, that would be quite a coincidence.
--
I liked your post a lot, in case that wasn’t clear. I think you are focusing on the right kinds of questions.
Sorry I was slow to respond ... busy with other things.
My answers:
Q1: I agree with you: 1⁄3, 1⁄3, 2⁄3
Q2: ISB is similar to SSB as follows: fair coin; woken up twice if tails, once if heads; epistemic state reset each day.
Q3: ISB is different from SSB as follows: more than one coin toss; same number of interviews regardless of the result of the coin toss.
Q4: It makes a big difference. She has different information to condition on. On a given coin flip, the probability of heads is 1⁄2. But, if it is tails, we skip a day before flipping again. Once she has been woken up a large number of times, Beauty can easily calculate how likely it is that heads was the most recent result of a coin flip. In SSB, she cannot use the same reasoning. In SSB, Tuesday&heads doesn’t exist, for example.
Consider 3 variations of SSB:
Same as SSB, except: if heads, she is interviewed on Monday, and then the coin is turned over to tails and she is interviewed on Tuesday. There is amnesia and all of that. So, it’s either the sequence (heads on Monday, tails on Tuesday) or (tails on Monday, tails on Tuesday). Each sequence has a 50% probability, and she should think of the days within a sequence as being equally likely. She’s asked about the current state of the coin. She should answer P(H)=1/4.
Same as SSB, except: if heads, she is interviewed on Monday, and then the coin is flipped again and she is interviewed on Tuesday. There is amnesia and all of that. So, it’s either the sequence (heads on Monday, tails on Tuesday), (heads on Monday, heads on Tuesday), or (tails on Monday, tails on Tuesday). The first two sequences have a 25% chance each and the last one has a 50% chance. When asked about the current state of the coin, she should say P(H)=3/8.
The 1⁄2 solution to SSB results from similar reasoning. 50% chance for the sequence (Monday and heads). 50% chance for the sequence (Monday and tails, Tuesday and tails). P(H)=1/2
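Here is a minimal sketch of that sequence-weighting rule, for checking the arithmetic (the helper function is mine, not from the original problem statements):

```python
def p_heads(sequences):
    # sequences: list of (coin state on each waking day, probability of sequence).
    # Within a sequence, each day is treated as equally likely.
    return sum(p * seq.count('H') / len(seq) for seq, p in sequences)

# Variation 1: coin turned over to tails after a heads Monday.
print(p_heads([(('H', 'T'), 0.5), (('T', 'T'), 0.5)]))                       # 0.25
# Variation 2: coin flipped again after a heads Monday.
print(p_heads([(('H', 'T'), 0.25), (('H', 'H'), 0.25), (('T', 'T'), 0.5)]))  # 0.375
# Standard SSB: one heads waking, or two tails wakings.
print(p_heads([(('H',), 0.5), (('T', 'T'), 0.5)]))                           # 0.5
```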
If you apply this kind of reasoning to ISB, where we are thinking of a randomly selected day after a lot of time has passed, you’ll get P(H)=1/3.
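And a brute-force check of the 1/3 figure (my sketch; a ‘day’ here is an awakening day in the iterated experiment):

```python
import random

def isb_p_heads_on_random_day(n_flips=100_000):
    # In ISB, each heads flip is followed by one awakening day and each
    # tails flip by two, so we count how often a randomly selected
    # awakening day has heads as the most recent flip.
    heads_days = tails_days = 0
    for _ in range(n_flips):
        if random.random() < 0.5:
            heads_days += 1
        else:
            tails_days += 2
    return heads_days / (heads_days + tails_days)

print(isb_p_heads_on_random_day())  # ~0.333
```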
I’m struggling to see how ISB isn’t different from SSB in meaningful ways.
My NT ‘data’ are from conversations I’ve had over the years with people who I have noticed are particularly good socially. But of course, there is plenty of between-person variability even within the NT and AS groups.
The thing that I have been most surprised by is how much NTs like symbols and gestures.
Here are some examples:
Suppose you think your significant other should have a cake on his/her birthday. You are not good at baking. Aspie logic: “It’s better to buy a cake from a bakery than to make it myself, since the better the cake tastes the happier they’ll be.” Of course, the correct answer is that the effort you put into it is what matters (to an NT).
Suppose you are walking through a doorway and you are aware that there is someone about 20 feet behind you. Aspie logic: “If I hold the door for them they will feel obligated to speed up a little, so that I’m not waiting too long. That will just inconvenience them. Plus, it’s not hard to open a door. Thus, it’s better for them if I let the door close.” To the NT, you are just inconsiderate.
Suppose you are sending out invitations to a graduation party. You know that one of your close friends is going to be out of town that weekend. Aspie logic: “There is no reason to send them an invitation, since I already know they can’t go. In fact, sending them an invitation might make them feel bad.” If your friend is an NT, it’s the wrong answer. They want to know they are wanted. Plus, it’s always possible their travel plans will get canceled.
In each of these 3 examples the person with AS is actually being considerate, but would not appear that way to an NT.
Yes, I’ve read that paper, and disagree with much of it. Perhaps I’ll take the time to explain my reasoning sometime soon.
Anthropic reasoning is what leads people to believe in miracles. Rare events have a high probability of occurring if the number of observations is large enough. But whoever that rare event happens to will feel like it couldn’t have happened by chance, because the probability of it happening to them was so small.
If you wait until the event occurs, and then start treating it as a random event from a single trial, forming your hypothesis after seeing the data, you’ll make inferential errors.
Imagine that there are balls in an urn, labeled with numbers 1, 2,...,n. Suppose we don’t know n. A ball is selected. We look at it. We see that it’s number x.
Non-anthropic reasoning: all numbers between 1 and n were equally likely. I was guaranteed to observe some number, and the probability that it was close to n was the same as the probability that it was far from n. So all I know is that n is greater than or equal to x.
Anthropic reasoning: a number as small as x is much less likely if n is large. Therefore, hypotheses with n close to x are more likely than hypotheses where n is much larger than x.
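To make the contrast concrete, here is a minimal Bayesian sketch of the second line of reasoning (the uniform prior and the cap n_max are assumptions of mine, purely for illustration):

```python
from fractions import Fraction

def posterior_over_n(x, n_max=20):
    # Assumed uniform prior on n in {1, ..., n_max}; the likelihood of
    # drawing label x from an urn with n balls is 1/n when n >= x, else 0.
    unnorm = {n: Fraction(1, n) for n in range(x, n_max + 1)}
    total = sum(unnorm.values())
    return {n: p / total for n, p in unnorm.items()}

post = posterior_over_n(x=3)
print(post[3] > post[20])  # True: hypotheses with n close to x get more mass
```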
This is interesting. We shouldn’t get a discontinuous jump.
Consider 2 related situations:
1. If heads, she is woken up on Monday, and the experiment ends on Tuesday. If tails, she is woken up on Monday and Tuesday, and the experiment ends on Wednesday. In this case, there is no ‘not awake’ option.
2. If heads, she is woken up on Monday and Tuesday. On Monday she is asked her credence for heads. On Tuesday she is told “it’s Tuesday and heads” (but she is not asked about her credence; that is, she is not interviewed). If tails, it’s the usual: woken up both days and asked about her credence. The experiment ends on Wednesday.
In both of these scenarios, 50% of coin flips will end up heads. In both cases, if she’s interviewed she knows it’s either Monday&heads, Monday&tails or Tuesday&tails. She has no way of telling these three options apart, due to the amnesia.
I don’t think we should be getting different answers in these 2 situations. Yet, I think if we use your probability distributions we do.
I think there are two basic problems. One is that Monday&tails is really not different from Tuesday&tails. They are the same variable. It’s the same experience. If she could time travel and repeat the Monday waking, it would feel the same to her as the Tuesday waking. The other issue is that, in my scenario 2 above, when she is woken but before she knows whether she will be interviewed, it looks like there is a 25% chance it’s heads&Monday and a 25% chance it’s heads&Tuesday. That’s probably a reasonable way to look at it. But it doesn’t imply that, once she finds out it’s an interview day, the probability of heads&Monday shifts to 1⁄3. That’s because on 50% of coin flips she will experience heads&Monday. That’s what makes this different from a usual joint probability table representing independent events.
At this point, it is just an assertion that it’s not a probability. I have reasons for believing it’s not one, or at least not the probability that people think it is. I’ve explained some of that reasoning.
I think it’s reasonable to look at a large-sample ratio of counts (or ratio of expected counts). The best way to do that, in my opinion, is with independent replications of awakenings (ones that reflect all possibilities at an awakening). I probably haven’t worded this well, but consider the following two approaches. For simplicity, let’s say we wanted to do this (I’m being vague here) 1000 times.
1. Replicate the entire experiment 1000 times. That is, there will be 1000 independent tosses of the coin. This will lead to between 1000 and 2000 awakenings, with an expected value of 1500 awakenings. But whatever the total number of awakenings is, they are not independent. For example, on the first awakening it could be either heads or tails. On the second awakening, it could be heads only if it was heads on the first awakening. So, Beauty’s options on awakening #2 are (possibly) different than her options on awakening #1. We do not have 2 replicates of the same situation. This approach will give you the correct ratio of counts in the long run (for example, we do expect the # of heads&Monday to equal the # of tails&Monday and the # of tails&Tuesday).
2. Replicate her awakening-state 1000 times. Because her epistemic state is always the same on an awakening, from her perspective it could be Monday or Tuesday, and it could be heads or tails. She knows that it was a fair coin. She knows that if she’s awake it’s definitely Monday if heads, and could be either Monday or Tuesday if tails. She knows that 50% of coin tosses would end up heads, so we assign 0.5 to Monday&heads. She knows that 50% of coin tosses would end up tails, so we assign 0.5 to tails, which implies 0.25 to tails&Monday and 0.25 to tails&Tuesday. If we generate observations from this 1000 times, we’ll get 1000 awakenings. We’ll end up with heads 50% of the time.
The distinction between 1 and 2 is that, in 2, we are trying to repeatedly sample from the joint probability distributions that she should have on an awakening. In 1, we are replicating the entire experiment, with the double counting on tails.
In 1, people are using these ratios of expected counts to get the 1⁄3 answer. 1⁄3 is the correct answer to the question about the long-run frequency of awakenings preceded by heads among all awakenings. But I do not think it is the answer to the question about her credence of heads on an awakening.
In 2, the joint probabilities are determined ahead of time based on what we know about the experiment.
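A sketch of the two sampling schemes just described, to make the contrast concrete (the numbers are the ones above; this is my code, not anyone’s official argument):

```python
import random

def approach_1(n_experiments=100_000):
    # Replicate the entire experiment; a tails toss contributes two awakenings.
    heads_awk = tails_awk = 0
    for _ in range(n_experiments):
        if random.random() < 0.5:
            heads_awk += 1
        else:
            tails_awk += 2
    return heads_awk / (heads_awk + tails_awk)   # long-run ratio -> ~1/3

def approach_2(n_awakenings=100_000):
    # Sample single awakenings from the joint distribution fixed ahead of time:
    # P(heads&Monday)=0.5, P(tails&Monday)=0.25, P(tails&Tuesday)=0.25.
    heads = sum(random.random() < 0.5 for _ in range(n_awakenings))
    return heads / n_awakenings                  # -> ~1/2

print(approach_1(), approach_2())
```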
Let n2 and n3 be the counts, in repeated trials, of tails&Monday and tails&Tuesday, respectively. You will of course see that n2=n3. They are the same random variable. tails&Monday and tails&Tuesday are the same. It’s like what Jack said about types and tokens. It’s like Vladimir_Nesov said:
Two subsequent states of a given dynamical system make for poor distinct elements of a sample space: when we’ve observed that the first moment of a given dynamical trajectory is not the second, what are we going to do when we encounter the second one? It’s already ruled “impossible”! Thus, Monday and Tuesday under the same circumstances shouldn’t be modeled as two different elements of a sample space.
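To put the n2=n3 point in code (a trivial sketch of mine):

```python
import random

def tails_waking_counts(n_trials=1000):
    # n2 counts tails&Monday awakenings, n3 counts tails&Tuesday awakenings.
    n2 = n3 = 0
    for _ in range(n_trials):
        if random.random() < 0.5:   # tails
            n2 += 1                  # the Monday waking happens...
            n3 += 1                  # ...and so does the Tuesday waking, always
    return n2, n3

n2, n3 = tails_waking_counts()
print(n2 == n3)  # True in every run: they are the same random variable
```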
You said:
I don’t mean to claim that as soon as Beauty awakes, new evidence comes to light that she can add to her store of bits in additive fashion, and thereby update her credence from 1⁄2 to 1⁄3 along the way. If this is the only kind of evidence that your theory of Bayesian updating will acknowledge, then it is too restrictive.
I don’t think it matters whether she has the knowledge before the experiment or not. What matters is whether she has new information about the likelihood of heads to update on. If she did, we would expect her accuracy to improve. So, for example, if she starts out believing that heads has probability 1⁄2 but learns something about the coin toss, her probability might go up a little if heads and down a little if tails. Suppose, for example, she is informed of a variable X. If X is equally likely under heads and tails, that is, if P(X|heads)=P(X|tails), then why would she update at all? Meaning, why would P(heads)=/=P(heads|X)? This would be unusual. It seems to me that the only reason she changes is because she knows she’d essentially be ‘betting’ twice on tails, but that really is distinct from credence for tails.
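A one-line check of that point (a sketch; when the likelihoods are equal, Bayes’ rule hands back the prior unchanged):

```python
from fractions import Fraction

def posterior_heads(prior_heads, p_x_given_heads, p_x_given_tails):
    # Bayes' rule: P(H|X) = P(X|H)P(H) / [P(X|H)P(H) + P(X|T)P(T)]
    num = p_x_given_heads * prior_heads
    return num / (num + p_x_given_tails * (1 - prior_heads))

# Equal likelihoods: X carries no information about the coin, so no update.
print(posterior_heads(Fraction(1, 2), Fraction(3, 4), Fraction(3, 4)))  # 1/2
```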
The probability represents how she should see things when she wakes up.
She knows she’s awake. She knows heads had probability 0.5. She knows that, if it landed heads, it’s Monday with probability 1. She knows that, if it landed tails, it’s either Monday or Tuesday. Since there is no way for her to distinguish between the two, she views them as equally likely. Thus, if tails, it’s Monday with probability 0.5 and Tuesday with probability 0.5.
I like the halfer variant of your table. I still need to think about your distributions more, though. I’m not sure it makes sense to have a variable ‘woken that day’ for this problem.
Makes sense to me.
Ok, yes, sometimes relative frequencies with duplicates can be probabilities, I agree.
If she does condition on being woken up, I think she still gets 1⁄2. I hate to keep repeating arguments, but what she knows when she is woken up is that she has been woken up at least once. If you just apply Bayes’ rule, you get 1⁄2.
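For what it’s worth, the calculation I have in mind (writing W for ‘woken at least once’):

P(heads | W) = P(W | heads) P(heads) / P(W) = (1 × 0.5) / 1 = 0.5,

since she is woken at least once whether the coin lands heads or tails, so P(W | heads) = P(W | tails) = P(W) = 1.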
If conditioning causes her to change her probability, it should do so in such a way that makes her more accurate. But as we see in the cancer problem, people with cancer give the same answer as people without.
Then you’d agree that with full memory, the patient would have something to update on?
Yes, but then we wouldn’t be talking about her credence on an awakening. We’d be talking about her credence on first waking and second waking. We’d treat them separately. With amnesia, 2 wakings are the same as 1. It’s really just one experience.
That’s a good example. There is a big difference, though (it’s subtle). With Sleeping Beauty, the question is about her probability at a waking. At a waking, there are no duplicate surveys. The duplicates occur at the end.
Morendil,
This is strange. It sounds like you have been making progress towards settling on an answer, after discussion with others. That would suggest to me that discussion can move us towards consensus.
I like your approach a lot. It’s the first time I’ve seen the thirder argument defended with actual probability statements. Personally, I think there shouldn’t be any probability mass on ‘not woken’, but that is something worth thinking about and discussing.
One thing that I think is odd: thirders know she has nothing to update on when she is woken, because they admit she will give the same answer regardless of whether it’s heads or tails. If she really had new information that is correlated with the outcome, her credence would move towards heads when heads, and tails when tails.
Consider my cancer intuition pump example. Everyone starts out thinking there is a 50% chance they have cancer. Once woken, regardless of whether they have cancer or not, they all shift to 90%. Did they really learn anything about their disease state by being woken? If they did, those with cancer would have shifted their credence up a bit, and those without would have shifted down. That’s what updating is.
I tried to explain it here: http://lesswrong.com/lw/28u/conditioning_on_observers/1zy8
Basically, the 2 wakings on tails should be thought of as one waking. You’re just counting the same thing twice. When you include counts of variables that have a correlation of 1 in your denominator, it’s not clear what you are getting back. The thirders are using a relative frequency that doesn’t converge to a probability.
But: “You can be a virtue ethicist whose virtue is to do the consequentialist thing to do”