EDIT: As wgd points out below, my answer here is wrong in its particulars (I didn’t take into account all of the information available to Sarah and George in the puzzle as stated). The general principles invoked are sound, though.
George does actually know more. I think you’re getting thrown by the fact that his 50-50 probability distribution seems more equivocal (less concentrated) than Sarah’s 66-33 distribution. But remember that these distributions are defined over a space of four elements (boy-boy, boy-girl, girl-boy and girl-girl), so the actual distributions are 0.5-0.5-0-0 for George and 0.33-0.33-0.33-0 for Sarah. When you see it this way, it becomes a bit more plausible that Sarah’s distribution is actually more “spread out”, more equivocal.
To be more precise, suppose you have been given the task of conveying information about the genders of this man’s children. You decide that you will transmit a 0 to represent a boy and a 1 to represent a girl. If the receiver has absolutely no information about the man’s children, apart from the fact that there are two of them and neither is genderqueer, you will need to send two bits of information—one for the elder child’s gender, and one for the younger one’s—in order to convey full information about the genders. On the other hand, if the receiver already knows the children’s genders, you have to send zero bits of information in order to convey full information. So you can think of the number of bits you need to transmit as a measure of the lack of knowledge of the receiver. The fewer bits you need to send, the more the receiver knows.
Now let’s compare George and Sarah. George already knows the elder child’s gender, so you only need to send one further bit, representing the younger child’s gender, in order to convey full information. Sarah’s case is trickier. She knows that one of the children is a boy, but she doesn’t know which one. If it turns out that both children are boys, then your task is easy: you need to send just one bit of information, a 0, representing the gender of the child she hasn’t seen. Once she gets this bit, Sarah will know the genders of both children. But if the other child is a girl, you can’t just say that. You will also need to tell Sarah whether the order of birth is boy-girl or girl-boy. So besides sending a 1 to represent a girl, you’ll need to send one more bit of information in order to distinguish between boy-girl and girl-boy. This means that the number of bits you will have to send Sarah is either 1 or 2, depending on whether the other child is a boy or a girl. If you did this experiment over and over again, with a bunch of different groups of siblings, the average number of bits you send Sarah will be greater than 1 but less than 2.
So with George you only need 1 bit to convey full information, while with Sarah you need (on average) more than 1 bit. This means Sarah does indeed know less about the situation, and there is no paradox. All of this can be made a lot more rigorous using the concept of Shannon entropy, if you’re interested.
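As a rough sketch of the arithmetic behind “greater than 1 but less than 2”: this assumes the restated puzzle, in which Sarah's three remaining possibilities are equally likely, and it uses the coding scheme described above. The names `expected_bits` and `bits_needed` are made up for illustration.

```python
from fractions import Fraction

# Assumption: Sarah only knows "at least one boy", so boy-boy, boy-girl and
# girl-boy each have probability 1/3 (the restated puzzle, not the walk version).
sarah = {"BB": Fraction(1, 3), "BG": Fraction(1, 3), "GB": Fraction(1, 3)}

# Bits sent under the scheme described above: one bit if the other child is a
# boy, two bits (gender plus birth order) if the other child is a girl.
bits_needed = {"BB": 1, "BG": 2, "GB": 2}


def expected_bits(dist, cost):
    """Average message length, weighting each outcome by its probability."""
    return sum(p * cost[outcome] for outcome, p in dist.items())


sarah_avg = expected_bits(sarah, bits_needed)  # 1/3 * 1 + 2/3 * 2
george_avg = Fraction(1)                       # George always needs exactly one bit

print(float(sarah_avg), float(george_avg))
```

Under these assumptions the average for Sarah comes out to 5/3 of a bit, against exactly 1 bit for George.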
I agree that George definitely does know more information overall, since he can concentrate his probability mass more sharply over the 4 hypotheses being considered, but I’m fairly certain you’re wrong when you say that Sarah’s distribution is 0.33-0.33-0-0.33. I worked out the math (which I hope I did right or I’ll be quite embarrassed), and I get 0.25-0.25-0-0.5.
I think your analysis in terms of required message lengths is arguably wrong, because the purpose of the question is to establish the genders of the children and not the order in which they were born. That is, the answer to the question “What gender is the child at home?” can always be communicated in a single bit, and we don’t care whether they were born first or second for the purposes of the puzzle. You have to send >1 bit to Sarah only if she actually cares about the order of their births (And specifically, your “1 or 2 bits, depending” result is made by assuming that we don’t care about the birth order if they’re boys. If we care whether the boy currently out walking is the eldest child regardless of the other child’s gender we have to always send Sarah 2 bits).
Another way to look at that result is that when you simply want to ask “What is the probability of a boy or a girl at home?” you are adding up two disjoint ways-the-world-could-be for each case, and this adding operation obscures the difference between Sarah’s and George’s states of knowledge, leading to them both having the same distribution over that answer.
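One way to arrive at the 0.25-0.25-0-0.5 figure is a direct application of Bayes' rule. This is only a sketch, and it may not be exactly the calculation intended above: it assumes the four ordered pairs are equally likely a priori and that the man is equally likely to take either child on the walk. The event S is introduced here purely for illustration.

```latex
% S = "the child seen on the walk is a boy"; pairs are written (elder, younger).
% Assumption: each pair has prior 1/4, and either child is equally likely to be on the walk.
\begin{align*}
P(S) &= \tfrac14 \cdot 1 + \tfrac14 \cdot \tfrac12 + \tfrac14 \cdot \tfrac12 + \tfrac14 \cdot 0 = \tfrac12 \\
P(BB \mid S) &= \frac{P(S \mid BB)\,P(BB)}{P(S)} = \frac{1 \cdot \tfrac14}{\tfrac12} = \tfrac12 \\
P(BG \mid S) &= P(GB \mid S) = \frac{\tfrac12 \cdot \tfrac14}{\tfrac12} = \tfrac14 \\
P(GG \mid S) &= 0
\end{align*}
```

Under that assumption the child at home is a boy with probability 1/2, which is the same answer George gets.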
Good point. I was treating the description of Sarah’s encounter with the man as a proxy for “Sarah knows one of the man’s children is a boy, but not which one.” That seems to be the way it’s usually intended when the problem is presented, but you’re right that in the problem as described, Sarah has an additional relevant piece of information—that the man is out with a boy. I think this is an unintended artifact of the way the problem is presented, though. The people presenting the problem are usually trying to get at something different. The usual intent of the puzzle is captured by “Sarah knows that one of Brian’s two children is a boy, and George knows that his eldest child is a boy. What are the probabilities according to Sarah and George that Brian’s other child is a boy?”.
Again, I think this is an unintended artifact of the way the puzzle is stated. The fact that Sarah sees one of the kids and doesn’t see the other one gives her a way of individuating the kids other than their birth order. If we don’t assume she has this method of individuation (as in the restated puzzle above) then the birth order is relevant.
I think we’re in agreement then, although I’ve managed to confuse myself by trying to actually do the Shannon entropy math.
In the event we don’t care about birth orders we have two relevant hypotheses which need to be distinguished between (boy-girl at 66% and boy-boy at 33%), so the message length would only need to be about 0.9 bits, if I’m applying the math correctly for the entropy of a discrete random variable. So in one somewhat odd sense Sarah would actually know more about the gender than George does.
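Which, given that the original post said
“Still, it seems like Sarah knows more about the situation, where George, by being given more information, knows less. His estimate is as good as knowing nothing other than the fact that the man has a child which could be equally likely to be a boy or a girl.”
may not actually be implausible. Huh.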
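For anyone who wants to check the 0.9-bit figure, here is a minimal sketch of the entropy calculation. It assumes the 2/3 and 1/3 split quoted above; `entropy_bits` is a helper written for this example, not a library function.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Ignoring birth order: "one boy, one girl" at 2/3 and "two boys" at 1/3.
sarah_unordered = entropy_bits([2 / 3, 1 / 3])
# George is 50-50 on the remaining child.
george = entropy_bits([0.5, 0.5])
# Keeping birth order: Sarah is uniform over three ordered pairs.
sarah_ordered = entropy_bits([1 / 3, 1 / 3, 1 / 3])

print(round(sarah_unordered, 3), round(george, 3), round(sarah_ordered, 3))
```

So Sarah's entropy is below George's 1 bit when birth order is ignored and above it when birth order counts, which is the “somewhat odd sense” in question.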
Pragmatist is correct; I did not realize that the way I stated the problem was different from the original.
I fully understand the solution to this problem.
However, let’s look at the original problem. John only knows that one of the man’s children is a boy:
1) B, G | 0.33
2) G, B | 0.33
3) G, G | 0.00
4) B, B | 0.33
P(B | 4) = 1, P(G | 1 or 2) = 1
P(B) = 0.33, P(G) = 0.66
So let’s say that now the woman tells John that the boy is also the eldest:
1) B, G | 0.5
2) G, B | 0.0
3) G, G | 0.0
4) B, B | 0.5
P(B | 4) = 1, P(G | 1) = 1
P(B) = 0.5, P(G) = 0.5
At first I saw a problem because John obviously knows more given the second piece of information, so the fact that his estimate is worse seemed really weird. What I think is going on here is that his learning more really does decrease his ability to predict the gender of the other child: Before, he had 3 options, 2 of which contained a girl-answer. Now, one of those 2 answers is taken away, so he currently has 2 options, 1 of which contains a girl-answer. As he becomes more informed about the total state of the world, his ability to predict this particular piece of information decreases.
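The two tables in this comment can be reproduced by brute-force counting. This is a small sketch that assumes the four (elder, younger) combinations start out equally likely; `prob_both_boys` is a name chosen for this example.

```python
from fractions import Fraction
from itertools import product

# The four equally likely (elder, younger) combinations.
worlds = list(product("BG", repeat=2))


def prob_both_boys(condition):
    """P(both children are boys | condition), counting equally likely worlds."""
    consistent = [w for w in worlds if condition(w)]
    both_boys = [w for w in consistent if w == ("B", "B")]
    return Fraction(len(both_boys), len(consistent))


# John knows only that at least one child is a boy.
at_least_one_boy = prob_both_boys(lambda w: "B" in w)
# John additionally learns that the boy is the eldest.
eldest_is_boy = prob_both_boys(lambda w: w[0] == "B")

print(at_least_one_boy, eldest_is_boy)
```

Ruling out girl-boy leaves two worlds instead of three, which is why the probability of a second boy moves from 1/3 to 1/2.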
The fact that John predicts 0.5 while Sarah predicts 0.66 doesn’t mean that Sarah’s prediction is somehow better.
Thank you, that is very helpful! If I understand it, according to your analysis, Sarah knows less about the total state of the birth order and gender of the two children. Still, it seems like she knows more about the particular gender of the child at home.
Is that still a problem?
I guess the problem is with the “knows more” words. It’s not just how many bits of information you get, but also how they are related to your question. As a trivial example, it would be better to have 1 relevant bit of information than 1024 bits of irrelevant information. In this example, all information is relevant, but differently.
Imagine the following situation: You have letters “A”, “B”, “C”, “D” and you randomly choose one of them.
You have two participants in the experiment. You tell the first participant that you did not choose “A”. You tell the second participant that you did not choose “B”. Each of them has the same amount of information, right?
Then you ask them whether the letter you chose was a consonant. The first one says “Certainly yes.” The second one says “I am not sure, but with probability 66% yes.”
How is it possible that the same amount of information gives them different certainty? The answer is, the same amount of information in general is not necessarily the same amount of information about the question you gave them.
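The letter experiment is small enough to check by enumeration. This sketch assumes the letter is chosen uniformly at random; `prob_consonant` and `bits_learned` are names invented for the example.

```python
import math
from fractions import Fraction

letters = "ABCD"
vowels = {"A"}  # among these four letters, only "A" is a vowel


def prob_consonant(excluded):
    """P(chosen letter is a consonant | it is not `excluded`), uniform choice."""
    remaining = [c for c in letters if c != excluded]
    consonants = [c for c in remaining if c not in vowels]
    return Fraction(len(consonants), len(remaining))


first = prob_consonant("A")   # told "not A"
second = prob_consonant("B")  # told "not B"

# Each participant rules out one of four equally likely letters, so each
# message carries the same amount of information: log2(4/3) bits.
bits_learned = math.log2(4 / 3)

print(first, second, round(bits_learned, 3))
```

Both participants receive the same number of bits, yet only the first can answer the consonant question with certainty.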