Hi guys,
I’m really not happy about this claim:
“Most people answer “librarian.” Which is a mistake: shy salespeople are much more common than shy librarians, because salespeople in general are much more common than librarians—seventy-five times as common, in the United States.”
The question is whether the person is more likely to be a librarian or a salesperson given that we know that they’re shy. In other words, it’s a question about posterior probability: P(librarian|shy) vs. P(salesperson|shy). The statement that salespeople are, in general, 75 times more common than librarians is a statement about prior probability, i.e. P(librarian) vs. P(salesperson).
We can easily construct a case where the shy person is still more likely to be a librarian despite the prior probabilities given above, just by saying “Assume 100% of librarians are shy and 1% of salespeople are shy.” Now, given that the person is shy, the odds are 1:0.75 in favor of librarian (for every shy librarian there are 75 × 0.01 = 0.75 shy salespeople).
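To make the arithmetic in that counterexample easy to check, here is a minimal Python sketch. The function name and the default of 75 salespeople per librarian are my own illustrative choices based on the numbers quoted above, and the final probability assumes the person can only be one of these two professions.

```python
# Sketch only: the function name and the 75:1 default are illustrative
# assumptions taken from the numbers quoted in this thread.
def librarian_odds_given_shy(shy_rate_librarian, shy_rate_salesperson,
                             salespeople_per_librarian=75):
    """Posterior odds (librarian : salesperson) for a person known to be shy."""
    # Count shy people per single librarian in the population.
    shy_librarians = 1 * shy_rate_librarian
    shy_salespeople = salespeople_per_librarian * shy_rate_salesperson
    return shy_librarians / shy_salespeople

# The counterexample: 100% of librarians and 1% of salespeople are shy.
odds = librarian_odds_given_shy(1.00, 0.01)
# Convert odds to a probability, assuming only these two professions.
probability = odds / (1 + odds)
print(f"odds {odds:.2f}:1, P(librarian|shy) = {probability:.2f}")
```

Odds of 1:0.75 are the same as roughly 1.33:1, which is about a 57% chance of librarian under those assumed shyness rates.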
Indeed. I made the same point elsewhere, and furthermore concluded that the claim that the subjects were succumbing to base rate neglect is not well-supported by the source material. (In fact, even the conclusion that “librarian” is the wrong answer is not supported by the cited sources!)
There is no way that the posterior odds are more than (1:75) toward the librarian. I would be very surprised if it were (1:10). Spelled out, the point of the example seems to be “people forget the base rate, and once you know the base rate, it’s obvious that it’s more significant than the update based on shyness”. I don’t need a source for this; it doesn’t matter whether the update based on shyness is (1:1.25) or (1:3) or something in between; any of that is dominated by the base rate.
loti made the point I’m about to make above, but appears to have taken it back; I’m not sure why, as it seems totally right.
Anyway: it’s certainly true that it doesn’t strictly follow from the fact that there are 75 times as many salespeople as librarians (and you know this) that you ought to be more confident that someone is a salesperson than a librarian, if all you know about them is that they are shy. However, that conclusion does follow on totally plausible assumptions about the frequency of shy people among librarians and the frequency of shy people among salespeople (and you having credences close to these frequencies). It follows, for instance, if four out of five librarians are shy, and only one in twenty salespeople are shy.
For it not to be the case that you should think the person is more likely to be a salesperson, given the base rate, you would have to think the frequency of shy people among librarians is at least 75 times higher than the frequency of shy people among salespeople. For that to be possible, the rate of shy people among salespeople would have to be less than or equal to one in 75. That is very low; I’d find that somewhat surprising. Even more surprising would be to find out that the frequency of shy librarians is close to 100%.
Maybe that’s the case; I don’t know. Probably Rob Bensinger doesn’t know for sure either; so yeah, he probably shouldn’t have been so categorical when he said that this is “a mistake”. But I think we can forgive Rob Bensinger here for using this as an example of base-rate neglect, because it’s pretty plausible that it is.
In fact, even if, as it happens, the numbers work out, and people get the right answer here, I expect most people don’t worry about the base rate at all when they answer this question, so they’re getting the right answer purely by luck; if that’s right, then this would still be an example of base-rate neglect.
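The threshold argument in this comment can be sketched in a few lines of Python. Both helper names are hypothetical, and the 75:1 base ratio is simply the figure quoted in the original claim; the shyness rates are the example values from the comment, not measured data.

```python
# Sketch only: helper names are hypothetical; 75 is the quoted base ratio.
def librarian_more_likely(shy_rate_librarian, shy_rate_salesperson, base_ratio=75):
    """True if a shy person is more likely a librarian than a salesperson."""
    # Librarian wins only if the shyness ratio beats the base-rate ratio.
    return shy_rate_librarian > base_ratio * shy_rate_salesperson

def max_salesperson_shy_rate(shy_rate_librarian, base_ratio=75):
    """Shyness rate among salespeople at which the two answers are tied."""
    return shy_rate_librarian / base_ratio

# Four in five librarians shy, one in twenty salespeople shy.
print(librarian_more_likely(0.80, 0.05))
# Even if every librarian were shy, the tie point is one in 75.
print(max_salesperson_shy_rate(1.0))
```

With the four-in-five and one-in-twenty figures, salesperson remains the better guess; librarian only wins once the shyness rate among salespeople drops below the librarian rate divided by 75.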
Ah! Yes! I’ve never been able to properly formulate an answer to why this example bothers me so much, but you did it! Thank you!
“25%” of mankind are shy.
“75%” of librarians are shy.
“1%” of salesmen are shy.
The most valued finding (environment’s milestone) is the shy salesman. The average valued finding is the shy librarian or corollary bookworm. We already know shy persons in our surroundings. We are searching for objects that map the territory. The bias is about reading the map, not seeing its heterogeneity or multiple authors.
Hi colossal_noob,
The point this example is trying to make can perhaps be better understood with the expansion of Bayes’ rule.
P(librarian|shy) = (P(shy|librarian) * P(librarian)) / P(shy)
P(salespeople|shy) = (P(shy|salespeople) * P(salespeople)) / P(shy)
The cognitive bias presented here is to ignore the difference between P(librarian) and P(salespeople), and to draw a conclusion based solely on P(shy|librarian) vs. P(shy|salespeople). Since librarians are more likely to be shy (i.e. P(shy|librarian) > P(shy|salespeople)), the bias leads to the wrong conclusion P(librarian|shy) > P(salespeople|shy).
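It may help to divide the two expansions above by each other, since P(shy) then cancels and only the likelihood ratio and the prior ratio remain. This is just a rearrangement of the two formulas, written in LaTeX; the only number assumed is the 75:1 prior ratio from the quoted claim.

```latex
\[
\frac{P(\text{librarian} \mid \text{shy})}{P(\text{salespeople} \mid \text{shy})}
= \frac{P(\text{shy} \mid \text{librarian})}{P(\text{shy} \mid \text{salespeople})}
\times \frac{P(\text{librarian})}{P(\text{salespeople})},
\qquad
\frac{P(\text{librarian})}{P(\text{salespeople})} = \frac{1}{75}
\]
```

Base rate neglect amounts to looking only at the first factor on the right-hand side and dropping the second.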