So, I presented a five-minute speech to my community college class on the base rate fallacy. Unfortunately, it went over their heads at about the same apparent distance as airplanes usually pass over my own. Here is the text of the speech, if anyone would be so kind as to offer criticism.
“I have come here to chew bubblegum and kick ass... and I’m all out of bubblegum.” You might remember this line from the ’80s sci-fi movie They Live. In the movie, society is controlled by aliens who are using up Earth’s natural resources. They’re in politics, they’re in law enforcement, and they look just like humans. Then the protagonist finds a box of sunglasses that let him see who’s an alien and who isn’t. For some reason they also make everything black and white. It’s a crappy movie, but the point is, there are similar situations in real life, where we’re trying to figure out who has a certain disease, or who the criminal is, or whatever the case may be. Let’s consider a situation where we’re trying to figure out who has breast cancer.
Let’s pretend you have a friend who’s going to the doctor to get a breast cancer screening. The doctor tells your friend that in her age group, 40-50, about 1% of women have breast cancer. He tells her that mammograms are 80% accurate, meaning they catch 80% of the cancers that are there, with a 10% false positive rate. After the screening, your friend is told that she tested positive. Now, how likely do you think it is that your friend has breast cancer? If you guessed somewhere around 80%, congratulations, you just committed the base rate fallacy. Let’s take another look at this question, which was originally published in the New England Journal of Medicine. Think about 1,000 women just like your friend, visiting a clinic to get screened for breast cancer. We already know that on average 1% of those women will have cancer. So 10 women will actually have cancer. This means that 990 women will not have cancer. Since the test catches 80% of cancers, 8 of the women with cancer will test positive. But since there is a 10% false positive rate, 99 of the women without cancer will also test positive. So out of 107 women who test positive, only 8 actually have breast cancer. That’s about a 7.5% probability that she has cancer. Would it change the way you view your test results, or the advice you might give to a loved one, if you knew how base rates affected the accuracy of the test?
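For anyone who wants to check the mammogram arithmetic, here is a minimal sketch in Python of the same calculation done with Bayes' theorem. The function name `positive_predictive_value` and its parameter names are my own labels, not anything from the speech; the three numbers are the ones quoted in the speech.

```python
def positive_predictive_value(base_rate, sensitivity, false_positive_rate):
    """Probability of having the condition given a positive test (Bayes' theorem)."""
    # Fraction of the whole population that is sick AND tests positive.
    true_positives = base_rate * sensitivity
    # Fraction of the whole population that is healthy AND tests positive.
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Figures from the speech: 1% base rate, 80% of cancers detected,
# 10% false positive rate.
ppv = positive_predictive_value(0.01, 0.80, 0.10)
print(f"P(cancer | positive test) = {ppv:.1%}")  # 8 out of 107, about 7.5%
```

The result matches the head count in the speech: 8 true positives out of 107 positives in total.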
When it comes to reducing uncertainty, one thing we can do is to get a second, and even a third test, independently conducted. Sometimes, however, that’s simply not possible. Let’s consider another situation. Imagine someone installed a camera in your city to catch terrorists. Just for the sake of the argument, let’s say that we know that there are 100 terrorists among the 1 million inhabitants of your city. If the camera sees a terrorist, it will ring a bell 99% of the time. Its false positive rate is only 1%. Sounds great, right? Suppose the city’s entire 1 million inhabitants pass before the camera. It will ring the bell 99 times for the 100 terrorists – and 9,999 times for the rest of the citizens. Now what? Do you stick upwards of 10,000 people in a holding cell and keep running them past the camera? Do you spend the time and money to criminally investigate 10,000 people? Is imprisoning 10,000 people to catch 99 terrorists acceptable in terms of human rights and security? What does this problem sound like?
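The camera example can be checked the same way, this time counting whole people rather than multiplying probabilities. This is only a sketch: `screening_counts` is a hypothetical helper I made up for illustration, and the inputs are the made-up city figures from the speech.

```python
def screening_counts(population, affected, hit_rate_pct, false_alarm_pct):
    """Return (true alarms, false alarms) as whole-number head counts."""
    unaffected = population - affected
    # Integer arithmetic keeps the counts exact; // rounds down if a
    # percentage does not divide evenly.
    true_alarms = affected * hit_rate_pct // 100
    false_alarms = unaffected * false_alarm_pct // 100
    return true_alarms, false_alarms


# 1,000,000 inhabitants, 100 terrorists, 99% hit rate, 1% false positive rate.
true_alarms, false_alarms = screening_counts(1_000_000, 100, 99, 1)
total_alarms = true_alarms + false_alarms

print(f"bell rings for terrorists: {true_alarms}")
print(f"bell rings for everyone else: {false_alarms}")
print(f"share of alarms that are real: {true_alarms / total_alarms:.2%}")
```

Under these assumptions, fewer than 1 in 100 of the people who set off the bell are actually terrorists, even though the camera is "99% accurate".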
If you said it sounds a lot like the TSA, you’re right. Bruce Schneier, a security technologist, estimates we can expect to find about 1 terrorist among 8 million people passing through airports. This low base rate already makes it clear that the TSA will be relegated to screening false positives nearly all the time. It’s hard to be certain. But, as one online journalist pointed out, the TSA’s success rate – far from being 99% as in our example above – is close enough to 0 to be negligible. They’ve found lots of contraband, including weapons and even explosives. But I wasn’t able to find a single instance that was linked to a terrorist or terrorist group. We do know that terrorists are still flying. They are occasionally caught – by other agencies – prior to flying. Setting aside the questionable competency or practicality of TSA procedures, and based purely on numbers alone, do you think the TSA is worth a reported 8 billion dollars and 90 million hours of waiting in line per year?
Keep in mind that base rates cause far less trouble when the condition you are testing for is common rather than rare. Since about 50% of women who take pregnancy tests are already pregnant, a positive pregnancy test is generally reliable. Of course, everyone knows that they are also not 100% accurate. A good way to think about probabilities and percentages is to use natural frequencies. Just as I did in the situations I talked about tonight, think about a group of people in terms of tens or hundreds out of thousands or millions. That will help you understand what all the numbers really mean.
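To see how much the base rate alone changes the answer, here is a rough sketch that holds the test fixed and varies only how common the condition is. The speech gives no accuracy figures for pregnancy tests, so as an assumption this reuses the mammogram numbers (80% detection, 10% false positives); the helper name `chance_positive_is_real` is hypothetical.

```python
def chance_positive_is_real(base_rate, sensitivity=0.80, false_positive_rate=0.10):
    """P(condition | positive test) for a fixed test and a given base rate."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Same test every time; only the base rate changes.
for base_rate in (0.01, 0.10, 0.50):
    chance = chance_positive_is_real(base_rate)
    print(f"base rate {base_rate:.0%}: a positive result is right {chance:.0%} of the time")
```

With the test held constant, a positive result goes from being wrong most of the time at a 1% base rate to being right most of the time at a 50% base rate.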
I hope that you leave tonight with a good understanding of what base rates are and how they affect how we think about problems, both the ones that affect us personally and the societal issues that affect all of us.
What are you hoping that people will do with this information? Most of these folks will never run the TSA, so they can’t do much except gripe about being made to take their shoes off in airports. Even in the breast cancer example, the most that your average person would take away from the speech is “you’re supposed to multiply something by...something, and somehow the test might be wrong.” What advice are they supposed to give their friend? Most women with a scary-looking mammogram who hear their friend say, “You’re probably fine” are going to doubt whether the friend takes their health seriously.
The problem with supposedly practical applications of Bayes’ theorem is that you usually don’t have the data to do the math even if you know how, and there’s usually not much practical action you can take based on it anyway. It’s an interesting idea, and the people who like that sort of thing may want to learn more about it, but there’s no information in this lecture that would let them trace the idea (other than talking to you afterwards). But I gather this was not the kind of audience who goes home and googles Bayes’ theorem, so mentioning the name probably wouldn’t have done much.
In most of the cases where knowing about base rates would help us, we don’t actually know the base rate. If I know my child’s preschool teacher is being investigated for child abuse, is that strong evidence that she really abuses children? I suspect that most preschool teachers do not abuse children and that many are accused of it at some point in their careers, but I don’t know the rate of either. So I can’t really draw useful conclusions.
Thanks for your input. I’m not sure whether you are saying that it is a waste of time (both mine and theirs) to try to teach people about Bayesian inference, or whether there was a better way I could have explained it and made it relevant to them. If the latter, do you have any ideas as to how I could improve my treatment of the topic?
I’m not sure there’s a way to make it relevant to a previously uninterested audience in 5 minutes. I think your speech was well done for the constraints you had, but I don’t have ideas for how to make that topic work given the constraints.
I might have picked a simpler cognitive bias to talk about instead.