Cross posted from my personal blog.

In this post, I’m going to assume you’ve come across the Cognitive Reflection Test before and know the answers. If you haven’t, it’s only three quick questions, so go and do it now.

One of the striking early examples in Kahneman’s Thinking, Fast and Slow is the following problem:

(1) A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball.

How much does the ball cost? _____ cents

This question first turns up informally in a paper by Kahneman and Frederick, who find that most people get it wrong:

Almost everyone we ask reports an initial tendency to answer “10 cents” because the sum $1.10 separates naturally into $1 and 10 cents, and 10 cents is about the right magnitude. Many people yield to this immediate impulse. The surprisingly high rate of errors in this easy problem illustrates how lightly System 2 monitors the output of System 1: people are not accustomed to thinking hard, and are often content to trust a plausible judgment that quickly comes to mind.

In Thinking, Fast and Slow, the bat and ball problem is used as an introduction to the major theme of the book: the distinction between fluent, spontaneous, fast ‘System 1’ mental processes, and effortful, reflective and slow ‘System 2’ ones. The explicit moral is that we are too willing to lean on System 1, and this gets us into trouble:

The bat-and-ball problem is our first encounter with an observation that will be a recurrent theme of this book: many people are overconfident, prone to place too much faith in their intuitions. They apparently find cognitive effort at least mildly unpleasant and avoid it as much as possible.

This story is very compelling in the case of the bat and ball problem. I got this problem wrong myself when I first saw it, and still find the intuitive-but-wrong answer very plausible looking. I have to consciously remind myself to apply some extra effort and get the correct answer.

However, this becomes more complicated when you start considering other tests of this fast-vs-slow distinction. Frederick later combined the bat and ball problem with two other questions to create the Cognitive Reflection Test:

(2) If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets? _____ minutes

(3) In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake? _____ days

These are designed to also have an ‘intuitive-but-wrong’ answer (100 minutes, 24 days), and an ‘effortful-but-right’ answer (5 minutes, 47 days). But this time I seem to be immune to the wrong answers, in a way that just doesn’t happen with the bat and ball:

I always have the same reaction, and I don’t know if it’s common or I’m just the lone idiot with this problem. The ‘obvious wrong answers’ for 2. and 3. are completely unappealing to me (I had to look up 3. to check what the obvious answer was supposed to be). Obviously the machine-widget ratio hasn’t changed, and obviously exponential growth works like exponential growth.

When I see 1., however, I always think ‘oh it’s that bastard bat and ball question again, I know the correct answer but cannot see it’. And I have to stare at it for a minute or so to work it out, slowed down dramatically by the fact that Obvious Wrong Answer is jumping up and down trying to distract me.

If this test was really testing my propensity for effortful thought over spontaneous intuition, I ought to score zero. I hate effortful thought! As it is, I score two out of three, because I’ve trained my intuitions nicely for ratios and exponential growth. The ‘intuitive’, ‘System 1’ answer that pops into my head is, in fact, the correct answer, and the supposedly ‘intuitive-but-wrong’ answers feel bad on a visceral level. (Why the hell would the lily pads take the same amount of time to cover the second half of the lake as the first half, when the rate of growth is increasing?)
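For anyone who wants to check the arithmetic behind those two ‘obvious’ answers, here is a minimal Python sketch. The function names are my own, invented for illustration, and the widget calculation assumes every machine works at the same constant rate.

```python
from fractions import Fraction


def widget_minutes(machines: int, widgets: int) -> Fraction:
    """Minutes for `machines` machines to make `widgets` widgets."""
    # 5 machines make 5 widgets in 5 minutes, so each machine
    # makes 1/5 of a widget per minute (assumed constant).
    rate_per_machine = Fraction(5, 5 * 5)
    return Fraction(widgets) / (machines * rate_per_machine)


def day_lake_is_half_covered(full_day: int = 48) -> int:
    """Walk backwards from the day the lake is fully covered."""
    coverage = Fraction(1)  # whole lake covered on `full_day`
    day = full_day
    while coverage > Fraction(1, 2):
        coverage /= 2  # the patch was half as big the day before
        day -= 1
    return day


print(widget_minutes(100, 100))      # 5
print(day_lake_is_half_covered(48))  # 47
```

The machine-to-widget ratio cancels out in the first function, and the second only ever needs to step back one day, which is all the ‘intuition’ amounts to.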

The bat and ball still gets me, though. My gut hasn’t internalised anything useful, and it’s super keen on shouting out the wrong answer in a distracting way. My dislike for effortful thought is definitely a problem here.

I wanted to see if others had raised the same objection, so I started doing some research into the CRT. In the process I discovered a lot of follow-up work that makes the story much more complex and interesting.

I’ve come nowhere near to doing a proper literature review. Frederick’s original paper has been cited nearly 3000 times, and dredging through that for the good bits is a lot more work than I’m willing to put in. This is just a summary of the interesting stuff I found on my limited, partial dig through the literature.

Thinking, inherently fast and inherently slow

Frederick’s original Cognitive Reflection Test paper describes the System 1/System 2 divide in the following way:

Recognizing that the face of the person entering the classroom belongs to your math teacher involves System 1 processes — it occurs instantly and effortlessly and is unaffected by intellect, alertness, motivation or the difficulty of the math problem being attempted at the time. Conversely, finding $\sqrt{19163}$ to two decimal places without a calculator involves System 2 processes — mental operations requiring effort, motivation, concentration, and the execution of learned rules.

I find it interesting that he frames mental processes as being inherently effortless or effortful, independent of the person doing the thinking. This is not quite true even for the examples he gives — faceblind people and calculating prodigies exist.

This framing is important for interpreting the CRT. If the problem inherently has a wrong ‘System 1 solution’ and a correct ‘System 2 solution’, the CRT can work as intended, as an efficient tool to split people by their propensity to use one strategy or the other. If there are ‘System 1’ ways to get the correct answer, the whole thing gets much more muddled, and it’s hard to disentangle natural propensity to reflection from prior exposure to the right mathematical concepts.

My tentative guess is that the bat and ball problem is close to being this kind of efficient tool. Although in some ways it’s the simplest of the three problems, solving it in a ‘fast’, ‘intuitive’ way relies on seeing the problem in a way that most people’s education won’t have provided. (I think this is true, anyway; I’ll go into more detail later.) I suspect that this is less true of the other two problems: ratios and exponential growth are topics that a mathematical or scientific education is more likely to build intuition for.

(Aside: I’d like to know how these other two problems were chosen. The paper just states the following:

Motivated by this result [the answers to the bat and ball question], two other problems found to yield impulsive erroneous responses were included with the “bat and ball” problem to form a simple, three-item “Cognitive Reflection Test” (CRT), shown in Figure 1.

I have a vague suspicion that Frederick trawled through something like ‘The Bumper Book of Annoying Riddles’ to find some brainteasers that don’t require too much in the way of mathematical prerequisites. The lilypads one has a family resemblance to the classic grains-of-wheat-on-a-chessboard puzzle, for instance.)

However, I haven’t found any great evidence either way for this guess. The original paper doesn’t break down participants’ scores by question – it just gives mean scores on the test as a whole. I did however find this meta-analysis of 118 CRT studies, which shows that the bat and ball question is the most difficult on average – only 32% of all participants get it right, compared with 40% for the widgets and 48% for the lilypads. It also has the biggest jump in success rate when comparing university students with non-students. That suggests that better mathematical education does help on the bat and ball, but it doesn’t clear up how it helps. It could improve participants’ ability to intuitively see the answer. Or it could improve their ability to come up with an ‘unintuitive’ solution, like solving the corresponding simultaneous equations by a rote method.

What I’d really like is some insight into what individual people actually do when they try to solve the problems, rather than just this aggregate statistical information. I haven’t found exactly what I wanted, but I did turn up a few interesting studies on the way.

No, seriously, the answer isn’t ten cents

My favourite thing I found was this (apparently unpublished) ‘extremely rough draft’ by Meyer, Spunt and Frederick from 2013, revisiting the bat and ball problem. The intuitive-but-wrong answer turns out to be extremely sticky, and the paper is basically a series of increasingly desperate attempts to get people to actually think about the question.

One conjecture for what people are doing when they get this question wrong is the attribute substitution hypothesis. This was suggested early on by Kahneman and Frederick, and is a fancy way of saying that they are instead solving the following simpler problem:

(1) A bat and a ball cost $1.10 in total. The bat costs $1.00.

How much does the ball cost? _____ cents

Notice that this is missing the ‘more than the ball’ clause at the end, turning the question into a much simpler arithmetic problem. This simple problem does have ‘ten cents’ as the answer, so it’s very plausible that people are getting confused by it.

Meyer, Spunt and Frederick tested this hypothesis by getting respondents to recall the problem from memory. This showed a clear difference: 94% of ‘five cent’ respondents could recall the correct question, but only 61% of ‘ten cent’ respondents. It’s possible that there is a different common cause of both the ‘ten cent’ response and misremembering the question, but it at least gives some support for the substitution hypothesis.

However, getting people to actually answer the question correctly was a much more difficult problem. First they tried bolding the words more than the ball to make this clause more salient. This made surprisingly little impact: 29% of respondents solved it, compared with 24% for the original problem. Printing both versions was slightly more successful, bumping up the correct response to 35%, but it was still a small effect.

After this, they ditched subtlety and resorted to pasting these huge warnings above the question:

Computation warning: 'Be careful! Many people miss the following problem because they do not take the time to check their answer.'

Comprehension warning: 'Be careful! Many people miss the following problem because they read it too quickly and actually answer a different question than the one that was asked.'

These were still only mildly effective, with the rate of correct solutions rising from 45% to 50%. People just really like the answer ‘ten cents’, it seems.

At this point they completely gave up and just flat out added “HINT: 10 cents is not the answer.” This worked reasonably well, though there was still a hard core of 13% who persisted in writing down ‘ten cents’.

That’s where they left it. At this point there’s not really any room to escalate beyond confiscating the respondents’ pens and prefilling in the answer ‘five cents’, and I worry that somebody would still try and scratch in ‘ten cents’ in their own blood. The wrong answer is just incredibly compelling.

So, what are people doing when they solve this problem?

Unfortunately, it’s hard to tell from the published literature (or at least what I found of it). What I’d really like is lots of transcripts of individuals talking through their problem solving process. The closest I found was this paper by Szaszi et al, who did carry out this sort of interview, but it doesn’t include any examples of individual responses. Instead, it gives an aggregated overview of types of responses, which doesn’t go into the kind of detail I’d like.

Still, the examples given for their response categories give a few clues. The categories are:

Correct answer, correct start. Example given: ‘I see. This is an equation. Thus if the ball equals to x, the bat equals to x plus 1… ’
Correct answer, incorrect start. Example: ‘I would say 10 cents… But this cannot be true as it does not sum up to €1.10...’
Incorrect answer, reflective, i.e. some effort was made to reconsider the answer given, even if it was ultimately incorrect. Example: ‘… but I’m not sure… If together they cost €1.10, and the bat costs €1 more than the ball… the solution should be 10 cents. I’m done.’
No reflection. Example: ‘Ok. I’m done.’

These demonstrate one way to reason your way to the correct answer (solve the simultaneous equations) and one way to be wrong (just blurt out the answer). They also demonstrate one way to recover from an incorrect solution (think about the answer you blurted out and see if it actually works). Still, it’s all rather abstract and high level.

How To Solve It

However, I did manage to stumble onto another source of insight. While researching the problem I came across this article from the online magazine of the Association for Psychological Science, which discusses a variant ‘Ford and Ferrari problem’. This is quite interesting in itself, but I was most excited by the comments section. Finally some examples of how the problem is solved in the wild!

The simplest ‘analytical’, ‘System 2’ solution is to rewrite the problem as two simultaneous linear equations and plug-and-chug your way to the correct answer. For example, writing $B$ for the bat and $b$ for the ball, we get the two equations

$B + b = 110$ , $B - b = 100$ ,

which we could then solve in various standard ways, e.g.

$2 B = 210$ , $B = 105$ ,

which then gives

$b = 110 - B = 5$ .

There are a couple of variants of this explained in the comments. It’s a very reliable way to tackle the problem: if you already know how to do this sort of rote method, there are no surprises. This sort of method would work for any similar problem involving linear equations.
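To make the ‘rote method’ concrete, here is a minimal Python sketch of the same elimination steps. The function name and its arguments are hypothetical, chosen just for this illustration; prices are in cents.

```python
from fractions import Fraction


def solve_sum_and_difference(total, difference):
    """Solve B + b = total and B - b = difference by elimination."""
    # Adding the two equations eliminates b: 2B = total + difference.
    bat = Fraction(total + difference, 2)
    # Substitute back into B + b = total to recover b.
    ball = total - bat
    return bat, ball


bat, ball = solve_sum_and_difference(110, 100)
print(bat, ball)  # 105 5
```

Nothing here depends on the particular numbers, which is the point of the rote method: it works unchanged for any total and difference.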

However, it’s pretty obvious that a lot of people won’t have access to this method. Plenty of people noped out of mathematics long before they got to simultaneous equations, so they won’t be able to solve it this way. What might be less obvious, at least if you mostly live in a high-maths-ability bubble, is that these people may also be missing the sort of tacit mathematical background that would even allow them to frame the problem in a useful form in the first place.

That sounds a bit abstract, so let’s look at some responses (I’ll paste all these straight in, so any typos are in the original). First, we have these two confused commenters:

The thing is, why does the ball have to be $.05? It could have been .04 0r.03 and the bat would still cost more than $1.

and

This is exactly what bothers me and resulted in me wanting to look up the question online. On the quiz the other 2 questions were definitive. This one technically could have more than one answer so this is where phycologists actually mess up when trying to give us a trick question. The ball at .4 and the bat at 1.06 doesn’t break the rule either.

These commenters don’t automatically see two equations in two variables that together are enough to constrain the problem. Instead they seem to focus mainly on the first condition (adding up to $1.10) and just use the second one as a vague check at best (‘the bat would still cost more than $1’). This means that they are unable to immediately tell that the problem has a unique solution.
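A quick way to see that the second condition really does pin the answer down is to try every whole-cent ball price and keep only those where the bat costs exactly 100 cents more. This is only a sketch; the names are mine, and I am reading the commenters’ suggestions as a ball at 3 or 4 cents with the bat making up the rest of the $1.10.

```python
TOTAL = 110       # bat + ball, in cents
DIFFERENCE = 100  # bat - ball, in cents


def valid_ball_prices(total=TOTAL, difference=DIFFERENCE):
    """Ball prices (whole cents) that satisfy both conditions."""
    valid = []
    for ball in range(total + 1):
        bat = total - ball  # first condition holds by construction
        if bat - ball == difference:  # second condition, taken exactly
            valid.append(ball)
    return valid


print(valid_ball_prices())  # [5]

# The commenters' suggestions add up correctly but miss the difference:
for ball in (3, 4):
    bat = TOTAL - ball
    print(ball, bat, bat - ball)
# 3 107 104
# 4 106 102
```

In both suggested cases the bat costs more than a dollar, but it costs more than a dollar more than the ball, which is the part that gets treated as a vague check.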

In response, another commenter, Tony, suggests a correct solution which is an interesting mix of writing the problem out formally and then figuring out the answer by trial and error:

I hear your pain. I feel as though psychologists and psychiatrists get together every now and then to prove how stoopid I am. However, after more than a little head scratching I’ve gained an understanding of this puzzle. It can be expressed as two facts and a question A=100+B and A+B=110, so B=? If B=2 then the solution would be 100+2+2 and A+B would be 104. If B=6 then the solution would be 100+6+6 and A+B would be 112. But as be KNOW A+B=110 the only number for B on it’s own is 5.

This suggests enough half-remembered mathematical knowledge to find a sensible abstract framing, but not enough to solve it the standard way.
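Tony’s approach translates almost directly into code. The sketch below is my own rendering of it, not something from the comment thread: fix the first fact (A = 100 + B), then try values of B until the second fact (A + B = 110) holds.

```python
def total_for_guess(ball):
    """Total cost if the ball costs `ball` cents and the bat is 100 more."""
    bat = 100 + ball   # first fact: A = 100 + B
    return bat + ball  # to be compared with the second fact: A + B = 110


# The guesses Tony tries, plus the one that works.
for guess in (2, 6, 5):
    print(guess, total_for_guess(guess))
# 2 104
# 6 112
# 5 110

# Trial and error over whole cents finds the same answer.
answer = next(b for b in range(111) if total_for_guess(b) == 110)
print(answer)  # 5
```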

Finally, commenter Marlo Eugene provides an ingenious way of solving the problem without writing all the algebraic steps out:

Linguistics makes all the difference. The conceptual emphasis seems to lie within the word MORE.

X + Y = $1.10. If X = $1 MORE then that leaves $0.10 TO WORK WITH rather than automatically assign to Y

So you divide the remainder equally (assuming negative values are disqualified) and get 0.05.

So even this small sample of comments suggests a wide diversity of problem-solving methods leading to the two common answers. Further, these solutions don’t all split neatly into ‘System 1’ ‘intuitive’ and ‘System 2’ ‘analytic’. Marlo Eugene’s solution, for instance, is a mixed solution of writing the equations down in a formal way, but then finding a clever way of just seeing the answer rather than solving them by rote.

I’d still appreciate more detailed transcripts, including the time taken to solve the problem. My suspicion is still that very few people solve this problem with a fast intuitive response, in the way that I rapidly see the correct answer to the lilypad question. Even the more ‘intuitive’ responses, like Marlo Eugene’s, seem to rely on some initial careful reflection and a good initial framing of the problem.

If I’m correct about this lack of fast responses, my tentative guess for the reason is that it has something to do with the way most of us learn simultaneous equations in school. We generally learn arithmetic as young children in a fairly concrete way, with the formal numerical problems supplemented with lots of specific examples of adding up apples and bananas and so forth.

But then, for some reason, this goes completely out of the window once the unknown quantity isn’t sitting on its own on one side of the equals sign. This is instead hived off into its own separate subject, called ‘algebra’, and the rules are taught much later in a much more formalised style, without much attempt to build up intuition first.

(One exception is the sort of puzzle sheets that are often given to young kids, where the unknowns are just empty boxes to be filled in. Sometimes you get 2+3=□, sometimes it’s 2+□=5, but either way you go about the same process of using your wits to figure out the answer. Then, for some reason I’ll never understand, the worksheets get put away and the poor kids don’t see the subject again until years later, when the box is now called $x$ for some reason and you have to find the answer by defined rules. Anyway, this is a separate rant.)

This lack of a rich background in puzzling out the answer to specific concrete problems means most of us lean hard on formal rules in this domain, even if we’re relatively mathematically sophisticated. Only a few build up the necessary repertoire of tricks to solve the problem quickly by insight. I’m reminded of a story in Feynman’s The Pleasure of Finding Things Out:

Around that time my cousin, who was three years older, was in high school. He was having considerable difficulty with his algebra, so a tutor would come. I was allowed to sit in a corner while the tutor would try to teach my cousin algebra. I’d hear him talking about x.

I said to my cousin, “What are you trying to do?”

“I’m trying to find out what x is, like in 2x + 7 = 15.”

I say, “You mean 4.”

“Yeah, but you did it by arithmetic. You have to do it by algebra.”

I learned algebra, fortunately, not by going to school, but by finding my aunt’s old schoolbook in the attic, and understanding that the whole idea was to find out what x is—it doesn’t make any difference how you do it.

I think this reliance on formal methods might be somewhat less true for exponential growth and ratios, the subjects underpinning the lilypad and widget questions. Certainly I seem to have better intuition there, without having to resort to rote calculation. But I’m not sure how general this is.

How To Visualise It

If you wanted to solve the bat and ball problem without having to ‘do it by algebra’, how would you go about it?

My original post on the problem was a pretty quick, throwaway job, but over time it picked up some truly excellent comments by anders and Kyzentun, which really start to dig into the structure of the problem and suggest ways to ‘just see’ the answer. The thread with anders in particular goes into lots of other examples of how we think through solving various problems, and is well worth reading in full. I’ll only summarise the bat-and-ball-related parts of the comments here.

We all used some variant of the method suggested by Marlo Eugene in the comments above. Writing out the basic problem again, we have:

$B + b = 110$ , $B - b = 100$ .

Now, instead of immediately jumping to the standard method of eliminating one of the variables, we can just look at what these two equations are saying and solve it directly ‘by thinking’. We have a bat, $B$ . If you add the price of the ball, $b$ , you get 110 cents. If you instead remove the same quantity $b$ you get 100 cents. So the bat’s price must be exactly halfway between these two numbers, at 105 cents. That leaves five for the ball.
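Spelled out in symbols (this is just my own way of writing the ‘halfway’ step down, not a different method), the bat is the average of the two right-hand sides:

$B = \frac{(B + b) + (B - b)}{2} = \frac{110 + 100}{2} = 105$ , $b = 110 - B = 5$ .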

Now that I’m thinking of the problem in this way, I directly see the equations as being ‘about a bat that’s halfway between 100 and 110 cents’, and the answer is incredibly obvious.

Kyzentun suggests a variant on the problem that is much less counterintuitive than the original:

A centered piece of text and its margins are 110 columns wide. The text is 100 columns wide. How wide is one margin?

Same numbers, same mathematical formula to reach the solution. But less misleading because you know there are two margins, and thus know to divide by two after subtracting.

In the original problem, the 110 units and 100 units both refer to something abstract, the sum and difference of the bat and ball. In Kyzentun’s version these become much more concrete objects, the full width of the line and the width of the text, with the two margins making up the gap between them. The work of seeing the equations as relating to something concrete has mostly been done for you.

Similarly, anders works the problem by ‘getting rid of the 100 cents’, and splitting the remainder in half to get at the price of the ball:

I just had an easy time with #1 which I haven’t before. What I did was take away the difference so that all the items are the same (subtract 100), evenly divide the remainder among the items (divide 10 by 2) and then add the residuals back on to get 105 and 5.

The heuristic I seem to be using is to treat objects as made up of a value plus a residual. So when they gave me the residual my next thought was “now all the objects are the same, so whatever I do to one I do to all of them”.
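Here is how I read anders’ ‘value plus residual’ procedure, as a minimal Python sketch. The function and variable names are hypothetical, and the step-by-step split is my interpretation of the comment rather than anything anders wrote down.

```python
from fractions import Fraction


def split_by_residual(total, difference, items=2):
    """Remove the residual, share out the rest, then add the residual back."""
    # Take away the difference so that all the items are the same.
    remainder = total - difference          # 110 - 100 = 10
    # Evenly divide the remainder among the items.
    shared = Fraction(remainder, items)     # 10 / 2 = 5
    # Add the residual back on to the item it belongs to (the bat).
    bat = shared + difference
    ball = shared
    return bat, ball


print(*split_by_residual(110, 100))  # 105 5
```

The only division happens after the 100 cents has been set aside, which is exactly the step the ‘ten cents’ answer skips.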

I think that after reasoning my way through all these perspectives, I’m finally at the point where I have a quick, ‘intuitive’ understanding of the problem. But it’s surprising how much work it was for such a simple bit of algebra.

Final thoughts

Rather than making any big conclusions, the main thing I wanted to demonstrate in this post is how complicated the story gets when you look at one problem in detail. I’ve written about close reading recently, and this has been something like a close reading of the bat and ball problem.

Frederick’s original paper on the Cognitive Reflection Test is in that generic social science style where you define a new metric and then see how it correlates with a bunch of other macroscale factors (either big social categories like gender or education level, or the results of other statistical tests that try to measure factors like time preference or risk preference). There’s a strange indifference to the details of the test itself – at no point does he discuss why he picked those specific three questions, and there’s no attempt to model what was making the intuitive-but-wrong answer appealing.

The later paper by Meyer, Spunt and Frederick is much more interesting to me, because it really starts to pick apart the specifics of the bat and ball problem. Is an easier question getting substituted? Can participants reproduce the correct question from memory?

I learned the most from the individual responses, though. This is where you really get to see the variety of ways that people tackle the problem. Careful reflection definitely seems to improve the chance of a correct answer in general, but many of the responses don’t really fit the neat ‘fast vs slow’ division of the original setup.

Questions

I’m interested in any comments on the post, but here are a few specific things I’d like to get your answers to:

My rapid, intuitive answer for the bat and ball question is wrong (at least until I retrained it by thinking about the problem way too much). However, for the other two I ‘just see’ the correct answer. Is this common for other people, or do you have a different split?
If you’re able to rapidly ‘just see’ the answer to the bat and ball question, how do you do it?
How do people go about designing tests like these? This isn’t at all my field and I’d be interested in any good sources. I’d kind of assumed that there’d be some kind of serious-business Test Creation Methodology, but for the CRT at least it looks like people just noticed they got surprising answers for the bat and ball question and looked around for similar questions. Is that unusual compared to other psychological tests?

The Bat and Ball Problem Revisited