Shut Up And Guess
Related to: Extreme Rationality: It’s Not That Great
A while back, I said provocatively that the rarefied sorts of rationality we study at Less Wrong hadn’t helped me in my everyday life and probably hadn’t helped you either. I got a lot of controversy but not a whole lot of good clear examples of getting some use out of rationality.
Today I can share one such example.
Consider a set of final examinations based around tests with the following characteristics:
* Each test has one hundred fifty true-or-false questions.
* The test is taken on a scan-tron which allows answers of “true”, “false”, and “don’t know”.
* Students get one point for each correct answer, zero points for each “don’t know”, and minus one half point for each incorrect answer.
* A score of >50% is “pass”, >60% is “honors”, >70% is “high honors”.
* The questions are correspondingly difficult, so that even a very intelligent student is not expected to get much above 70. All students are expected to encounter at least a few dozen questions which they can answer only with very low confidence, or which they can’t answer at all.
At what confidence level do you guess? At what confidence level do you answer “don’t know”?
I took several of these tests last month, and the first thing I did was some quick mental calculations. If I have zero knowledge of a question, my expected gain from answering is 50% probability of earning one point and 50% probability of losing one half point. Therefore, my expected gain from answering a question is .5(1)-.5(.5)= +.25 points. Compare this to an expected gain of zero from not answering the question at all. Therefore, I ought to guess on every question, even if I have zero knowledge. If I have some inkling, well, that’s even better.
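The same calculation can be written out for any confidence level, not just the zero-knowledge case. This is only a sketch: the function names are made up for illustration, and the reward and penalty are the ones from the scoring rule described above.

```python
from fractions import Fraction

def expected_gain(p_correct, reward=Fraction(1), penalty=Fraction(1, 2)):
    """Expected points from answering when you are right with probability p_correct."""
    return p_correct * reward - (1 - p_correct) * penalty

def break_even_confidence(reward=Fraction(1), penalty=Fraction(1, 2)):
    """Confidence at which answering ties with "don't know" (zero points)."""
    return penalty / (reward + penalty)

print(expected_gain(Fraction(1, 2)))  # 1/4
print(break_even_confidence())        # 1/3
```

Under this rule the break-even confidence is one third. On a true-or-false question you can always pick the side you think is at least 50% likely, so you are never below break-even.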
You look disappointed. This isn’t a very exciting application of arcane Less Wrong knowledge. Anyone with basic math skills should be able to calculate that out, right?
I attend a pretty good university, and I’m in a postgraduate class where most of us have at least a bachelor’s degree in a hard science, and a few have master’s degrees. And yet, talking to my classmates in the cafeteria after the first test was finished, I started to realize I was the only person in the class who hadn’t answered “don’t know” to any questions.
I have several friends in the class who had helped me with difficult problems earlier in the year, so I figured the least I could do for them was to point out that they could get several free points on the exam by guessing instead of putting “don’t know”. I got a chance to talk to a few people between tests, and I explained the argument to them using exactly the calculation I gave above. My memory’s not perfect, but I think I tried it with about five friends.
Not one of them was convinced. I see that while I’ve been off studying and such, you’ve been talking about macros of absolute denial and such, and while I’m not sure I like the term, this almost felt like coming up against a macro of absolute denial.
I had people tell me there must be some flaw in my math. I had people tell me that math doesn’t always map to the real world. I had people tell me that no, I didn’t understand, they really didn’t have any idea of the answer to that one question. I had people tell me they were so baffled by the test that they expected to consistently get significantly more than fifty percent of the (true or false!) questions they guessed on wrong. I had people tell me that although yes, on average they would do better, there was always the possibility that by chance alone they would get all thirty of the questions they guessed on wrong and end up at a huge disadvantage [1].
I didn’t change a single person’s mind. The next test, my friends answered just as many “don’t know”s as the last one.
This floored me, because it’s not one of those problems about politics or religion where people have little incentive to act rationally. These tests were the main component of the yearly grade in a very high-pressure course. My friend who put down thirty “don’t know”s could easily have increased his grade in the class 5% by listening to me, maybe even moved up a whole letter grade. Nope. Didn’t happen. So here’s my theory.
The basic mistake seems to be loss aversion [2], the tendency to regret losses more than one values gains. This could be compounded by students’ tendency to discuss answers after the test: I remember each time I heard that one of my guesses had been wrong and I’d lost points, it was a deep psychic blow. No doubt my classmates tended to remember the guesses they’d gotten wrong more than the ones they’d gotten right, leading to the otherwise inexplicable statement that they expect to get more than half of their guesses wrong. But this mistake should disappear once the correct math is explained. Why doesn’t it?
In The Terrible...Truth About Morality, Roko gives a good example of the way our emotional and rational minds interact. A person starts with an emotion (in that case, a feeling of disgust about incest) and only later comes up with some reason why that emotion is the objectively correct emotion to have and why their action of condemning the relationship is rationally justified.
My final exam, thanks to loss aversion, created an emotional inclination against guessing, which most of the students taking it followed. When confronted with an argument against it, my friends tried to come up with reasons why the course they took was logical—reasons which I found very unconvincing.
It’s really this last part which was so perfect I couldn’t resist posting about it. One of my close friends (let’s call him Larry) finally admitted, after much pestering on my part, that guessing would increase his score. But, he said, he still wasn’t going to guess, because he had a moral objection to doing so. Tests were supposed to measure how much we knew, not how lucky we were, and if he really didn’t know the answer, he wanted that ignorance to be reflected in his final score.
A few years ago, I would have respected that strong commitment to principle. Today, jaded as I am, I waited until the last day of exams, when our test was a slightly different format. Instead of being true-false, it was multiple-choice: choose one of eight. And there was no penalty for guessing; indeed, there wasn’t even a “don’t know” on the answer sheet, although you could still leave it blank if you really wanted.
“So,” I asked Larry afterwards, “did you guess on any of the questions?”
“Yeah, there were quite a few I didn’t know,” he answered.
When I reminded him about his moral commitment, he said something about how this was different because there were more answers available so it wasn’t really the same as guessing on a fifty-fifty question. At the risk of impugning my friend’s subconscious motives, I think he no longer had to use moral ideals to rationalize away his fear of losing points, so he did the smart thing and guessed.
Footnotes
1: If I understand the math right, then if you guess on thirty questions using my test’s scoring rule, the probability of ending up with a net penalty from guessing is less than one percent [EDIT: Actually just over two percent, thank you ArthurB]. If, after finishing all the questions of which they were “certain”, a person felt confident that they were right over the cusp of a passing grade, assigned very high importance to passing, and assigned almost no importance to any increase in grade past the passing point, then it might be rational not to guess, to avoid the roughly two percent chance of failure. In reality, no one could calculate their grade out this precisely.
2: Looking to see if anyone else had been thinking along the same lines [3], I found a very interesting paper describing some work of Kahneman and Tversky on this issue, and proposing a scoring rule that takes loss aversion into account. Although I didn’t go through all of the math, the most interesting number in there seems to be that on a true/false test that penalizes wrong answers at the same rate it rewards correct answers (unlike my test, which rewarded guessing), a person with the empirically determined level of human loss aversion will (if I understand the stats right) need to be ~79% sure before choosing to answer (as opposed to the utility-maximizing level of >50%). This also linked me to prospect theory, which is interesting.
3: I’m surprised that test-preparation companies haven’t picked up on this. Training people to understand calibration and loss aversion could be very helpful on standardized tests like the SATs. I’ve never taken a Kaplan or Princeton Review course, but those who have tell me this topic isn’t covered. I’d be surprised if the people involved didn’t know the science, so maybe they just don’t know of a reliable way to teach such things?
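The figure in footnote 1 can be checked directly. This is a sketch under the footnote's own assumptions: thirty pure coin-flip guesses, one point per right answer, half a point off per wrong one. The function name is made up for illustration.

```python
from math import comb

def prob_net_loss(n_guesses, reward=1.0, penalty=0.5, p_correct=0.5):
    """Probability that guessing on n_guesses questions nets fewer than zero points."""
    total = 0.0
    for right in range(n_guesses + 1):
        net = right * reward - (n_guesses - right) * penalty
        if net < 0:
            # Binomial probability of exactly `right` correct guesses.
            total += (comb(n_guesses, right)
                      * p_correct ** right
                      * (1 - p_correct) ** (n_guesses - right))
    return total

print(round(prob_net_loss(30), 4))  # 0.0214
```

A net loss needs nine or fewer correct guesses out of thirty; ten correct breaks exactly even. That gives the "just over two percent" from the edit.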
This reminds me of an infamous chemistry exam that no longer existed at my college by the time I got there but had passed into the student lore. For each question, you would first mark your answer (multiple choice, 4 or 5 choices), and then mark a confidence option. These were “high confidence” (5 points if right, −3 if wrong), “low confidence” (3 points if right, 0 if wrong), and “I don’t know, give me a point” (1 point regardless of what answer is marked).
This exam was not popular with the students.
For those who don’t feel like running the numbers, “I don’t know” is the best option when you think the probability of your answer being correct is between 20% and 33%, “low confidence” is the best when you think the probability is between 33% and 60%, and “high confidence” is the best when you think the probability is between 60% and 100%.
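For anyone who does feel like running the numbers, here is a minimal sketch using the point values from the exam described above. The function name is an illustrative assumption.

```python
def best_option(p):
    """Return the confidence option with the highest expected points,
    given probability p that the marked answer is correct."""
    expected = {
        "high confidence": 5 * p - 3 * (1 - p),  # +5 if right, -3 if wrong
        "low confidence": 3 * p,                 # +3 if right, 0 if wrong
        "don't know": 1.0,                       # 1 point regardless
    }
    return max(expected, key=expected.get)

for p in (0.25, 0.5, 0.8):
    print(p, best_option(p))
```

The crossovers come from 3p = 1 (p = 1/3) and 8p - 3 = 3p (p = 0.6).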
If your probability doesn’t fall between 20% and 100%, you’re doing something wrong.
For true/false questions, you can’t think the probability of your answer being correct is less than 50%. If you did think that, it wouldn’t be your answer.
I vaguely recall reading an anecdote about a similar testing scheme where you had to give an actual numerical confidence value for each answer. Saying you were 100% confident of an answer that was wrong would give you minus infinity points.
I bet that would be even less popular with students.
I’ve given those kinds of tests in my decision analysis and my probabilistic analysis courses (for the multiple choice questions). Four choices, logarithmic scoring rule, 100% on the correct answer gives 1 point, 25% on the correct answer gives zero points, and 0% on the correct answer gives negative infinity.
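A rule with those three anchor points can be sketched as a base-n logarithm shifted so that a uniform guess scores zero. The exact formula is an assumption reconstructed from the anchor points given here, not necessarily the one used in the course.

```python
import math

def log_score(p_correct, n_options=4):
    """Points for putting probability p_correct on the right answer:
    1 at certainty, 0 at a uniform guess, minus infinity at zero."""
    if p_correct == 0:
        return float("-inf")
    return math.log(n_options * p_correct) / math.log(n_options)

print(log_score(1.0))   # 1.0
print(log_score(0.25))  # 0.0
print(log_score(0.0))   # -inf
```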
Some students loved it. Some hated it. Many hated it until they realized that e.g. they didn’t need 90% of the points to get an A (I was generous on the points-to-grades part of grading).
I did have to be careful; minus infinity meant that on one question you could fail the class. I did have to be sure that it wasn’t a mistake, that they actually meant to put a zero on the correct answer.
If you want to try, you might want to try the Brier scoring rule instead of the logarithmic; it has a similar flavor without the minus infinity hassle.
...wow. Well, I guess that’s one way to teach people to avoid infinite certainty. Reminiscent of Jeffreyssai. Did that happen to a lot of students?
Some students started putting zeros on the first assignment or two. However, all they needed was to see a few people get nailed putting 0.001 on the right answer (usually on the famous boy-girl probability problem) and people tended to start spreading their probability assignments. Some people never learn, though, so once in a while people would fail. I can only remember three in eight years.
My professor ran a professional course like this. One year, one of the attendees put 100% on every question on every assignment, and got every single answer correct. The next year, someone attended from the same company, and decided he was going to do the same thing. Quite early, he got minus infinity. My professor’s response? “They both should be fired.”
I cannot begin to say how vehemently I disagree with the idea of firing the first attendee. If I found out that your professor had fired them I would fire your professor.
Sure, it has to be an expected utility fail if you take the problem literally, because of how little it would have cost to put only 99.99% on each correct answer, and how impossible it would be to be infinitely certain of getting every answer right. But this fails to take into account the out-of-context expected utility of being AWESOME.
Firing the second guy is fine.
Given that this was stated as used in “decision analysis” and “probabilistic analysis” courses I would hope not...
It’s rare that one has a chance to make the structure of an exam itself teach the material, independent of the content, heh.
The good thing about a log scoring rule is that if students try to maximize their expected score, they should write in their actual beliefs.
For the same reason, when confronted with a set of odds on the outcome of an event, betting on each outcome in proportion to your belief will maximize the expected log of your gain (regardless of what the current odds are).
Unless I’m misunderstanding something, this is true for the Brier score, too: http://en.wikipedia.org/wiki/Scoring_rule#Proper_score_functions
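A quick numerical check of that claim, for a single true/false question. This is only a sketch: the grid search and the names are illustrative, and the "linear" rule (score equals the probability put on the right answer) is included as an example of a rule that is not proper.

```python
import math

RULES = {
    "log": lambda r: math.log(r),       # r = probability placed on what happened
    "brier": lambda r: -(1 - r) ** 2,
    "linear": lambda r: r,
}

def expected_score(rule, p, q):
    """Expected score when the true chance of 'true' is p and the report is q."""
    return p * rule(q) + (1 - p) * rule(1 - q)

def best_report(rule_name, p, steps=100):
    """Report q (on a grid strictly inside 0..1) that maximizes expected score."""
    rule = RULES[rule_name]
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda q: expected_score(rule, p, q))

for name in RULES:
    print(name, best_report(name, 0.7))
```

With a true belief of 0.7, the log and Brier rules are both maximized by reporting 0.7. The linear rule pushes the report to the edge of the grid.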
You’re correct. In the previous post given, it was somehow assumed that the score for a wrong answer was 0. In that case, the only proper score function is the log.
If you have a score function f1(q) for the right answer and f0(q) for each wrong answer, and there are n possible choices, the true beliefs p are a critical point of the expected score only if
f0'(x) = (k - x f1'(x)) / (1 - x)
If we set f1(x) = 1 - (1-x)^p, we can set f0(x) = -(1-x)^p + (1-x)^(p-1) * p/(p-1)
For p = 2, we find f0(x) = -(1-x)^2 + 2(1-x) = 1 - x^2; this is the Brier score.
For p = 3, we find f0(x) = -(1-x)^3 + (3/2)(1-x)^2 = x^3 - 3x^2/2 + 1/2; the constant 1/2 does not change the incentives and can be dropped.
1-(1-x)^3 and x^3-3*x^2/2 shall be known as ArthurB’s score
I’m not following your calculations exactly, so please correct me if I’m misunderstanding, but it seems that you are assuming that the student chooses an option and a confidence for that option? My understanding was that the student chooses a probability distribution over all options and is scored on that. As for how to extend the Brier score to more than two options, I’m not sure whether there’s a standard way to do that, but one could always limit oneself to true/false questions… (in the log case you simply score log q_i, where q_i is the probability the student put on the correct answer, of course)
No.
I am assuming the student has a distribution in mind and we want to design a scoring rule where the best strategy to maximize the expected score is to write in the distribution you have in mind.
If there are n options, the right answer is i*, and you give log(n p_i*) / log(n) points to the student, then his incentive is to write in the exact distribution. On the other hand, if you give him, say, p_i* points, his incentive would be to write in “1” for the most likely answer and 0 otherwise.
Another way to score is to give points not only on p_i* but also to take away points on p_i where i != i*, by using a function f1 for p_i* and f0 otherwise. I gave a necessary condition on f1 and f0 for the student’s belief to be a local maximum of the expected score. The technique is simply Lagrange multipliers.
The number of options drops out of the equation, which is beautiful, so you can extend to any number of answers or even a continuous question. (When asked what the population of Zimbabwe is, the student could describe any parametric distribution and be scored on that: histograms, Gaussians... there are many ways a student could write in his answer.)
Ok, so you’re saying the total score the student gets is
f1(q_i*) + Sum_(i /= i*) f0(q_i)
? I didn’t understand that from your original post, sorry. So does “(if) [t]he score for a wrong answer was 0 (...) the only proper score function is the log” mean that if there are more than two options, log is the only proper score function that depends only on the probability assigned to the correct outcome, not on the way the rest of the probability mass is distributed among the other options? Or am I still misunderstanding?
Yes, if there are two or more options and the score function depends only on the probability assigned to the correct outcome, then the only proper function is log. You can see that with the equation I gave
f0'(x) = (k - x f1'(x)) / (1 - x)
For f0 = 0, it means x f1'(x) = k, thus f1(x) = k ln(x) + c (necessary condition).
Then you have to check that k ln(x) + c indeed works for some k > 0 and c; that is left as an exercise for the reader ^^
What does 0.01% on the wrong answer get you?
Depends what you do with the other 99.99% and the other three answers, I assume.
In a two-answer scenario, if I’m understanding bill’s version of the log scoring rule correctly, giving p=0.9999 to the right answer and p=0.0001 to the wrong answer should get you [log(0.9999)-log(1/2)]/log(2) ~= 0.99986 points. With four answers, giving p=0.9997 to the right answer and p=0.0001 to each of three wrong answers should get you [log(0.9997)-log(1/4)]/log(4) ~= 0.99978 points.
Meet “The Aumann Game”. See also: scoring rules.
Damn, I wish the exams I passed had one of those. Measuring your self-knowledge (and thus training it) is way more valuable than checking whether you memorized the right facts!
Nice. What happens if you think you’re right on the cusp of a grade boundary, as in OP’s footnote 1? I think there are cases to be considered for when you’re right under a grade boundary and right above a grade boundary, and the value you place on a grade change versus potential increase/decrease in intra-grade marks. All together, fairly mathematically taxing to be rational...
Awesome.
I think the problem is that people tend to conflate intention with effect, often with dire effect (e.g. “Banning drugs == reducing harm from drug use”). Thus when they see a mechanism in place that seems intended to penalise guessing, they assume that it’s the same as actually penalising guessing, and that anything that shows otherwise must be a mistake.
This may explain the “moral” objection of the one student: The test attempts to penalise guessing, so working against this intention is “cheating” by exploiting a flaw in the test. With the no-penalty multiple choice, there’s no such intent, so the assumption is that the benefits of guessing are already factored in.
This may not in fact be as silly as it sounds. Suppose that the test is unrelated to mathematics, and that there is no external motive to doing well. E.g., you are taking a test on Elizabethan history with no effect on your final grade, and want to calibrate yourself against the rest of the class. Here, this kind of test is flawed, because the test isn’t measuring solely what it intends to, but will be biased towards those who spot this advantage. If you are interested solely in an accurate result, and you think the rest of the class won’t realise the advantage of guessing, taking the extra marks will just introduce noise, so it is not to your advantage to take them.
For a mathematics or logic based test, the extra benefit could be considered an extra, hidden question. For something else, it could be considered as immoral as taking advantage of any other unintentional effect (a printing error that adds a detectable artifact on the right answer for instance). Taking advantage of it means you are getting extra marks for something the test is not supposed to be counting. I don’t think I’d consider it immoral (certainly not enough to forgo the extra marks in something important), but Larry’s position may not be as inconsistent as you think.
They were in SAT prep books 25-27 years ago. (I took the SAT’s while I was still 15.) The explanation given was something along the lines of, “Most people say that the SAT penalizes you for guessing, but this is wrong. Rather, it simply makes sure that, on average, guessing won’t get you any extra points if you don’t know anything about the question. If you can eliminate even one wrong answer out of five, you will always come out ahead by guessing. If you can’t, then you still won’t lose anything by guessing.” They then showed math and examples to back it up.
It was actually in a very early part of the book I read, because they wanted you to understand how important it was to be able to identify even one wrong answer, and thus why the methods you were going to learn for doing that were important.
I also remember reading and using this information when taking the SAT, so it surprises me that Yvain’s classmates wouldn’t take the free points. My best guess, unfounded except for intuition, is that the something-for-nothing aspect triggered a “this can’t be!” feeling. Or something. Yeah I dunno, as far as I remember everyone I talked to about this in high school was fine with guessing after eliminating a choice.
A third data point in agreement: in my HS it was repeatedly drilled into us (by the official prep materials, by teachers, by everyone) that you should always guess.
Although otiose to adduce further examples, I will still mention that ~5 years ago, my SAT prep book made it very clear that you could always eliminate some of the choices and thus that you always wanted to guess.
Well, the book I read also emphasized that even if you had no clue, you still couldn’t lose anything by guessing; on average it would just come out the same as if you left it blank, so you might as well give it a try.
But I can certainly understand why this is easier to get in the context of a 4- or 5-answer question than 2. To understand the true/false case, you need to understand at least a little about calibrated probabilities.
I remember seeing advice regarding the SAT and AP exams that recommended guessing when eliminating either one or two responses. Almost nobody seemed to know the theory behind it, which I guess is why both options sounded perfectly plausible. I was horrified when my AP Lit teacher recommended to the class guessing only if 2 or more responses could be eliminated. She couldn’t understand the math I did to show otherwise, and wondered how such a thing was possible when there was a “penalty” for guessing (yes, she actually based her understanding of the situation on the name given rather than on doing any logical analysis; Orwell was surely rolling in his grave). Thankfully, she decided to trust me anyways.
This matches my experience as well. They did explain the math, but they didn’t dwell on it. Mostly, they just drilled it into the heads of the students that you MUST guess or horrible things will happen.
I left the course at that point because it seemed to me like cheating.
This exactly matches my experience with test prep books from a decade later.
Honestly, it’s the first and foundational principle of learning to take multiple-choice tests—figure out what the guessing penalty is. And, in my experience, few such tests are calibrated to make guessing net negative.
Quick pruning of the set of possible answers and guessing if you can’t decide on the rest is just what you do.
It would be interesting to re-frame the test as “start with 50 points. Get 1⁄2 for a correct answer, lose 1⁄2 for don’t know, and lose 1 for a wrong answer”. I suspect your friends would accept this as equivalent scoring, and would start guessing more.
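To see in what sense the two framings match, here is a small sketch comparing them on the same answer sheet. The function names are made up, and the 150-question test length is taken from the original post. Every outcome is worth exactly half a point less under the reframing, so the two totals differ only by a constant (start minus half the number of questions) and rank students identically.

```python
def original_score(correct, wrong, blank):
    """+1 per correct answer, -1/2 per wrong answer, 0 per "don't know"."""
    return 1.0 * correct - 0.5 * wrong + 0.0 * blank

def reframed_score(correct, wrong, blank, start=50.0):
    """Start with points in hand; +1/2 correct, -1/2 "don't know", -1 wrong."""
    return start + 0.5 * correct - 1.0 * wrong - 0.5 * blank

# Three hypothetical answer sheets, each totalling 150 questions.
for sheet in [(100, 20, 30), (90, 60, 0), (75, 0, 75)]:
    print(sheet, original_score(*sheet), reframed_score(*sheet))
```

With 150 questions and a starting balance of 50, the reframed total is always 25 points below the original, so any grade boundaries would have to shift by the same amount.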
The loss aversion is probably less strong because they’re already taking a loss by not guessing, so losing just a bit more isn’t that painful.
I just realized: I think (I cannot swear to it) that my SAT study guide book did that exact thing—renormalize the scoring algorithm so that leaving a blank was a loss of points.
Nice application of the Allais Hack.
Oh my dear lord Cthulhu. Can I ask what level of class this was? If you say it was a postgraduate course at MIT, I may gather the last sane members of the human race and move to Pluto.
Postgraduate course at a university that’s not Ivy League caliber but reasonably well-respected. In contrast to ahem some of the comments below, these people are all quite smart, some consistently better able to understand difficult concepts than I and a few having good original published research. This sort of rationality stuff is just a different skill that some smart people just don’t have aptitude in.
What discipline was the class in? Did the subject matter itself prime people away from thinking about probabilities? o.O
How about wording this differently? Not the “last sane members of the human race.” But the “first sane members of the human race.”
Heh. When I read: “Anyone with basic math skills should be able to calculate that out, right?” I thought: “yes!”—and waited for the inevitable complication—but it never came.
Perhaps we should come up with some sort of maxim to remind ourselves that not every weird result has a complex possibly-evolutionary explanation - ‘sometimes, people really are just stupid’.
People are very frequently stupid, but there is always a causal explanation of their stupidity.
It’s just that sometimes there is a very simple explanation that helps predict the direction of stupidity, and that we might share that stupidity.
I’m curious: if instead of moving to Pluto, these people simply bred with each other, what would result?
There might be genes for intelligence, but I’m extremely skeptical that there are genes for LW-style rationality. Teaching each other, on the other hand, might work.
From personal experience, there seems to be a large variance in rationality even after conditioning on intelligence for those that have never had any ‘formal’ rationality training.
I’m not sure where exactly this comes from, but it would not surprise me if there was a large genetic component.
Given that people can be genetically predisposed to such emergent things as hating homosexuality (how would you make a neural net do that?), it doesn’t seem far-fetched that this sort of thing is inheritable. Of course, I don’t think homosexuality-hating evolved over mere centuries, nor do I know of statistically significant evidence that LW-style rationality has been inherited.
It seems pretty easy to me. Wire a man up with the following instinctual responses:
It’s not too far from there to an outright hatred of homosexuality, if you don’t think too hard about it and you don’t have the rational defenses to make this a non-issue. This is one of the benefits of rationalism, by the way: defense against miscellaneous harmful bullshit.
How easy is it to make a neural net recognize “woman” and “man”?
I don’t know, but since there obviously is a way that our brains distinguish between men and women and assign sexual attraction based on that distinction, I don’t know that the mechanism is relevant to this discussion unless you’re really into writing image classification algorithms.
What I’m saying here is that I tend to treat a lot of complex brain functions, like image recognition or motor control, as primitives that we get for free from nature. This seems to be the only way to make a cache-lookup-based brain work in practice.
A year back, I encountered this kind of test: binary multiple choice, one point for right answer, minus half a point for a wrong answer, zero points for no answer. (Multiple-choice exams of any kind are very rare in Finnish universities, so that’s pretty much the only time in my life when I’ve been faced with a test like that.) Looking at the scoring, I came to the same conclusion as you: my expected score would be higher if I’d just try guessing each of the questions I wasn’t sure on.
I didn’t follow my own advice. I now wish I had, as I failed that exam. I was under a pretty heavy workload at the time, so I never ended up retaking it. I suspect I’d have passed if I’d just shut up and multiplied.
Why didn’t I follow my own advice? I did have some kind of a conscious reason, but in retrospect it seems so flimsy that I have difficulty even formulating it here. It went something along the lines of “I might as well take all the questions I have absolutely no clue on and mark them all as ‘true’, which gives me a 50-50 chance to be right on each one assuming there are as many true as there are false questions. But what if the lecturer, foreseeing that somebody would reason this way, wrote the questions in such a way that one alternative is more frequently correct than the other, and there isn’t a 50-50 chance for all questions to be ‘true’? Then my expected return calculation would be off, possibly costing me points!”
Yes, I’m aware of all the flaws in that line of thought, no need to point them out. I really didn’t think it through properly. That implies that the very thing you suggest happened to your friends, happened to me—I instinctively disliked the idea, and then rationalized myself a (bad) reason not to do it.
TANSTAAFL: There Ain’t No Such Thing As A Free Lunch
That’s still a better justification of your behaviour than the MIT students [edit: Yvain did not actually say MIT!] used, not to mention that you’re able to look back in retrospect and acknowledge the error of your decision.
This sort of suspicion is a good heuristic, if not the best heuristic. Scam artists (by which I mean casinos and carnies) are skilled at making things appear as if you’ve found the loophole in their game, and when you don’t have enough time to examine the loophole thoroughly you’re generally better off assuming it to be false. From the sound of things you were too busy to do this, not to mention that—being unfamiliar with multiple choice tests in general—it caught you with your pants down. You would have had to devote twice the analysis time as a typical North American, who is familiar with these sorts of exams.
Don’t discount the TANSTAAFL heuristic—you made a rational choice based on limited data and available processing time. Your error is wholly different from the errors at MIT.
Ooh, this is interesting. Eliezer says he hopes this wasn’t at MIT or somewhere, and now people are remembering the MIT reference and assuming I go to MIT. Reminds me of that bias where you try to debunk a rumor, and all people can remember is that they heard someone talking about the rumor somewhere and believe it more. What’s that called? There was an OB article on it somewhere, I think.
I should hire Eliezer to come by and make offhanded MIT references during my job interviews.
Dear lord, I just pooped myself. I’m thoroughly familiar with this bias—and I just fell into it.
Isn’t this sort of language manipulation exactly what the PUAs do? Hmm… a potential way of strengthening one’s arguments occurs to me. While in conversation with somebody IRL it should be more effective to phrase things as “Well, Eliezer said...” than “According to Eliezer’s article on...” so as to give the impression of possible first hand knowledge, or at least thorough familiarity with the relevant material.
This is a dark art no doubt, but with most people I find that this is the only way of dealing with them.
(I am not above name-dropping Eliezer to pick up chicks.)
I sincerely doubt that namedropping Eliezer is an effective chick-picking-up technique except under very unusual circumstances! OTOH I have pulled at least once by talking about cryptography, so you never know :-)
That’s an interesting point. You may be right, now that I think of it that way.
Easy way out: Flip a coin for each answer.
Obviously the author of the test cannot possibly know how your coin falls, so you get a true probabilistic chance for each answer.
This is basically the same idea as for randomized quicksort: To guard against malicious data, make your algorithm unpredictable.
I suspect that your friends were simply trying to rationalize their previous behavior or avoid admitting they were wrong. I’ll bet more of them would have been sympathetic to your arguments if they’d been presented before they’d ever taken a test of that type. In fact, I’ll bet a few of them would find your arguments so obvious as to barely be worth mentioning if presented in this context. (E.g., if you’d posed it as a brainteaser to your friends: “on a test of this type, do you increase your expected score by guessing or marking don’t know”, I’ll bet some of them would have said “Guess. That’s obvious.”)
According to my interpretation, the only reason your outcome was superior was because you made the discovery early on under your own steam. To measure whether you would be better at admitting you were wrong than your friends, we would have to give you a test where you actually had to admit you were wrong.
Anyway, guessing does increase the variance in your answer. So maybe a more complete argument where you asked your friends how many questions they expected to know and then gave them odds for getting each of “no pass”, “pass”, “honors” and “high honors” using guess and no-guess strategies would have been more effective.
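The odds table suggested above can be sketched in a few lines. This is only an illustration under stated assumptions: the student is certain of `known` answers, every remaining question is a 50/50 guess, and the thresholds are the strict ">50%, >60%, >70%" bands from the post. The function names and the example value of 100 known answers are hypothetical.

```python
from math import comb

def band(score, total=150):
    """Grade band for a raw score, using the strict thresholds from the post."""
    if score * 100 > 70 * total:
        return "high honors"
    if score * 100 > 60 * total:
        return "honors"
    if score * 100 > 50 * total:
        return "pass"
    return "no pass"

def guess_all_bands(known, total=150, p_right=0.5, penalty=0.5):
    """Probability of each band when `known` answers are certain and
    every other question is guessed (assumption: independent guesses)."""
    n = total - known
    probs = {"no pass": 0.0, "pass": 0.0, "honors": 0.0, "high honors": 0.0}
    for right in range(n + 1):
        prob = comb(n, right) * p_right**right * (1 - p_right)**(n - right)
        score = known + right - penalty * (n - right)
        probs[band(score, total)] += prob
    return probs

known = 100  # hypothetical student who is sure of 100 of the 150 answers
print("no-guess strategy:", band(known))
for name, prob in guess_all_bands(known).items():
    print(f"guess strategy, {name}: {prob:.4f}")
```

The no-guess strategy gives a single certain band, while the guess strategy gives a spread of bands, which is the variance the comment is pointing at.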
Taking tests is one place where I’ve noticed focus on rationality can give you a boost. I had two classes with a very similar format: each semester had two exams and a final. Each exam had several multi-part questions that got progressively more difficult. The average score was expected to be about 50%, and not everyone was expected to finish any of the exams. Grade in the class was based on ranking: for example, the person with the highest cumulative score got an A.
A classmate and I realized that we could use the bias other students had for wanting to focus on the easy points to our advantage. That is, the later questions were harder, but they gave much more bang for the buck. It was kind of painful to leave questions you knew you could answer easily blank (which is where overcoming the bias comes in), but it was most certainly worth it when we got the top ranks.
This… absolutely sickens me. It’s bad enough when I hear my family members argue morals/politics/economics that they subscribe to for proximate lifestyle purposes, but when university students pull this, and then ignore the eminent Yvain when he counsels them otherwise?
My only comforts are the harsh cold truth of schadenfreude, that such beings don’t deserve an extra 5%, and that at least I only wasted three years and $20 000 at post-secondary.*
*(My degree was non-technical; Humanities students who don’t want a PhD should drop out in second year, spend a year reading, and then lie on their resume.)
P.S. Excellent break down of the reasoning process, Yvain. I think you hit the nail on the head.
The students infer a social rule penalising guessing. In almost all cases exploiting a technicality that works around a social rule is penalised. Nobody likes munchkins.
I would expect to observe a tendency for people to rationalise reasons not to guess even if loss aversion was controlled for.
The idea of doing significantly worse than chance by guessing on the test sounds absurd, but I recall that when I was on Chemistry Team in high school, many of the teams we competed against managed it, with teams of four getting average scores of below 20% on four-option multiple choice tests.
If the students were really guessing at random, they ought to expect to do better, but answering questions to the best of their limited knowledge may cause them to do significantly worse than chance if the questions are designed to trip up people with common misunderstandings or gaps in their knowledge of the subject.
In that case, if a team knows it’s that bad it should guess without looking at the answers (and if that doesn’t work, try to generate pseudorandom numbers somehow). If the goal is to learn what you know about chemistry, you can make a note of what you think the correct answer is to the side and then still guess randomly. So the problem here is poor calibration, I guess.
That would have improved their performance, but I think if they had been clever enough to think of that sort of strategy, they probably wouldn’t have needed it.
I think you got your math wrong.
If you get 20 out of 30 questions wrong, you break even; therefore the probability of losing points by guessing is
Sum( C(30, i), i = 21..30 ) / 2^30 ≈ 2.14% > 1%
You’re probably right, because I haven’t done a problem like this since forever, but help me figure out what I did wrong. I found a binomial distribution calculator (this is binomial distribution, right?), entered 30 trials, 21 “successes”, (counting a false answer as a success, and agreeing with you that 20 is break even so you need 21 to do worse than even) and .5 probability of success, and it said the cumulative probability was .9919… against, therefore <1%.
On this page, the cumulative refers to the probability of obtaining at most p successes. You want to run it with 30 and 9, which gives you the right answer, 2.14%.
Or you could put in 30 and 20 which gives you the complement.
What is lower than 1% is the probability of getting 8 or fewer right answers.
Oh, I see how they did that. Thanks. Original post edited.
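For anyone who wants to check the figures traded in this exchange without an online calculator, here is a minimal sketch using only the standard library. It assumes 30 blind true/false guesses, each right with probability .5; the helper name `binom_cdf` is just for illustration.

```python
from math import comb

def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n = 30
p_net_loss = 1 - binom_cdf(20, n)     # 21 or more wrong answers out of 30
p_at_most_9_right = binom_cdf(9, n)   # same event, counted by right answers
p_at_most_8_right = binom_cdf(8, n)   # the figure that is below 1%
p_at_most_21 = binom_cdf(21, n)       # what the calculator reported for "21"

print(f"{p_net_loss:.4f}")         # about 0.0214
print(f"{p_at_most_9_right:.4f}")  # about 0.0214
print(f"{p_at_most_8_right:.4f}")  # about 0.0081
print(f"{p_at_most_21:.4f}")       # about 0.9919
```

The .9919 figure is the cumulative probability of at most 21 wrong answers, so its complement is the chance of 22 or more wrong, not 21 or more.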
When I took my high school’s AP Calculus classes these last two years, the teacher pointed out that since, on average, guessing would give the same result as leaving questions blank, you might as well guess. As far as I know, nobody disagreed with him.
(Actually, he said it’s better to guess, because leaving a question blank means running the risk of accidentally putting the next question’s answer in the wrong place, which, in one case, led to a student answering practically every question in one section wrongly. But that’s beside the point.)
The same expected test score doesn’t imply the same expected utility.
Hmm. And why is the parent comment downvoted? It’s incorrect to obviously agree with a sweeping assertion that the “result” is the same if expected score is the same. The expected utility of the outcome is going to be strictly higher or lower from following one of these strategies, it’s not going to be the same.
Presumably, as long as you’re not equivocating on ‘expected’, that isn’t true. For tests, ‘test score’==‘utility’, no?
For an exam where what matters is your grade relative to other test-takers, like the SAT, probably yes, but on an exam with a hard pass/fail threshold, the utility function is discontinuous (and therefore non-linear) around the threshold, so guessing might make a difference.
Ah, good point. I always forget ‘get the highest score possible’ isn’t everyone’s goal; presumably, some people would prefer 70% and 100% about equally in this case.
It is about the ‘expected’ part. The average of utilities for each score does not have to equal the utility of the average score. It is only equal when utility scales linearly with score.
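A small numerical illustration of this point, under assumptions that are mine rather than anything stated in the thread: a 150-point exam with a hard pass mark of more than 75 points, five-option questions with a quarter-point penalty (so blind guessing has an expected gain of zero), and a utility of 1 for passing and 0 otherwise.

```python
from math import comb

PASS_MARK = 75  # hypothetical: more than 75 points are needed to pass

def utility(score):
    """Step utility: all that matters is clearing the pass mark."""
    return 1.0 if score > PASS_MARK else 0.0

def guess_outcomes(sure, n_guess, k=5):
    """(score, probability) pairs when n_guess questions with k options
    are guessed blindly under the zero-mean penalty 1/(k-1)."""
    p = 1 / k
    penalty = 1 / (k - 1)
    return [
        (sure + right - penalty * (n_guess - right),
         comb(n_guess, right) * p**right * (1 - p)**(n_guess - right))
        for right in range(n_guess + 1)
    ]

def compare(sure, n_guess=20):
    outcomes = guess_outcomes(sure, n_guess)
    expected_score = sum(s * pr for s, pr in outcomes)
    expected_utility = sum(utility(s) * pr for s, pr in outcomes)
    return expected_score, utility(sure), expected_utility

for sure in (76, 74):
    es, u_blank, eu_guess = compare(sure)
    print(f"sure={sure}: E[score if guessing]={es:.2f}, "
          f"utility if blank={u_blank}, E[utility if guessing]={eu_guess:.3f}")
```

In both cases the expected score from guessing equals the sure score, yet just above the pass mark guessing lowers expected utility and just below it guessing raises it.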
Guessing, if you have no idea which way to guess is more likely, will not have quite the same result as leaving questions blank. Leaving questions blank will add 0 to your score, while guessing will add a mostly-Gaussian random variable with a mean of 0. The math of this is kind of fun:
http://en.wikipedia.org/wiki/Random_walk
And of course, the central limit theorem is colossally important:
http://en.wikipedia.org/wiki/Central_limit_theorem
No need to invoke that here—directly calculating the probabilities using the binomial distribution is perfectly practical in this instance.
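Both views can be put side by side. The sketch below computes the exact distribution of the total from blind guessing and compares one tail probability with the Gaussian approximation. The format (five options, quarter-point penalty, 40 guessed questions) is an assumption chosen so that the mean is zero; none of those numbers come from the thread.

```python
from math import comb, erf, sqrt

def guess_total_distribution(n, k):
    """Exact (total, probability) pairs for blind-guessing n questions
    with k options: +1 if right, -1/(k-1) if wrong."""
    p = 1 / k
    penalty = 1 / (k - 1)
    return [
        (right - penalty * (n - right),
         comb(n, right) * p**right * (1 - p)**(n - right))
        for right in range(n + 1)
    ]

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, k = 40, 5  # hypothetical exam section
dist = guess_total_distribution(n, k)
mean = sum(s * pr for s, pr in dist)
sd = sqrt(sum((s - mean) ** 2 * pr for s, pr in dist))

# Chance of losing three or more points overall, exact and approximate
# (no continuity correction, so the two differ slightly).
exact_tail = sum(pr for s, pr in dist if s <= -3)
approx_tail = normal_cdf(-3, 0, sqrt(n / (k - 1)))

print(f"mean={mean:.6f} sd={sd:.4f}")
print(f"exact={exact_tail:.4f} normal approx={approx_tail:.4f}")
```

The per-question variance works out to 1/(k-1), so the standard deviation of the total grows like the square root of the number of guesses.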
Practically every question, or about half?
It sounds like your fellow students understood the concept of a guessing penalty, but did not realise that the guessing penalty was too low in this case. One approach to convince them might have been:
Assume you get −0.0001 points for guessing an incorrect answer. Obviously, you should answer every question, because the penalty for guessing is so low. Now, assume that the guessing penalty is −20 points. Again, you obviously shouldn’t guess. What would the penalty have to be where you’re indifferent between guessing and not guessing? Obviously, when the penalty is −1 point. You guess two answers, one is correct and the other not, and your expected score is 0. In this case the penalty is −0.5, which is closer to −0.0001 than to −20, therefore you should always guess.
NB At my university, multiple choice exams always feature four possible answers, and you lose .33 for guessing incorrectly. Every student understands this concept perfectly. If they had to take your exam, they would’ve guessed every single time. It’s strange to see that there are universities where the guessing penalty is not well calibrated. It seems like an elementary thing to do.
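The break-even penalty described here is easy to work out exactly. A minimal sketch, assuming one point for a correct answer and a blind guess among equally likely options; the function names are illustrative only.

```python
from fractions import Fraction

def expected_gain(p_correct, penalty):
    """Expected points from answering: +1 if right, -penalty if wrong."""
    return p_correct * 1 - (1 - p_correct) * penalty

def break_even_penalty(n_options):
    """Penalty at which a blind guess among n_options has zero expected gain."""
    p = Fraction(1, n_options)
    return p / (1 - p)  # simplifies to 1 / (n_options - 1)

def min_confidence_to_answer(penalty):
    """Smallest probability of being right at which answering does not lose points."""
    return penalty / (1 + penalty)

print(break_even_penalty(2))                            # 1
print(break_even_penalty(4))                            # 1/3
print(expected_gain(Fraction(1, 2), Fraction(1, 2)))    # 1/4
print(min_confidence_to_answer(Fraction(1, 2)))         # 1/3
```

With a half-point penalty, answering pays off at any confidence above one third, and on a true-or-false question you can always pick the side you think is at least 50% likely.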
Caveat: This is only true if you have no idea at all which is correct. If you think there’s a 60% chance you know the right answer, you’re still better off guessing.
Indeed. As a consequence, once you can narrow the answer down to two or three choices, you’re always better off guessing.
Possibly part of the loss aversion is the desire not to look foolish. I mean, if the teacher is reviewing your exam results, and he sees you answered 100 questions correctly and said “don’t know” for the rest, then you look like a pretty smart and modest guy compared to the schmuck who answered 125 questions correctly and 25 questions incorrectly.
Probably in our evolutionary history, if you looked foolish it was bad news.
But anyway, I am mainly posting in this thread to state that as an attorney I can attest that loss aversion is a big issue in civil litigation. A lot of people are scared to death of going to court and losing even when the downside is pretty minimal. The most successful attorneys I know (at least on the Plaintiffs’ side) lose in various ways on a regular basis.
Considering our evolutionary history, why would such an action look foolish? If you come up with a just so story, consider making up a similar story for the alternative.
Well, do you agree it looks more foolish to be wrong than to say “I don’t know”?
Depends on the context. If guessing is clearly useful, then not guessing might be seen as foolish.
Well that’s true of pretty much any generalization about human nature. Let me put the question slightly differently:
Do you agree that, generally speaking, it looks more foolish to be wrong than to say “I don’t know”?
Sure.
Ok, why do you think so? I think it’s because being wrong undermines your credibility and/or reputation for competence.
Suggested intervention if anyone finds themselves in this situation in the future: distribute copies of Ender’s Game and/or Methods. Subjectively it feels to me like identifying with Ender / MoR!Harry has made me better at noticing and more motivated to take advantage of these kinds of optimization opportunities.
Regarding penalizing guessing, if you’re going to penalize it you might as well go all the way. My high school math club once hosted a competition which included a round with a ridiculous guessing penalty (free response, 1 point for a correct answer, 0 for a blank, and −3 or −5 for an incorrect answer). Exactly one person out of a hundred-ish got a positive score.
This is incredible. As others have said, the most likely explanation is that people could see the system was intended to disincentivise guessing and that this design intent shaped the way they saw the test.
Now I want an exam on “logic and probability” which uses this system. The surprise being that the grading system indicated is in fact a lie. You fail if you put a single “don’t know”. Otherwise you pass with 100%.
(A teacher of mine at school once set us a reading test. It had a big line at the top saying “read this entire test before starting”. Then there were paragraphs of text interspersed with questions about it, such as “How many people were in the study?”. Right at the end it said “Now you have finished reading, please provide a blank sheet of paper. Do not answer any of the questions.” I was stung.)
This reminds me of a class I had as an undergraduate. To avoid taking another class with a lab, I took Ethnobotany to finish out my general science requirements. The tests were multiple choice with conjunctive answers. For example:
To which the correct answer is a, c, and d. It was computer scored such that the only correct answer was to bubble a, c, and d: bubbling a and c got you no points, nor did bubbling a, b, c, and d, for example.
Given this test format, I put the Conjunctive Fallacy to work for me. When I came across a question where I had no idea if one of the options was part of the answer, I never included it since including it would have made the resulting answer less likely to be true than the answer would have been by leaving it off. The result: I got several questions right that I would have otherwise missed.
That does not make sense. If the option you had no idea about were indeed part of the answer, then leaving it out would cause your answer to be incorrect. The choice is between answer like “a and b and c and d” or “a and b and c and not d”. The Conjunction Fallacy would involve comparing these to the answer “a and b and c”, which, while more likely than the previous choices because it dominates them, is not an admissible answer to the test as you described it.
What may have made this strategy beneficial is that you are more likely to recognize options that actually apply to the question, since you had been studying the subject matter, so options that you had no idea about were likely unrelated to the question.
This is related to the Cognitive Reflection Test.
What does it have to do with CRT? They both have to do with interest in gambling, but what do they have to do with each other?
I was surprised to learn through your link that CRT correlates with interest in gambling. MIT students do dramatically better than Ivy league students on CRT, but I think of engineering students as conservative. (“MIT PhDs work for Harvard MBAs”)
So that rules out MIT as Yvain’s school ;)
In both cases, people often come to a solution that is dictated by intuition and opposed to very simple explicit analysis, and some people continue to insist on their intuitive solution even after the correct solution is explained to them.
I tend to think of loss aversion as a preference rather than a mental error, but I agree that it probably explains a lot of the “don’t know” answers. What I cannot figure out is why the test designers want an individual’s level of loss aversion to affect their score.
Another possible (and probably over-charitable) explanation for the lack of guessing is that students are afraid of being drawn to the wrong answer. For example, I’ve heard that on the SAT math portion many of the answer choices purposely contain numbers that were mentioned in the question, drawing “random” guessers because such answers look more plausible.
I would say it basically comes down to the fact that abstract rationality is slow and requires lots of processing power. For the same reasons we can usually only mentally afford to employ a certain limited set of fairly abstracted terms, and can only follow the implications of this to a limited degree. If we were all Kryptonians it would probably be pretty functionally rational to stay in ‘far mode’ all the time, but as the squishy, dumb bugs we are, a lot of our functional capacity derives from various habitual and patterned behaviour. Far mode mostly seems to serve as a general regulator for some general patterns, perhaps in order to improve intra-plan cohesion. The whole cognitive consciousness part of this may simply be a side-effect of it being kind of overlaid over the background pattern integration that constitutes our ordinary mental processes.
College admissions and financial aid generally are gold mines of examples of non-optimizing behavior with large consequences. I fairly regularly see people losing tens of thousands of dollars or more through failure to spend a few hours doing relevant research.
FYI, the average student can expect to increase their score by about 8% by guessing, assuming the “non-guesses” were right 80% of the time. An enormous effect.
Maybe (if your goal was to get them to score higher) you could have pitched it as gambling for entertainment, i.e. record which answers you guessed on, and later compare who was luckiest in getting the biggest portion of those correct.
Most apathetic students have no qualms with guessing. It sounds like these peers of yours are either extremely diligent and motivated by fear, or unwilling to lose face by admitting that you’re more clever than they.
Are you sure that none of them were pulling your leg?
It’s implausible that most of them were.
I suggest that the students aren’t as irrational as they appear. After all, why would the designer of the test incorporate a “don’t know” option and a penalty for wrong answers, except to discourage guessing on questions that you’re clueless on? And if I were a random student (instead of someone especially interested in the mathematics of decision theory), why should I take the trouble to second guess the test designer, instead of assuming that (with high probability) he is rational and competent at his job?
ETA: Also, you’re supposed to maximize expected utility, not expected number of points. Increasing the variance of your score may decrease expected utility, even if it keeps the expected score the same. (I see that John Maxwell IV has made a similar point.)
I think students are happy to take advantage of amateur test design, like when one question’s back story reveals the answer to a different question. So this post is a good demonstration of how not being a rationalist can make you a sucker.
EDIT: Please ignore this comment, which was based on a misreading of Yvain’s post. I failed to notice that a wrong answer gives minus one half point, not minus one point.
But taking advantage of this test design is trickier than it looks. The best strategy is not necessarily “always guess”. (Actually it almost certainly isn’t “always guess”.)
For example, suppose that your expected score is above 50%, and you don’t care much about getting honors but really need to pass. Then you shouldn’t guess if you have no idea what the answer is, since guessing increases the probability that your score will fall below 50% by bad luck.
Here’s another example. Suppose you’re certain about 71% of the answers, and are slightly unsure about the rest. Then you should answer “don’t know” for all of the questions that you’re slightly unsure about, since there is no additional utility for getting more points above 70%, and by guessing you’re just decreasing the probability of getting “high honors” for no possibility of gain.
The students who refused to guess may actually have behaved rationally (even if they can’t articulate why). I think this story illustrates the dangers of overriding intuition with partial knowledge.
Edit: In sum, since there will be at least 30ish questions unknown, then losing any points at all by guessing is unlikely enough that you’d need to be quite unusually well calibrated to justify not-guessing to raise e.g. probability of getting over 70% given that you know (99% confidence) you’ve already got exactly 71%.
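To see how much the misread penalty matters in the "certain about 71%" scenario, here is a rough sketch. The specific numbers are assumptions for illustration: 107 certain answers out of 150, the other 43 treated as pure coin flips (a worst case for someone who is only "slightly unsure"), and high honors requiring strictly more than 105 points.

```python
from math import comb

def p_at_or_below(sure, n_guess, threshold_points, p_correct=0.5, penalty=0.5):
    """P(final score <= threshold_points) when `sure` answers are certain
    and n_guess further questions are answered with accuracy p_correct."""
    total = 0.0
    for right in range(n_guess + 1):
        score = sure + right - penalty * (n_guess - right)
        if score <= threshold_points:
            total += (comb(n_guess, right)
                      * p_correct**right * (1 - p_correct)**(n_guess - right))
    return total

sure, n_guess, threshold = 107, 43, 105  # hypothetical student from the example

risk_half_point = p_at_or_below(sure, n_guess, threshold, penalty=0.5)
risk_full_point = p_at_or_below(sure, n_guess, threshold, penalty=1.0)
risk_if_80_percent_sure = p_at_or_below(sure, n_guess, threshold, p_correct=0.8)

print(f"lose high honors, penalty 0.5: {risk_half_point:.4f}")
print(f"lose high honors, penalty 1.0: {risk_full_point:.4f}")
print(f"lose high honors, penalty 0.5, 80% accuracy: {risk_if_80_percent_sure:.2e}")
```

Under the actual half-point penalty the risk from coin-flip guessing is small, whereas under a full-point penalty it would be substantial, which is why the retracted analysis looked so different.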
You’re right. I’ve edited my comment with the cause of my error.
Interesting. I edited my comment to summarize it greatly, since I no longer think I’m trying to convince someone who is wildly wrong on the probability. :) But now my edit’s reverted.
What if UDT instances were taking the test? They should be able to conclude that everyone ever opting to exploit the rule is a net disadvantage (the passing score would, of course, have to be raised to compensate for the gambling points).
When I take a true-false test, I second-guess the author on every question.