A simpler way to think about positive test bias
Eliezer described “positive bias” (which I’ll rename “positive test bias” for reasons explained in Unnamed’s comment below) in an LW post and an HPMOR chapter. According to him, dealing with that bias requires a kind of mental gymnastics that doesn’t come naturally to most people: “twisty negative thinking” or “flinching toward zero”. You’re supposed to devise tests that come out false if your hypothesis is true. It’s a bit confusing.
I think there’s a simpler way to think about it. Positive test bias is just our bias toward strong hypotheses. You can deal with it by asking yourself, how can I test if my hypothesis is too strong?
The LW post about the bias mentions the Wason 2-4-6 task. If you’re told that the number sequence 2-4-6 has property X, it’s tempting to guess that property X means “ascending arithmetic progression”. To test if that hypothesis is too strong, you need to try some other sequences that are arithmetic progressions but not ascending, and some that are ascending but not arithmetic progressions. Easy!
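To make that probing strategy concrete, here’s a minimal sketch. The function names and probe triples are my own illustration, and I’m assuming the hidden rule is “any ascending sequence”, as in the classic version of the task:

```python
# Probing whether a hypothesis about the 2-4-6 task is too strong.
# Assumes the experimenter's hidden rule is "any ascending sequence",
# as in the classic version of the task.

def hidden_rule(seq):
    """The experimenter's actual rule: strictly ascending numbers."""
    return all(a < b for a, b in zip(seq, seq[1:]))

def my_hypothesis(seq):
    """The tempting first guess: ascending arithmetic progression."""
    diffs = [b - a for a, b in zip(seq, seq[1:])]
    return all(d == diffs[0] and d > 0 for d in diffs)

# Probes chosen to test if the hypothesis is too strong: sequences it
# rejects but a weaker rule might still accept.
probes = [
    (2, 4, 6),   # fits both rules: an uninformative "positive" test
    (1, 2, 4),   # ascending but not an arithmetic progression
    (6, 4, 2),   # arithmetic progression but not ascending
]

for seq in probes:
    print(seq, "hypothesis:", my_hypothesis(seq), "actual:", hidden_rule(seq))
# (1, 2, 4) passes the hidden rule but fails the hypothesis, revealing
# that the hypothesis was too strong.
```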
The HPMOR chapter describes Hermione spilling some soda on her robe, and a moment later it mysteriously becomes clean again. Hermione comes up with a hypothesis that her robe is magically self-cleaning. To test if that hypothesis is too strong, she can try spilling something else on her robe. There’s no need for any counterintuitive thinking.
That technique is useful in other areas as well. For example, I often get carried away when writing posts, and end up with a draft full of wide-reaching conclusions. But then I ask myself, doesn’t that sound a bit too strong? So I look for counterexamples, and the point I’m trying to make either dissolves or becomes much more robust.
Our bias toward strong hypotheses is especially harmful in politics. We will defend a hypothesis like “group X is to blame for everything” for its explanatory power, never noticing the real problem—that it’s too strong. We’d all benefit from recognizing when it happens, and weakening our hypotheses until they match reality.
I don’t think this is quite right, for reasons related to this post.
Sometimes a hypothesis can be “too strong” or “too weak”. Sometimes hypotheses are just different. You mention the 2-4-6 task and the soda task. In the soda task, Hermione makes a prediction which is “too strong” in that it predicts anything spilled on the robe will vanish, but also “too weak” in that it predicts the soda will not vanish if spilled on the floor. Actually, I’m not even sure that’s right. What does “too strong” mean? What would a maximally strong or weak hypothesis be? Is it based on the entropy of the hypothesis?
I think this misplaces the difficulty in following Eliezer’s “twisty thinking” advice. The problem is that “try to disconfirm your hypothesis” is not a specification of a computation you can just carry out. It points in a direction, but it relies on my ingenuity to picture the scenario where my hypothesis is false. What does that really mean? It means coming up with a second-best hypothesis and then finding a test which differentiates between the best and the second best. Similarly, your “too strong” heuristic points in the direction of coming up with alternative hypotheses to test. But, I claim, it’s not really about being “too strong”.
What I would say instead is that your test should differentiate between hypotheses (the best hypotheses you can think of; formally, your test should have maximal VoI, value of information). The bias is to test your cherished hypothesis against hypotheses which already have a fairly low probability (such as the null hypothesis, perhaps), rather than testing it against the most plausible alternatives.
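To illustrate what that might look like, here’s a toy sketch using expected information gain as one simple stand-in for VoI. The hypotheses, priors, and predictions are made-up numbers for the soda example, not anything from the thread:

```python
# Picking the test that best differentiates the live hypotheses,
# scoring each candidate test by expected information gain (one simple
# way to cash out value of information). All priors and predictions
# below are illustrative made-up numbers.
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Prior probabilities over the hypotheses currently in play.
priors = {"self-cleaning robe": 0.6, "vanishing soda": 0.3, "one-off fluke": 0.1}

# What each hypothesis predicts for each test (True = "the liquid vanishes").
predictions = {
    "spill water on robe": {"self-cleaning robe": True,  "vanishing soda": False, "one-off fluke": False},
    "spill soda on floor": {"self-cleaning robe": False, "vanishing soda": True,  "one-off fluke": False},
    "spill soda on robe":  {"self-cleaning robe": True,  "vanishing soda": True,  "one-off fluke": False},
}

def info_gain(test):
    """Expected drop in entropy over the hypotheses after seeing the outcome."""
    expected_posterior = 0.0
    for outcome in (True, False):
        consistent = {h: p for h, p in priors.items() if predictions[test][h] == outcome}
        p_outcome = sum(consistent.values())
        if p_outcome > 0:
            posterior = {h: p / p_outcome for h, p in consistent.items()}
            expected_posterior += p_outcome * entropy(posterior)
    return entropy(priors) - expected_posterior

for test in predictions:
    print(f"{test}: {info_gain(test):.2f} bits")
```

On these numbers, spilling soda on the robe again scores about 0.47 bits, versus roughly 0.9 to 1.0 for the other two tests, because it only pits the favored hypothesis against the already-unlikely fluke instead of splitting the two leading hypotheses apart.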
Just letting you know that after a couple days of thinking about it, I’ve completely come around to your point of view. Figuring out the next best hypothesis that explains all your current data is a much more general approach. It covers so many cases that I even thought of it as the “key to rationality” for a few hours.
I agree, it is the key to rationality. :) I got the idea from Heuer’s CIA debiasing guide, Psychology of Intelligence Analysis. Or rather, from someone at a LW meetup who got it from that guide. An older source is the essay The Method of Multiple Working Hypotheses. Both sources give more detail on the breadth of this idea.
Maybe we should have a post spelling out how much of rationality is covered by this. It’s not widely understood here.
Thanks for the comment! Yeah, “too strong” is mostly a suggestive phrase for figuring out what to test next. But somehow it works better than it has any right to. For example:
Let’s just chase the “too strong” angle in the ordinary English sense, without thinking about it too deeply. You spill something else on your robe and it doesn’t vanish, so you come up with the next hypothesis—that the unique combination of robe and soda is doing the trick. That hypothesis also sounds “too strong” somehow, and the obvious test is to try spilling the soda on the floor. Then the soda vanishes and you have your answer.
So her tests weren’t “powerful” enough to “prove” her hypothesis.
Terminology request: Can we use the term “positive test bias” instead of “positive bias”?
“Positive bias” seems like bad jargon: it is not used by researchers, an unfamiliar listener would probably think it had something to do with having an overly rosy view of things, and all of the results on the first page of Google except for those from LW use it to refer to an overly rosy view.

“Positive test bias”, on the other hand, is used by some researchers in the same sense that Eliezer used “positive bias”, is only used in that sense on the first page of Google hits, is a more precise phrasing of the same idea, and is less likely to be taken by unfamiliar listeners as referring to an overly rosy view.
The term that is most commonly used by researchers is “confirmation bias”, but as Eliezer noted in his original post this term gets used to talk about a cluster of related biases; some researchers recognize this and instead talk about “confirmatory biases”. Singling out “positive test bias” with a separate label seems like a potentially good case of jargon proliferation—having more terms in order to talk more precisely about different related concepts—but calling it “positive bias” seems like a mistake.
Unnamed, great to see you on LW2.0! We interacted a month ago, but I didn’t register that it was you. Renaming the bias seems like a good idea—done.
What is that quote of Scott’s… Something about how the sequences obsolete themselves. And that he remembers the sequences being full of all these great insights about difficult topics—but when he goes back and rereads them, it’s all just so obvious.
You probably see where I’m going with this. It seems entirely possible that when you say “oh, it’s easy, you just notice when you’re making a hypothesis that might be too strong and then come up with a way to test it”, you are in fact doing the complete content of that sequence post that seemed insightful way back when; it’s just that it’s easy for you now.
That’s part of it, but also Eliezer sometimes makes things sound more complicated than they are. This exchange is a nice example.
Eliezer: And if you think you can explain the concept of “systematically underestimated inferential distances” briefly, in just a few words, I’ve got some sad news for you...
enye-word: “This is going to take a while to explain.” Did I do it? Did I win rationalism?!