Hindsight Devalues Science
This essay is closely based on an excerpt from Myers’s Exploring Social Psychology; the excerpt is worth reading in its entirety.
Cullen Murphy, editor of The Atlantic, said that the social sciences turn up “no ideas or conclusions that can’t be found in [any] encyclopedia of quotations . . . Day after day social scientists go out into the world. Day after day they discover that people’s behavior is pretty much what you’d expect.”
Of course, the “expectation” is all hindsight. (Hindsight bias: Subjects who know the actual answer to a question assign much higher probabilities they “would have” guessed for that answer, compared to subjects who must guess without knowing the answer.)
The historian Arthur Schlesinger, Jr. dismissed scientific studies of World War II soldiers’ experiences as “ponderous demonstrations” of common sense. For example:
Better educated soldiers suffered more adjustment problems than less educated soldiers. (Intellectuals were less prepared for battle stresses than street-smart people.)
Southern soldiers coped better with the hot South Sea Island climate than Northern soldiers. (Southerners are more accustomed to hot weather.)
White privates were more eager to be promoted to noncommissioned officers than Black privates. (Years of oppression take a toll on achievement motivation.)
Southern Blacks preferred Southern to Northern White officers. (Southern officers were more experienced and skilled in interacting with Blacks.)
As long as the fighting continued, soldiers were more eager to return home than after the war ended. (During the fighting, soldiers knew they were in mortal danger.)
How many of these findings do you think you could have predicted in advance? Three out of five? Four out of five? Are there any cases where you would have predicted the opposite—where your model takes a hit? Take a moment to think before continuing . . .
. . .
In this demonstration (from Paul Lazarsfeld by way of Myers), all of the findings above are the opposite of what was actually found.1 How many times did you think your model took a hit? How many times did you admit you would have been wrong? That’s how good your model really was. The measure of your strength as a rationalist is your ability to be more confused by fiction than by reality.
Unless, of course, I reversed the results again. What do you think?
Do your thought processes at this point, where you really don’t know the answer, feel different from the thought processes you used to rationalize either side of the “known” answer?
Daphna Baratz exposed college students to pairs of supposed findings, one true (“In prosperous times people spend a larger portion of their income than during a recession”) and one the truth’s opposite.2 In both sides of the pair, students rated the supposed finding as what they “would have predicted.” Perfectly standard hindsight bias.
Which leads people to think they have no need for science, because they “could have predicted” that.
(Just as you would expect, right?)
Hindsight will lead us to systematically undervalue the surprisingness of scientific findings, especially the discoveries we understand—the ones that seem real to us, the ones we can retrofit into our models of the world. If you understand neurology or physics and read news in that topic, then you probably underestimate the surprisingness of findings in those fields too. This unfairly devalues the contribution of the researchers; and worse, will prevent you from noticing when you are seeing evidence that doesn’t fit what you really would have expected.
We need to make a conscious effort to be shocked enough.
1 Paul F. Lazarsfeld, “The American Soldier—An Expository Review,” Public Opinion Quarterly 13, no. 3 (1949): 377–404.
2 Daphna Baratz, How Justified Is the “Obvious” Reaction? (Stanford University, 1983).
Ouch. I had vague feelings that something was amiss, but I believed you when you said they were all correct. I knew that sociology had a lot of nonsense in it, but to proclaim the exact opposite of what actually happened and sound plausible is crazy (and dangerous!).
I certainly agree. Most of those I instantly believed, and I had a bit of doubt about the one on Southern Blacks preferring Southern to Northern White officers (or maybe that is belief as attire, or hindsight bias), but, as you said, it is crazy that the opposite of what is true is believable when we are told it is correct.
These examples emphasize the benefit of frequently taking calibration tests, where we assign probabilities to answers and then check those answers for calibration errors. Perhaps someone could create a website where we could do this regularly? Just collect a large list of questions like the ones above (questions with true answers, but where we have intuitions about what the answer might be), have us answer those questions with probabilities, and then show us a calibration chart for the last X questions. Yes, collecting the good questions will be most of the work.
What if I were to try to create such a web app? Should I take five minutes every lunch break asking friends and colleagues to brainstorm questions? Maybe write a LW post asking for questions? Maybe there could be a section of the site dedicated to collecting and curating good questions (crowdsourced or centrally moderated).
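The core computation for such a chart is small. Here is a minimal Python sketch (the function name, bucket scheme, and sample history are illustrative assumptions, not any existing tool’s API): bucket each answered question by stated probability, then compare the stated confidence in each bucket to the observed hit rate.

```python
# Minimal calibration-chart sketch: given (stated probability, was_correct)
# pairs, bucket them by confidence and compare stated vs. observed accuracy.
from collections import defaultdict

def calibration_table(answers, n_buckets=10):
    """answers: iterable of (prob, correct), prob in [0, 1], correct a bool."""
    buckets = defaultdict(list)
    for prob, correct in answers:
        # Bucket index; min() keeps prob == 1.0 inside the top bucket.
        idx = min(int(prob * n_buckets), n_buckets - 1)
        buckets[idx].append(correct)
    rows = []
    for idx in sorted(buckets):
        outcomes = buckets[idx]
        stated = (idx + 0.5) / n_buckets          # bucket midpoint
        observed = sum(outcomes) / len(outcomes)  # actual hit rate
        rows.append((stated, observed, len(outcomes)))
    return rows

# Hypothetical answer history: (stated probability, graded correct?).
history = [(0.9, False), (0.7, False), (0.6, True), (0.8, False), (0.55, True)]
for stated, observed, n in calibration_table(history, n_buckets=5):
    print(f"stated ~{stated:.0%}, observed {observed:.0%} ({n} answers)")
```

For a well-calibrated answerer, the observed rates track the bucket midpoints; systematic overconfidence shows up as observed rates falling well below stated confidence.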
CFAR has two apps you might find interesting; I was able to find them in the Apple App Store easily. http://rationality.org/apps/
Are those apps only available on Apple products/smartphones? No way to access them on a Windows PC?
The Credence Calibration game is also available for Windows. Links to download it in the various formats are here.
The calibration game is also available for Android, and was available for Windows, but I think the original website is down.
As a lack of knowledge causes fear, hindsight bias delivers the comfort we desire at all times: the easy model that we build, rather than the unexpected unknown that causes anxiety. In that way we learn; some might prefer this unenlightened state, as hindsight bias smooths their mental ships away from the shoals of uncertainty and self-doubt.
Eliezer, I don’t have any contribution to make to the conversation. I just want to tell you that the last 10 or so posts from you have absolutely blown me out of my socks. Without a doubt, some of the most impactful and insightful stuff I’ve read in my 10+ years on the web.
Please keep it up.
And yes, I realize there’s an irony to professing what is really a byline bias on this site. :-)
Frankly, none of the five examples strikes me as something I could have predicted, nor ever struck me as such. Nevertheless, social science may indeed produce few significant results which are not predictable. How is that possible, given the examples above? Simple: the examples may have been cherry-picked to make the point. In particular, their significance (to us now) is seriously damaged by the fact that they are not general statements but statements about a time and place. While they may be generalizable to the present day while preserving their truth, they may not be. We just do not know. So, as they stand, they are not that useful to us now.
Social science almost certainly produces many insignificant results which are not predictable. It is easy enough to come up with questions which we can then methodically answer by gathering data. What percentage of Massachusetts residents like Fig Newtons? Is it higher or lower than the percentage of New York residents who like Fig Newtons? This is certainly a question that can be asked, and one whose answer I do not know. I could obtain a grant and then spend the grant money studying this question. But it is not a significant question, and learning its answer does not advance human knowledge in a significant way.
But what about significant results? Here’s a much more important question: in general, does extreme lack of sleep tend to have any significant negative impact? This is important because if it does not tend to have any significant negative impact, then many people will find this highly useful knowledge. Many people will sleep much less.
But notice something: it is not only an important question, it is also a question which people know the answer to. And this is no coincidence. It is often hard to hide important truths about people, from people.
This is not true of all social facts. Economic facts are facts about large numbers of people interacting, sometimes very indirectly, and so are a kind of fact which people have a hard time seeing, since they only encounter small numbers of other people at any given time, so they do not see the whole.
“But it is not a significant question, and learning its answer does not advance human knowledge in a significant way.”
Unless you work at a large grocery chain or for the company that makes Fig Newtons.
Constant, it’s odd you should choose the sleep example. What with the prospects of modafinil, which suppresses the desire for sleep and aids concentration, being used as an enhancement drug, and with people practicing polyphasic sleep (admittedly with more limited success), where you sleep 15 minutes at a time six times a day, the question of what effects this lack of regular sleep causes is actually a very open, and very interesting, one. Of course, without these modifications, lack of sleep has obvious ill effects.
pdf23ds, I don’t think it’s odd.
Let us grant the two points you have made (of which I am, of course, well aware, as is pretty much anyone who knows what “Digg” or “Reddit” is). In fact let’s go further. Suppose that polyphasic sleep lets you do away with sleep entirely (you just blink once every four hours). Suppose furthermore that a single dose of the new drug lets you do completely without sleep without any ill effects for the rest of your life. Now look at this question and try to answer it:
“in general, does extreme lack of sleep tend to have any significant negative impact?”
The correct answer is still “yes”, because even if you add up the people who have taken the drug and who practice polyphasic sleep, they make up a small minority of the whole population. Since the statement begins with, “in general”, it fails to be contradicted by a small minority.
As for polyphasic sleep and the drug, you can of course prove me wrong but as far as I know, the field of social science gets little if any credit for the discovery and ongoing investigation of polyphasic sleep. From what I have read, the main investigation seems to be done by individual self-experimenters. At least, that’s what’s made it to the social sites. Similarly, drugs are developed by and large by scientists in the fields of biology, chemistry, biochemistry, etc., not social science, and clinical trials are performed by and large by doctors, not social scientists.
“Eliezer, I don’t have any contribution to make to the conversation. I just want to tell you that the last 10 or so posts from you have absolutely blown me out of my socks.”
I agree—very impressive series of articles. Still just about clinging to my own socks, but they’re definitely trying to get away.
What’s with the nitpicking, Constant? Of course the general question has the same answer, and I’m not sure what you’re trying to prove by asserting your own familiarity with the phenomena I mentioned. And I don’t really have any interest in the relationship between polyphasic sleep and the social sciences—why would you think I did?
The reason I thought it odd is that there are other obvious questions that are not even close to being associated with interesting open issues in science, and yet you chose this one. Probably you weren’t even thinking of those complications—they weren’t relevant to your point.
Thirded—Eliezer, your posts on this blog are some of the most impressive work I’ve ever read. The world needs more like you.
“What’s with the nitpicking, Constant? [...] they weren’t relevant to your point”
I had—mistakenly as it turns out—assumed that you were obeying Grice’s maxims. You were, by your own eventual admission, disobeying the maxim of relevance. That is why I misunderstood you.
Out of curiosity, which time was Yudkowsky actually telling the truth? When he said those five assertions were lies, or when he said the previous sentence was a lie? I don’t want to make any guesses yet. This post broke my model; I need to get a new one before I come back.
It is a process lesson, not a lesson about facts.
But, if you have to know the facts, it is easy enough to click on the provided link to the Myers article and find out. Which, I suppose, is another process lesson.
You might find it a worthwhile exercise to decide what your current model is, first.
That is, how likely do you consider those five statements?
Once you know that, you can research the actual data and discover how much your model needs updating, and in what directions. That way you can construct a new model that is closer to observed data.
If you don’t know what your current model is, that’s much harder.
I could only predict one out of five.
Hm, when I first read those findings, I found the first three to be as expected, the fourth to be surprising (why would Blacks want racist officers?), and for the fifth I found the result to make sense, but considered that I would have thought the same if the opposite result had been found (soldiers don’t want to abandon their friends in combat, but want to leave together afterwards). So this would seem to indicate a problem with my model, given that the findings were all false.
But is it possible that, in that demonstration, those specific findings were selected precisely because they were opposite to what people would expect? If that is the case, then my model still isn’t really in error, because when examining the statements I had no real reason to believe that they were meant to fool me.
2 and 5 struck me as common sense. I see reasons for 5 to be reversed now that I know the result [yeah...], but I still don’t understand why 2 is wrong. Not really the point of the question, but I do wonder...
I could not swallow the weather example either. Eventually, I looked it up in the article from Myers: “Southerners were not more likely than Northerners to adjust to a tropical climate.”
It sounds like the Northerners were not adapting better; rather, there was no difference between the groups. If so, the word “opposite” is not fair in this context.
In which case, TraderJoe and Rixie, good job at being appropriately confused!
Maybe because Southerners were used to hot weather and didn’t put any real effort into actively combating it, the way Northerners had to?
Same.
I found four of the findings surprising (because they were either non-obvious, or a bit strange/implausible: education generally makes you more resilient; why would Black people want to hang out with racists who learned the wrong handling lessons; and while being discriminated against makes you less confident about asking for a promotion, you are likely to want it more until you see this working out badly for group members), but I 100% bought the Southerners dealing better with the heat, and am deeply baffled that they did not.
You’d expect them to have a better biological resistance through prior hardening, more awareness of the danger, and more importantly, more knowledge on what to do.
When we had a heat wave in Northern Europe, we had immense loss of life, despite the fact that such temperatures are regularly exceeded in other countries without such consequences, because people had no idea that heat was dangerous, or how to deal with it. They had no AC installed. They did not know whether to keep windows open or closed. They did not adjust their water and salt intake. They had no adequate clothing. They had an imperfect understanding of ventilation and shade. They did not recognise signs of heat stroke or low blood pressure. They weren’t concerned for babies and elderly people. They did not own sunscreen. Their work hours were set to work through lunchtime. Etc., etc. Even if I were more scared of the tropical heat as a Northerner, I would still bet on Southerners doing much better.
Then again, maybe the high humidity turned it into an environment that behaved differently than expected, so that the people learning about a new environment learned the right lessons, while those who thought it was familiar already were maladapted in some ways, so it evened out?
This prompted a memory of something I read in one of my undergrad psychology books a few years ago, which is probably referencing the same study, though using two different examples and one the same as the above example (though the phrasing is slightly different). Here is the extract:
Source: Passer, M. W., & Smith, R. E. (2007). Psychology: The Science of Mind and Behavior (Third Edition). Boston: McGraw-Hill, pages 31–32.
In hindsight, I guess I must have known that it would be a good idea to hang on to my undergrad textbooks. Or did I?
I smelled a rat immediately and decided to evaluate all five statements as if they had been randomly replaced with their opposites, or not. All five sounded wrong to me, I could think of rationalizations on each side but the rationalizations for the way they were actually presented sounded more forced.
I believed the first two, one out of personal experience and the other out of System 1. I guessed that as a soft, water-fat intellectual, I’d have more trouble adjusting to a military lifestyle than someone who’s actually been in a fight in his life. And that people from warmer climes deal with warmer temperatures more easily, well, I guess I believe people adapt to their circumstances. People from a warmer climate might sweat more and drink more water, or use less energy to generate less heat, whereas a man in Siberia might move more than is strictly necessary to keep his body temperature stable.
The other three are in subjects I know nothing about, and therefore I couldn’t have predicted them. A wise man knows his limits...
I’ve had a nagging sense of wrongness about #1, not so much about #5, which were the two that I knew the truth about.
While it might be true that intellectuals have trouble adapting to a military lifestyle, actual combat is a whole different animal in that respect. It is also different from the type of fighting that goes on in typical civilian life.
Other than that, why would you assume that intellectuals wouldn’t be better predisposed to figuring out what they’re supposed to do to stay alive and accomplish the mission? Particularly as they’re more used to thinking than the average guy.
New here, so hoping for (a) an answer, even though it’s been a long time and (b) some mercy if I’m completely wrong… :)
Correct me if I’m wrong, but no theory based on known materials could predict what would happen to a completely new material with unknown qualities. If someone were to design Kryptonite, which under the same conditions turns into water, the theory would completely fail to predict this.
Of course, you could update your theory to include Kryptonite, but it still would not include Zeptonium, which under the same conditions gives out gamma-rays.
Ridiculous, yes, but no more so than the conditions which would lead Eliezer to believe 2+2=3...
After getting miffed by your plethora of retractions, I figure that someone, at some point, left out some statistical significance values.
Number 2 is the only one I’d currently be willing to bet on being correct, but now I’m thinking about how soldiers go through boot camp, weakening the effect, and maybe whoever set up the study forgot to make sure the observers didn’t know whether they were observing a Southern soldier or a Northern soldier...
I read a few of the sequences now (including this one), and started to find that:
A: They were very interesting to read. B: They seemed a bit obvious, like common sense.
This was somewhat confusing, as things which really are obvious are very familiar and predictable, and thus, not so interesting to experience.
It’s only on my second time reading this (and first time reading the italics Exploring Social Psychology italics excerpt italics) that I’ve come to appreciate a link between sufficient explanation and the illusion of obviousness.
So keep on dropping in hyper-links to previous sections like you do, they’re really helpful.
p.s. The help tables’ section on italics was not quite so good, as I’ve refrained from editing the result in order to demonstrate.
First time around (with hindsight bias) I answered F, T, F?, T, T
Second time around (without hindsight bias) I answered F, T, T?, T, F
1. Educated people are more likely to agree with authority, because they have spent more time being conditioned to obey teachers. I generalized that they might also take orders more easily in the military, and conform to the circumstances more easily, despite “rough and tumble” blue-collar stereotypes.
2. I thought it would be difficult to measure, but that there would be some small but probably measurable advantage.
3. I couldn’t really tell. At first my guess was based on the idea that repressed people are more motivated and fight harder. When I reversed my decision, it was because I put a higher weight on the situation being similar to women in business, where men are more likely to rock the boat and ask for a higher salary or promotions than women are. In retrospect, both of these are attempts to confirm a hypothesis, rather than to disprove it.
4. People often favor members of their own group. Unless a majority of Southern Blacks disliked a majority of Southern Whites, rather than just being indifferent, I hypothesized that they would relate to them more easily as fellow Southerners.
5. Initially, I presumed what I thought was the simple and obvious answer, that people would avoid stress. After that, I recalled that adrenaline, deep bonds of sharing an experience, and a sense of purpose play a big part, and that boredom may actually be a bigger factor, since soldiers wouldn’t want to abandon their countrymen.
It’s hard to describe the different sensations of the two thought processes. I think it was harder to put in as much effort when I was actively suspending my disbelief. I was just going through the motions. The second time, I was really unsure, and took a deeper look. Or maybe I would have taken a deeper look if I had reexamined it under some other pretense.
I wasn’t sure whether the recession statement was true. I believe it is, but I’m not sure it would hold for a full-scale depression, since if people are earning less, they won’t be able to be more thrifty, because they will always need to eat and afford the basic necessities.
I “got” 2/5 of the above before reading that they were inverted.
When I took a psychology survey course in college, Dr. John Sabini gave a lot of attention to social psychology experiments, and much of the class was very surprised at their results; they didn’t say they “would have predicted them.” Of course Sabini may have been cherry-picking results that were likely to surprise. But I’ve seen it claimed elsewhere that social psychologists in the ’60s were largely preoccupied with producing results that would grab a lot of attention by being counterintuitive.
This hurts my image of Freud. Of course, after I have a dream about skyscrapers, he can explain that it’s connected to my love of my phallus, but could he predict my love of my phallus based on a dream about skyscrapers?
It seems to me that the paraphrasing in parentheses is also preying on the conjunction bias by adding additional detail.
The link to the Myers excerpt has been dead for two years; here’s an archived link: https://web.archive.org/web/20170801042830/http://csml.som.ohio-state.edu:80/Music829C/hindsight.bias.html
Fixed, thanks!
Strongly believed the reverse on 1 and 4, and had very little belief either way on the rest. But it was enough that I began to suspect they were all false; perhaps the big white space beneath them also tipped off my subconscious to such a possibility. Can’t find the paper on Sci-Hub. What are the answers?
Interesting experience: I attempted to read the Sequences ~10 years ago, but kept getting sidetracked and thrown out of order by clicking all the links. This time, I decided to try again, but forced myself to read each post in order. All this to say that I read this post shortly after reading “Your Strength as a Rationalist”. I can’t evaluate how relevant this fact is, but I had alarm bells ringing in my head when reading statements 3, 4, and 5. Statement 4 especially was so incoherent with my model that I immediately thought there had to be a trick. Basically, my model:
could argue either side for 1 with equal probability
gave a >70% probability to 2
would have predicted <20% for 3
<5% for 4
<30% for 5
I’m not certain of how I did the first time I read this post, but I’m quite certain I didn’t do as well. So I’m wondering if I’ve gotten stronger or if it’s due to the reading order.
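For concreteness, one way to grade a probability set like this is a log score: the number of bits of surprise the actual outcomes carry under your model. A minimal Python sketch, assuming point estimates of 0.5, 0.7, 0.2, 0.05, and 0.3 read off the ranges above (my reading, not the commenter’s exact numbers), and grading all five statements as false per Lazarsfeld:

```python
import math

# Stated probabilities that each finding is TRUE (point estimates read off
# the ranges above; assumed values, not the commenter's exact numbers).
p_true = [0.5, 0.7, 0.2, 0.05, 0.3]

# All five findings, as stated, were false, so the score rewards low p_true.
surprisal = [-math.log2(1 - p) for p in p_true]  # bits of surprise per item

for i, (p, s) in enumerate(zip(p_true, surprisal), start=1):
    print(f"statement {i}: P(true)={p:.2f} -> {s:.2f} bits of surprise")
print(f"total: {sum(surprisal):.2f} bits (flat 50/50 guessing scores 5.00)")
```

Lower is better; these numbers come to about 3.65 bits against 5.00 for flat 50/50 guessing, so a model like this genuinely beats chance on the reversed findings.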
Misspelled.