[Link] Algorithm aversion
It has long been known that algorithms out-perform human experts on a range of topics (here’s a LW post on this by lukeprog). Why, then, is it that people continue to mistrust algorithms, in spite of their superiority, and instead cling to human advice? A recent paper by Dietvorst, Simmons and Massey suggests it is due to a cognitive bias which they call algorithm aversion. We judge less-than-perfect algorithms more harshly than less-than-perfect humans. They argue that since this aversion leads to poorer decisions, it is very costly, and that we therefore must find ways of combating it.
Abstract:
Research shows that evidence-based algorithms more accurately predict the future than do human forecasters. Yet when forecasters are deciding whether to use a human forecaster or a statistical algorithm, they often choose the human forecaster. This phenomenon, which we call algorithm aversion, is costly, and it is important to understand its causes. We show that people are especially averse to algorithmic forecasters after seeing them perform, even when they see them outperform a human forecaster. This is because people more quickly lose confidence in algorithmic than human forecasters after seeing them make the same mistake. In 5 studies, participants either saw an algorithm make forecasts, a human make forecasts, both, or neither. They then decided whether to tie their incentives to the future predictions of the algorithm or the human. Participants who saw the algorithm perform were less confident in it, and less likely to choose it over an inferior human forecaster. This was true even among those who saw the algorithm outperform the human.
General discussion:
The results of five studies show that seeing algorithms err makes people less confident in them and less likely to choose them over an inferior human forecaster. This effect was evident in two distinct domains of judgment, including one in which the human forecasters produced nearly twice as much error as the algorithm. It arose regardless of whether the participant was choosing between the algorithm and her own forecasts or between the algorithm and the forecasts of a different participant. And it even arose among the (vast majority of) participants who saw the algorithm outperform the human forecaster.
The aversion to algorithms is costly, not only for the participants in our studies who lost money when they chose not to tie their bonuses to the algorithm, but for society at large. Many decisions require a forecast, and algorithms are almost always better forecasters than humans (Dawes, 1979; Grove et al., 2000; Meehl, 1954). The ubiquity of computers and the growth of the “Big Data” movement (Davenport & Harris, 2007) have encouraged the growth of algorithms but many remain resistant to using them. Our studies show that this resistance at least partially arises from greater intolerance for error from algorithms than from humans. People are more likely to abandon an algorithm than a human judge for making the same mistake. This is enormously problematic, as it is a barrier to adopting superior approaches to a wide range of important tasks. It means, for example, that people will more likely forgive an admissions committee than an admissions algorithm for making an error, even when, on average, the algorithm makes fewer such errors. In short, whenever prediction errors are likely—as they are in virtually all forecasting tasks—people will be biased against algorithms.
More optimistically, our findings do suggest that people will be much more willing to use algorithms when they do not see algorithms err, as will be the case when errors are unseen, the algorithm is unseen (as it often is for patients in doctors’ offices), or when predictions are nearly perfect. The 2012 U.S. presidential election season saw people embracing a perfectly performing algorithm. Nate Silver’s New York Times blog, Five Thirty Eight: Nate Silver’s Political Calculus, presented an algorithm for forecasting that election. Though the site had its critics before the votes were in— one Washington Post writer criticized Silver for “doing little more than weighting and aggregating state polls and combining them with various historical assumptions to project a future outcome with exaggerated, attention-grabbing exactitude” (Gerson, 2012, para. 2)—those critics were soon silenced: Silver’s model correctly predicted the presidential election results in all 50 states. Live on MSNBC, Rachel Maddow proclaimed, “You know who won the election tonight? Nate Silver,” (Noveck, 2012, para. 21), and headlines like “Nate Silver Gets a Big Boost From the Election” (Isidore, 2012) and “How Nate Silver Won the 2012 Presidential Election” (Clark, 2012) followed. Many journalists and popular bloggers declared Silver’s success a great boost for Big Data and statistical prediction (Honan, 2012; McDermott, 2012; Taylor, 2012; Tiku, 2012).
However, we worry that this is not such a generalizable victory. People may rally around an algorithm touted as perfect, but we doubt that this enthusiasm will generalize to algorithms that are shown to be less perfect, as they inevitably will be much of the time.
Why is it that opium puts people to sleep? A recent paper by Molière suggests it is due to a property which he calls its dormitive principle.
Yes, the contribution here isn’t explaining it, it’s demonstrating it and naming it.
Haha, yes, that did strike me too. However, I suppose there could have been explanations of people’s unwillingness to trust algorithms other than a cognitive bias of this sort. For instance, the explanation could have been that experts conspire to fool people into thinking that they are in fact better than the algorithms. The fact that people mistrust algorithms even in this case, where there clearly wasn’t an expert conspiracy going on, suggests that this probably isn’t the explanation.
For some background on /u/RichardKennaway’s point, see:
Mysterious Answers to Mysterious Questions
Correspondence Bias
My intuition was that it was fear of edge cases, even when the person couldn’t articulate exactly what an edge case is.
I would loosely model my own aversion to trusting algorithms as follows: Both human and algorithmic forecasters will have blind spots, not all of them overlapping. (I.e. there will be cases “obvious” to each which the other gets wrong.) We’ve been dealing with human blind spots for the entire history of civilization, and we’re accustomed to them. Algorithmic blind spots, on the other hand, are terrifying: When an algorithm makes a decision that harms you, and the decision is—to any human—obviously stupid, the resulting situation would best be described as ‘Kafkaesque’.
I suppose there’s another psychological factor at work here, too: When an algorithm makes an “obviously wrong” decision, we feel helpless. By contrast, when a human does it, there’s someone to be angry at. That doesn’t make us any less helpless, but it makes us FEEL less so. (This makes me think of http://lesswrong.com/lw/jad/attempted_telekinesis/ .)
But wait! If many of the algorithm’s mistakes are obvious to any human with some common sense, then there is probably a process of algorithm+sanity check by a human, which will outperform even the algorithm. In which case, you yourself can volunteer for the sanity check role, and this should make you even more eager to use the algorithm.
(Yes, I’m vaguely aware of some research which shows that “sanity check by a human” often makes things worse. But let’s just suppose.)
I do think an algorithm-supported-human approach will probably beat at least an unassisted human, and I think a lot of people would be more comfortable with it than with algorithm-alone. (As long as the final discretion belongs to a human, the worst fears are allayed.)
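To make the proposal concrete, here is a minimal sketch of such an algorithm-plus-sanity-check pipeline (the forecasting rule, the plausible-range check, and the numbers are all invented for illustration; nothing here comes from the paper):

```python
def algorithm_forecast(case):
    # Stand-in for a statistical model; in the studies the model and the
    # human saw exactly the same data.
    return sum(case) / len(case)

def human_sanity_check(case, forecast):
    # The human only intervenes when the forecast looks "obviously" wrong,
    # here crudely operationalized as falling outside the observed range;
    # otherwise the algorithm's answer stands untouched.
    lower, upper = min(case), max(case)
    if forecast < lower or forecast > upper:
        return (lower + upper) / 2   # human override
    return forecast

def combined_forecast(case):
    return human_sanity_check(case, algorithm_forecast(case))

# The override only fires on out-of-range outputs, so the algorithm's good
# predictions are left alone.
print(combined_forecast([3, 5, 7]))   # 5.0, no override needed
```

Whether a check like this actually improves on the algorithm alone is exactly the empirical question about expert-plus-SPR combinations raised further down in this thread.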
Real world example.
Years ago, I worked at a company which made a machine to screen Pap smear slides. Granted, much of the insanity in health care is about regulatory power, law, and money, but even so, people were just weird about algorithmic screening.
The machine was much more accurate than the great mass of labs in the country.
But no matter how accurate automated screening was when compared to manual screening, there was always such a tizzy about any algorithmic faults. The fact that manual screening produces many more such faults was simply glossed over. Human faults were invisible and accepted, machine faults were a catastrophe.
From that Future of Life conference: if self-driving cars take over and cut the death rate from car accidents from 32,000 to 16,000 per year, the makers won’t get 16,000 thank-you cards—they’ll get 16,000 lawsuits.
Here’s an article in Harvard Business Review about algorithm aversion:
My emphasis.
The authors also have a forthcoming paper on this issue:
Presumably another bias, the IKEA effect, which says that people prefer products they’ve partially created themselves, is at play here.
Possible counterexample:
My father is a professor of electrical engineering. The electronics lab courses involve using simulation software as well as using physical components. In one lab experiment, the students built a circuit that the software didn’t simulate correctly (because of simplifications in the models the software used), and one of the questions the students had to answer was why they thought the computer simulation didn’t match the measured values. All the students blamed experimental error, and none questioned the computer models...
Probably because humans who don’t know much about algorithms basically have no way to observe or verify the procedure. The result of an algorithm has all the force of an appeal to authority, and we’re far more comfortable granting authority to humans.
I think people have also had plenty of experience with machines that malfunction and have objections on those grounds. We can tell when a human goes crazy if his arguments turn into gibberish, but it’s a bit harder to do with computers. If an algorithm outputs gibberish that’s one thing, but there are cases when the algorithm produces a seemingly reasonable number that ends up being completely false.
It’s a question of whether to trust a transparent process with a higher risk of error or a black box with a lower, but still non-negligible risk of error.
I’m not sure that explains why they judge the algorithm’s mistakes more harshly even after seeing the algorithm perform better. If you hadn’t seen the algorithm perform and didn’t know it had been rigorously tested, you could justify being skeptical about how it works, but seeing its performance should answer that. Besides, a human’s “expert judgment” on a subject you know little about is just as much of a black box.
If people see you as an authority and you make a mistake, they can accept that no one is perfect and mistakes happen. If they doubt the legitimacy of your authority, any mistakes will be taken as evidence of hubris and incompetence.
I think part of it is the general population just not being used to algorithms on a conceptual level. One can understand the methods used and so accept the algorithm, or one can get used to such algorithms over a period of time and come to accept them.
And such experts are routinely denounced by people who know little about the subject in question. I leave examples as an exercise for the reader.
True, but that seems inconsistent with taking human experts but not algorithms as authorities. Maybe these tend to be different people, or they’re just inconsistent about judging human experts.
It’s worth thinking about what makes one an expert, and what convinces others of one’s expertise. Someone has to agree that you’re an expert before they take you as an authority. There’s a social dynamic at work here.
As other commenters have already pointed out, algorithms are scary because they always fail hard. Humans fail, but can recover. Hofstadter’s terms seem useful here. Humans can notice, or be made to notice, that something isn’t right and jump out of the system of the basic diagnostic procedure. All the algorithms we currently have are sphexish and will forever remain in their initial framework even when things go wrong.
Yes, that’s the point.
(I think sphexish is Dawkins, not Hofstadter.)
Hofstadter uses it heavily in Gödel, Escher, Bach in 1979 as the metaphor for things that are unable to Jump Out Of The System. Dawkins only had The Selfish Gene out by then, and The Selfish Gene wasn’t really about algorithmic rigidity.
Oops, you’re right
If people trust human forecasters over machine forecasters, but the machine forecasts are better, just use the machine secretly, and take all the credit.
Indeed. That is precisely what so-called “closet index funds” are doing. They are said to be actively managed funds, but in reality they are index trackers, which simply track the stock market index.
The reason the fund managers use index-tracking algorithms rather than human experts is, however, not so much that the former are better (as I understand it, they are roughly on par) as that they are much cheaper. People nevertheless think that the extra costs active management brings with it are worth it, since they erroneously believe that human experts can consistently beat the index.
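For what it’s worth, a rough sketch of the kind of check that exposes a closet indexer (the return series below are invented): if a supposedly active fund’s returns barely deviate from the index, its tracking error is close to zero and the “active management” is mostly nominal.

```python
from statistics import stdev

def tracking_error(fund_returns, index_returns):
    """Standard deviation of the period-by-period return differences."""
    diffs = [f - i for f, i in zip(fund_returns, index_returns)]
    return stdev(diffs)

# Hypothetical monthly returns: the "active" fund barely deviates from the index.
index = [0.012, -0.004, 0.021, 0.008, -0.015, 0.010]
fund  = [0.011, -0.005, 0.020, 0.009, -0.014, 0.009]
print(tracking_error(fund, index))   # near zero => closet indexer
```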
Maybe human experts tend to track the index anyway?
My intuition was that participants assumed the human forecasters could use some knowledge the model couldn’t, but that is definitely not the case in the experiments. The model and the forecaster have exactly the same data, and that is made clear in the setup. Participants really do prefer the human despite ‘better knowledge’, especially when the comparison is explicit:
Page 6, emphasis mine.
I admit that I am surprised and I do not understand the cause of this algorithm aversion.
I wonder if this (distrusting imperfect algorithms more than imperfect people) holds for programmers and mathematicians. Indeed, the popular perception seems to be that such folks overly trust algorithms...
I was under the impression that mathematicians are actually too distrusting of imperfect algorithms (compared to their actual error rates). The three examples I ran into myself were:
In analysis, in particular in bifurcation analysis, a (small) parameter epsilon is introduced which determines the size of the perturbation. Analysts always loudly proclaim that ‘there exists an epsilon small enough’ such that their analysis holds (example values are often around 1/1000), but frequently the techniques are valid for values as large as epsilon = 1⁄2 (for example). Analysts who are unwilling to make statements about such large values of epsilon seem to be too mistrusting of their own techniques/algorithms.
Whether or not pi and e are normal are open questions in mathematics, but statistical analysis of the first couple of billion digits (if I am not mistaken) suggests that pi might be normal whereas e is probably not. Still, many mathematicians seem to be agnostic about these questions, since only a few billion data points have been obtained. (A crude version of this kind of digit-frequency check is sketched at the end of this comment.)
In the study of number fields, probabilistic algorithms are implemented to compute certain interesting properties such as the class group (algorithms that are guaranteed to give the right answer exist, but are too slow to be used in anything other than a few test cases). These algorithms generally have a guaranteed error rate of about 0.01% (sometimes this is a tunable parameter), but I know of a few mathematicians in this field (which makes it a high percentage, since I only know a few mathematicians in this field) who will frequently doubt the outcome of such an algorithm.
Of course these are only my personal experiences, but I’d guess that mathematicians are on the whole too fond of certainty and trust imperfect algorithms too little rather than too much.
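For the pi/e example above, a crude version of the digit-level evidence might look like the following sketch. It only tests single-digit equidistribution in base 10, which is far weaker than normality, and the digit count is kept small for speed rather than matching the billions of digits analysed in the literature.

```python
from collections import Counter
from mpmath import mp

N = 100_000                      # number of decimal digits to inspect
mp.dps = N + 10                  # working precision, with a few guard digits
digits = str(mp.pi)[2:N + 2]     # drop the leading "3." and keep N digits

counts = Counter(digits)
expected = N / 10
# Pearson chi-square statistic against a uniform distribution over 0-9;
# with 9 degrees of freedom, values near 9 are typical, and values above
# roughly 21.7 would be significant at the 1% level.
chi2 = sum((counts[str(d)] - expected) ** 2 / expected for d in range(10))
print(counts.most_common(), round(chi2, 2))
```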
Are algorithms easier to exploit than humans? Consider the case of playing a video game against a deterministic AI opponent; if the AI makes a stupid mistake once, it’ll make the same stupid mistake over and over, and an AI with a known exploit is easy to beat regardless of how well it plays in “normal” circumstances.
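As a toy illustration of that exploitability (the game and the bot’s rule are invented), consider a rock-paper-scissors bot that deterministically counters your previous move: once you notice the rule, you win every round.

```python
# Deterministic policies can be exploited forever once their rule is known.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
LOSES_TO = {loser: winner for winner, loser in BEATS.items()}  # move -> what beats it

def bot(player_prev):
    # Fixed rule, no randomness: counter the player's last move ("rock" to start).
    return LOSES_TO[player_prev] if player_prev else "rock"

player_prev, wins, rounds = None, 0, 10
for _ in range(rounds):
    bot_move = bot(player_prev)        # fully predictable from our own history
    our_move = LOSES_TO[bot_move]      # so we can always play the counter
    if BEATS[our_move] == bot_move:
        wins += 1
    player_prev = our_move
print(f"player wins {wins} of {rounds}")   # 10 of 10
```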
Sure, and when algorithms are used in adversarial situations, such as on the stock market, they usually have humans standing guard ready to hit the red button. But most situations are not adversarial, e.g., medical diagnosis.
I think it is the issue of moral responsibility, the same as with self-driving cars. People don’t want a decision that may negatively affect a person to be based on an algorithm, because an algorithm is not a moral agent. They want some expert to stand up and say “I risk my professional prestige, blame, and all that, and declare this person to be prone to violence and thus in need of a restraining order, and if I am wrong it is my bad.” As my dad used to say, it is all about who is willing to put their dick in the cigar cutter: who will accept responsibility, blame, even punishment for a decision that affects others?
Part of it is rational: decision makers having skin in the game makes decisions better. See Taleb, Antifragile.
Part of it is simply that we are used to, or have evolved toward, thinking that without responsibility there cannot be good decisions, which is true as long as humans make them. We are not evolved to deal with algorithms.
To the blind spots/edge cases objection: the article about SPRs linked in the post shows that even when human experts are given the results of an algorithm and asked to fix its errors, they still do worse than the algorithm on its own.
Because even though the human may recognize an obvious mistake, they will also flag ten good predictions as “mistakes”.
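A toy Monte Carlo sketch of that trade-off (all the rates and noise levels below are invented, not taken from the SPR literature): the override catches some genuine algorithmic blunders, but it fires on far more good predictions and replaces them with noisier human guesses, so the combination ends up better than the unaided human yet worse than the algorithm alone.

```python
import random

random.seed(0)
N = 200_000
MISTAKE_RATE = 0.02       # how often the algorithm is badly wrong
CATCH_RATE = 0.5          # fraction of real mistakes the human overrides
FALSE_ALARM_RATE = 0.3    # fraction of good predictions "corrected" anyway

algo_mse = human_mse = combined_mse = 0.0
for _ in range(N):
    truth = random.gauss(0, 1)
    human = truth + random.gauss(0, 2)            # unaided human forecast
    if random.random() < MISTAKE_RATE:
        algo = truth + random.gauss(0, 4)         # an algorithmic blunder
        overridden = random.random() < CATCH_RATE
    else:
        algo = truth + random.gauss(0, 0.5)       # a decent prediction
        overridden = random.random() < FALSE_ALARM_RATE
    combined = human if overridden else algo
    algo_mse += (algo - truth) ** 2 / N
    human_mse += (human - truth) ** 2 / N
    combined_mse += (combined - truth) ** 2 / N

# With these made-up numbers: algorithm ~0.57, human ~4.0, combined ~1.5,
# i.e. better than the unaided human but worse than the algorithm alone.
print(algo_mse, human_mse, combined_mse)
```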