Some criticisms of polling
POLLING 1: WHAT DO YOU INTEND?
As we approach the 3 November election, there are many polls promising to predict the outcomes. Looking back, many people were surprised by the outcome of the 2016 presidential election. Late polls had given H Clinton a win probability over 80%. Were those polls in error? Not necessarily. An 80+% chance falls short of certainty. Still, the surprise result raises questions, and a lot of thought has gone into how polls arrive at a representative sample of the voting public. Something that gets less attention is the method used to aggregate and present data. I think the usual methods are sloppy and misleading.
Suppose you were conducting a poll predicting the outcome of the upcoming election. The usual method is to go out and ask people their preferences, taking care to collect a representative cross section of voters, and applying well-reasoned principles to weigh the chances of their voting. Then you report that candidate X is expected to achieve Y% of the vote, etc. The result is continuous, and gives an impression of exactness. But the impression lies. In fact, this type of report loses much information by compressing data through a binary filter.
Imagine you were a candidate and wanted to know where to spend your limited time and resources. You might look to the poll figures and concentrate on big localities where the polls show nearly equal voter support. That is the best you could do with this kind of poll data. But it is a mediocre guide to resource allocation, and a poor way to predict which localities are likely to flip. Here is a better alternative: instead of just asking interviewees to name their preference, the pollster should measure the strength of the interviewee’s commitment.
Since we will eventually want to aggregate responses, we need a way to express strength of commitment according to some uniform scale. One idea could be to ask the interviewee to rank how resistant he would feel to changing his vote to the opposition candidate compared with his strength of allegiance going into previous elections. (This is an imperfect measure; I’d welcome suggestions for improvements.) The pollster registers this “percentile allegiance score” and can later report its average and variance (alternatively: publish the distribution).
How does this help? Consider two localities. In locality A, one candidate has a fair-sized lead, but voter allegiance is weak, i.e. many voters are still open to changing their minds. In locality B, support for the candidates is nearly evenly split, but there is a high allegiance score. According to the usual way of thinking, the candidates should concentrate on locality B. But attending to locality A might be the better decision.
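The two-locality comparison can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than real poll data or a settled method: the `Response` record, the 0-100 scale for the allegiance score, the cutoff of 30 below which a voter is counted as "persuadable", and the made-up responses for localities A and B.

```python
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass
class Response:
    candidate: str     # stated preference
    allegiance: float  # hypothetical "percentile allegiance score", 0-100


def summarize(responses, persuadable_below=30.0):
    """Report vote shares plus the allegiance statistics a plain poll drops.

    persuadable_below is an assumed cutoff, not an established standard.
    """
    n = len(responses)
    counts = {}
    for r in responses:
        counts[r.candidate] = counts.get(r.candidate, 0) + 1
    scores = [r.allegiance for r in responses]
    return {
        "shares": {c: k / n for c, k in counts.items()},
        "mean_allegiance": mean(scores),
        "variance": pvariance(scores),
        "persuadable": sum(s < persuadable_below for s in scores) / n,
    }


# Locality A: a 56/44 lead, but most voters are weakly committed.
locality_a = (
    [Response("X", 20.0)] * 36 + [Response("X", 60.0)] * 20
    + [Response("Y", 25.0)] * 30 + [Response("Y", 70.0)] * 14
)
# Locality B: a 51/49 split, but most voters are firmly committed.
locality_b = (
    [Response("X", 90.0)] * 45 + [Response("X", 25.0)] * 6
    + [Response("Y", 85.0)] * 44 + [Response("Y", 20.0)] * 5
)

summary_a = summarize(locality_a)
summary_b = summarize(locality_b)
print("A:", summary_a["shares"], "persuadable:", summary_a["persuadable"])
print("B:", summary_b["shares"], "persuadable:", summary_b["persuadable"])
```

With these invented numbers, B looks like the closer race on vote share alone, yet about two thirds of A's respondents fall below the cutoff against roughly one in ten in B. That is the information a share-only report discards.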
POLLING 2: WHAT DO YOU THINK WILL HAPPEN?
Not all polls are about people’s intended actions. Some aim to assess public sentiment about the likelihood of upcoming events. Here as well, polls lose information by passing data through a binary filter.
Suppose a politician makes a claim that a corona virus vaccine will be approved for widespread use before 1 May 2021, and a pollster wants to measure public confidence in his statement. The usual way would be for pollsters to ask people whether they believe the statement, then report back that X% have confidence. But there is a much more insightful way to frame the question, namely: With what probability do you expect a corona virus vaccine to be approved for widespread use before 1 May 2021? The distribution of answers to this question gives a more reliable and robust picture of public confidence.
To see the difference, consider what would happen if everyone believed there was about a 50% chance of meeting the 1 May deadline. In this case, a small event, say a newspaper article reporting on the success or failure of some minor research investigation, could shift the public’s confidence ever so slightly. With the traditional data aggregation approach, this could produce a huge difference in the percentage of positive responses. The poll, which purports to reflect public confidence, would be dominated by a small and reversible event.
The novel polling policy is resistant to minor fluctuations in public outlook. The minor news story wouldn’t shift the aggregated average by more than a couple of percent, so the poll would give a more faithful report of belief in the politician’s statement.
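A minimal sketch of this contrast, under assumed numbers: 1,000 hypothetical respondents whose beliefs sit just below 50%, and a news story that raises every belief by two percentage points. The function names, and the rule that a respondent answers "yes" exactly when their belief exceeds 50%, are assumptions made for illustration.

```python
from statistics import mean


def binary_poll(beliefs, threshold=0.5):
    """Share answering 'yes' when each respondent is forced into yes/no.

    Assumption: a respondent says 'yes' exactly when belief > threshold.
    """
    return sum(b > threshold for b in beliefs) / len(beliefs)


def probability_poll(beliefs):
    """Average of the stated probabilities."""
    return mean(beliefs)


# Hypothetical population: beliefs spread evenly between 48.5% and 49.5%.
before = [0.485 + 0.01 * i / 999 for i in range(1000)]

# A minor news story nudges everyone up by two percentage points.
shift = 0.02
after = [b + shift for b in before]

print(f"binary poll:      {binary_poll(before):.0%} -> {binary_poll(after):.0%}")
print(f"probability poll: {probability_poll(before):.0%} -> {probability_poll(after):.0%}")
# binary poll:      0% -> 100%
# probability poll: 49% -> 51%
```

The population is deliberately extreme. With beliefs spread more widely around 50% the binary swing would be smaller, but it would still exceed the two-point shift whenever many respondents sit near the threshold.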
REFLECTIONS
One thing I am starting to understand about polling: The most appropriate type of question and aggregating principle depend on the character of what is under investigation.
The type 2 poll works best in situations where there are just two distinctive choices. The method can be extended to cover multiple options, but when things start to get hairy it becomes increasingly difficult to define intuitive metrics.
For both question types, I am arguing for a change in approach and promising greater predictive power. It would be fair for someone to challenge me with the question: ‘What makes you think so?’ My argument is plausible, but it is not definitive. For instance, it doesn’t answer quantitative questions: How much of an improvement?
Here’s a related problem: predicting the outcome of a sports contest, based not on polling, but on previous performances against many competitors. The only quantitative approach I’ve encountered to this is the Elo Rating System, widely used in chess. But when you take a close look at Elo, it turns out to be based on all sorts of heuristic assumptions (for instance, use of the logistic curve because the shape “looks right”) and pragmatic adjustments. In my private research, I’ve developed a family of prediction algorithms based on analogies between sporting events and various probabilistic contests. The obvious next step is to collect historical data and hold a contest between the algorithms. This would answer the question about quantitative determination.
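For readers unfamiliar with Elo, here is a minimal sketch of its commonly published form, showing where the logistic curve enters. It is not one of the private algorithms mentioned above. The 400-point scale is the conventional chess choice, and the K-factor of 32 is an assumed value; federations use different ones.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B.

    This is a logistic curve in the rating difference, base 10,
    with the conventional 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a, rating_b, score_a, k=32.0):
    """Return both players' new ratings after one game.

    score_a is 1 for a win by A, 0.5 for a draw, 0 for a loss.
    k (assumed 32 here) controls how fast ratings move.
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - e_a))
    return new_a, new_b


# Evenly matched players: each is expected to score 0.5.
print(expected_score(1500, 1500))
# A 400-point favourite is expected to score about 0.91.
print(round(expected_score(1900, 1500), 2))
# After a win between equals, the winner gains k/2 points.
print(update(1500, 1500, 1.0))
```

Both the scale constant and the K-factor are tuning choices rather than derived quantities, which is the kind of pragmatic adjustment the paragraph above refers to.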
The same thing should be done for my polling innovations: Conduct a series of parallel polls on the same question according to the traditional and new paradigms, then evaluate the results to assess the improvement in predictive power.
Can you clarify the meaning of the sentence, “late polls had given H Clinton a win probability over 80%”?
Polls don’t give win probabilities, they give a sample of people’s opinions, from an average of which you can perhaps derive a win probability in conjunction with a statistical model such as FiveThirtyEight’s (which gave Clinton a 71.4% chance on Election Day).
I don’t have good data to back this up, but I have a feeling that people are thinking in more binary terms than you expect. More specifically, I conjecture that if you were to ask someone for how likely meeting a certain politically charged event would be, they would parse your question as a binary one and answer either “almost certainly” or “very unlikely”—and when pressed for a number, would give you either between 90-100%, or 0-10% respectively.
Thanks, betulaster,
Since you address “how likely meeting a certain politically charged event would be”, I assume your question is focussed on what I’ve called “Polling 2”, which concerns itself with predicting future events. These tend to be less politically charged than “Polling 1”, but I agree you are right in pointing out the need to relativise respondents’ answers. People who identify strongly with a cause, especially if they are not used to dealing with probability, might confuse a question about an event likelihood with the strength of their allegiance. Thus “How likely do you think the Dodgers are to win the World Series?” might be met with “I’d bet my life on it”, which is not very helpful for computing statistics :-)
The best way to put the matter into quantitative terms may be to ask the interviewee what odds he would give in a bet on the event occurring. It may seem redundant, but I would also ask the odds they’d give on a non-occurrence. (People’s grasp on probability is shaky, so overdetermining their perception helps to reduce error.)
You will notice that for Polling 1 type questions I avoided the natural step of asking people to say how much money it would take to get them to change their mind. For one thing, it would be tasteless to appear to be offering money to get someone to change a vote (for instance). Another reason is that people’s perceptions of money vary widely, injecting a confounding variable. The rather convoluted question I came up with to assess an interviewee’s resistance to change of intent has the disadvantage of generating a discrete (non-continuous) answer, and I worry it might also confuse some interviewees, but at least it makes a quantification in terms of a comparable quantity.
Yes, you’re right, and I should have been more clear—thanks for pointing that out.
I don’t know if I’m convinced that would work. I think that most people fall into two camps regarding betting odds. Camp A is not familiar with probability theory/calculus and doesn’t think in probabilistic terms in their daily life—they are only familiar with bets and odds as a rhetorical device (“the odds were stacked against him”, “against all odds”, etc). Camp B are people who bet/gamble as a pastime frequently, and are actually familiar with betting odds as an operable concept.
If you ask an A about their odds on an event related to a cause they feel strongly about, they will default to their understanding of odds as a rhetorical device and signal allegiance instead of giving you a usable estimate.
If you ask a B about their odds on the same event, they will start thinking about it in monetary terms, since habitual gamblers usually bet and gamble money. But, as you point out, putting a price on faith/belief/allegiance is seen as an immoral/dishonorable act, and would cause even more incentive to allegiance-signal instead of truly estimating probabilities.
In this way, this only works either for surveying people with good skills at rationality/probability, or for surveying people about events they don’t have strong feelings on.
However, there are two pitfalls to this argument, and that’s why I’m not stating this with complete certainty.
First, this is still speculation. I have no solid data on how familiar an aggregate person is with the concept of betting odds (I’m not sure average is a good term to use here, given that mathematicians probably understand this concept very well while being relatively scarce). Do tell if you know of any survey data that would allow this to be verified; a gauge of how familiar people are with a certain concept seems like data useful enough to exist.
Second, this may be cultural. I’m not American and have never stayed in America long-term—and based on you mentioning baseball and the time of your reply I assume that you are—so potentially the concept of betting is somehow more ingrained into the culture there, and I’m just defaulting to my prior knowledge.
Yup. I didn’t see the point in highlighting that, since you mention yourself that the measure is imperfect, but this echoes my concern on betting. At the risk of sounding like an intellectualist snob, I think even the probabilistic concepts that most lesswrongers would see as basic are somewhat hard to imagine and operate with, save for as rhetorical devices, to the general public.