Some criticisms of polling
POLLING 1. WHAT DO YOU INTEND?
As we approach the 3 November election, there are many polls promising to predict the outcomes. Looking back, many people were surprised by the outcome of the 2016 presidential election. Late polls had given Hillary Clinton a win probability over 80%. Were those polls in error? Not necessarily. An 80+% chance falls short of certainty. Still, the surprise result raises questions, and a lot of thought has gone into how polls arrive at a representative sample of the voting public. Something that gets less attention is the method used to aggregate and present data. I think the usual methods are sloppy and misleading.
Suppose you were conducting a poll predicting the outcome of the upcoming election. The usual method is to go out and ask people their preferences, taking care to collect a representative cross section of voters, and applying well-reasoned principles to weight respondents by their chances of voting. Then you report that candidate X1 is expected to achieve A1% of the vote, etc. The result is continuous, and gives an impression of exactness. But the impression lies. In fact, this type of report loses much information by compressing data through a binary filter.
Imagine you were a candidate and wanted to know where to spend your limited time and resources. You might look to the poll figures and concentrate on big localities where the polls show nearly equal voter support. That is the best you could do with this kind of poll data. But it is a mediocre guide to resource allocation, and a poor way to predict which localities are likely to flip. Here is a better alternative: instead of just asking interviewees to name their preference, the pollster should measure the strength of the interviewee’s commitment.
Since we will eventually want to aggregate responses, we need a way to express strength of commitment according to some uniform scale. One idea could be to ask interviewees to rank how resistant they would feel to changing their vote to the opposition candidate, compared with their strength of allegiance going into previous elections. (This is an imperfect measure. I’d welcome suggestions for improvements.) The pollster registers this “percentile allegiance score” and can later report its average and variance (or, alternatively, publish the full distribution).
How does this help? Consider two localities. In locality A, one candidate has a fair-sized lead, but voter allegiance is weak, i.e. many voters are still open to changing their minds. In locality B, support for the candidates is nearly evenly split, but there is a high allegiance score. According to the usual way of thinking, the candidates should concentrate on locality B. But attending to locality A might be the better decision.
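The two-locality comparison can be made concrete with a small sketch. Everything here is an illustrative assumption rather than a tested polling method: the 0-to-1 allegiance scale, the `soft_cutoff` of 0.3 below which a voter counts as persuadable, and the made-up response counts for each locality.

```python
from dataclasses import dataclass


@dataclass
class Response:
    candidate: str     # "X" or "Y"
    allegiance: float  # hypothetical score: 0.0 (easily swayed) to 1.0 (firm)


def summarize(responses, soft_cutoff=0.3):
    """Return X's lead over Y plus two summaries of allegiance."""
    n = len(responses)
    x_votes = sum(r.candidate == "X" for r in responses)
    return {
        # Share for X minus share for Y.
        "margin": (2 * x_votes - n) / n,
        "mean_allegiance": sum(r.allegiance for r in responses) / n,
        # Share of voters below the (assumed) persuadability cutoff.
        "soft_share": sum(r.allegiance < soft_cutoff for r in responses) / n,
    }


def make_locality(groups):
    """Build a response list from (candidate, allegiance, count) tuples."""
    return [Response(c, a) for c, a, count in groups for _ in range(count)]


# Invented data: A has a clear lead but weak allegiance,
# B is nearly tied but firmly committed.
locality_a = make_locality(
    [("X", 0.2, 40), ("X", 0.6, 18), ("Y", 0.2, 30), ("Y", 0.6, 12)]
)
locality_b = make_locality([("X", 0.9, 51), ("Y", 0.9, 49)])

summary_a = summarize(locality_a)
summary_b = summarize(locality_b)
print("A:", summary_a)
print("B:", summary_b)
```

With these invented numbers, the conventional report would show only the margins and point the candidate toward B. The allegiance summaries show that A is where most voters are still movable.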
POLLING 2. WHAT DO YOU THINK WILL HAPPEN?
Not all polls are about people’s intended actions. Some aim to assess public sentiment about the likelihood of upcoming events. Here as well, polls lose information by passing data through a binary filter.
Suppose a politician makes a claim that a coronavirus vaccine will be approved for widespread use before 1 May 2021, and a pollster wants to measure public confidence in his statement. The usual way would be for pollsters to ask people whether they believe the statement, then report back that X% have confidence. But there is a much more insightful way to frame the question, namely: With what probability do you expect a coronavirus vaccine to be approved for widespread use before 1 May 2021? The distribution of answers to this question gives a more reliable and robust picture of public confidence.
To see the difference, consider what would happen if everyone believed there was about a 50% chance of meeting the 1 May deadline. In this case, a small event, say a newspaper article reporting on the success or failure of some minor research investigation, could shift the public’s confidence ever so slightly. With the traditional data aggregation approach, this could produce a huge difference in the percentage of positive responses. The poll, which purports to reflect public confidence, would then be dominated by a small and reversible event.
The novel polling policy is resistant to minor fluctuations in public outlook. The minor news story wouldn’t shift the aggregated average by more than a couple of percentage points, thus providing a more faithful report of belief in the politician’s statement.
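A toy calculation illustrates the contrast between the two ways of aggregating. The population below is invented for illustration: 101 respondents whose beliefs are evenly spaced between 47% and 51%, and a news story that is assumed to nudge every belief up by 3 points. A respondent is assumed to answer "yes" to the traditional question only when their belief exceeds 50%.

```python
def binary_share(beliefs, threshold=0.5):
    """Share of respondents who would say 'yes, I believe the statement'."""
    return sum(b > threshold for b in beliefs) / len(beliefs)


def mean_probability(beliefs):
    """Average of the stated probabilities."""
    return sum(beliefs) / len(beliefs)


def shift(beliefs, delta):
    """Move every belief by delta, keeping each within [0, 1]."""
    return [min(1.0, max(0.0, b + delta)) for b in beliefs]


# Hypothetical population clustered just around 50%.
before = [0.47 + 0.0004 * i for i in range(101)]
# Assumed effect of a minor news story: +3 points for everyone.
after = shift(before, 0.03)

print("yes-share before/after:", binary_share(before), binary_share(after))
print("mean belief before/after:", mean_probability(before), mean_probability(after))
```

In this made-up case the yes-share jumps from roughly a quarter of respondents to nearly all of them, while the mean stated probability moves by only the 3 points that the story actually contributed.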
REFLECTIONS
One thing I am starting to understand about polling: The most appropriate type of question and aggregating principle depend on the character of what is under investigation.
The type 2 poll works best in situations where there are just two distinctive choices. The method can be extended to cover multiple options, but when things start to get hairy it becomes increasingly difficult to define intuitive metrics.
For both question types, I am arguing for a change in approach and promising greater predictive power. It would be fair for someone to challenge me with the question: ‘What makes you think so?’ My argument is plausible, but it is not definitive. For instance, it doesn’t answer quantitative questions: How much of an improvement?
Here’s a related problem: predicting the outcome of a sports contest, based not on polling, but on previous performances against many competitors. The only quantitative approach I’ve encountered to this is the Elo Rating System, widely used in chess. But when you take a close look at Elo, it turns out to be based on all sorts of heuristic assumptions (for instance, use of the logistic curve because the shape “looks right”) and pragmatic adjustments. In my private research, I’ve developed a family of prediction algorithms based on analogies between sporting events and various probabilistic contests. The obvious next step is to collect historical data and hold a contest between the algorithms. This would answer the question about quantitative determination.
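For readers who have not seen it, the logistic curve in the standard Elo formulation gives player A an expected score of 1 / (1 + 10^((R_B - R_A) / 400)), and ratings are then nudged toward the actual result. The sketch below shows that standard form only; it is not one of the alternative algorithms mentioned above, and the step size `k=32` is simply a commonly used value taken here as an assumption.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating, expected, actual, k=32):
    """New rating after one game; actual is 1 for a win, 0.5 draw, 0 loss."""
    return rating + k * (actual - expected)


# Two evenly rated players: each is expected to score one half.
e = expected_score(1500, 1500)
# If the first player wins, their rating rises by k * (1 - e).
print(e, update(1500, e, 1.0))
```

The constants 10 and 400 are conventions that fix the scale of the ratings, which is part of what makes the curve look like a pragmatic choice rather than a derived one.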
The same thing should be done for my polling innovations: Conduct a series of parallel polls on the same question according to the traditional and new paradigms, then evaluate the results to assess the improvement in predictive power.