Archive for January, 2016

How can you predict an election by interviewing only 400 people?

This might be the most commonly asked question researchers get at cocktail parties (to the extent that researchers go to cocktail parties). It is also a commonly unasked question among researchers themselves: how can we predict an election by only talking to 400 people? 

The short answer is we can’t. We can never predict anything with 100% certainty from a research study or poll. The only way we could predict the election with 100% certainty would be to interview every person who will end up voting. Even then, since people might change their minds between the poll and the election, we couldn’t say our prediction was 100% likely to come true.

To provide an example, if I want to flip a coin 100 times, my best estimate before I do it would be that I will get “heads” 50 times. But, it isn’t 100% certain the coin will land on heads 50 times.

The reason it is hard to comprehend how we predict elections by talking to so few people is that our brains aren’t trained to understand probability. If we interview 400 people and find that 53% will vote for Hillary Clinton and 47% for Donald Trump, as long as the poll was conducted well, this result becomes our best prediction for what the vote will be. It is similar to predicting we will get 50 heads out of 100 coin tosses. 53% is our best prediction given the information we have. But, it isn’t an infallible prediction.

Pollsters provide a sampling error, which is +/-5% in this case. 400 is a bit of a magic number. At the 95% confidence level, it results in a maximum sampling error of +/-5% (the maximum occurs when the vote splits 50/50), which has long been an acceptable standard. (Actually, we need only 384 interviews for that, but researchers will use 400 instead because it sounds better.)
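For readers who want to check the arithmetic, here is a minimal sketch of the standard margin-of-error formula for a simple random sample. The function names are ours, and the 1.96 multiplier assumes the usual 95% confidence level and a normal approximation.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level (assumed)

def margin_of_error(n, p=0.5, z=Z_95):
    """Sampling error for a share p measured on n interviews."""
    return z * math.sqrt(p * (1 - p) / n)

def required_sample_size(margin, p=0.5, z=Z_95):
    """Interviews needed to reach a given sampling error (not rounded)."""
    return z ** 2 * p * (1 - p) / margin ** 2

# p = 0.5 is the worst case, which is why it gives the "maximum" error.
print(round(margin_of_error(400), 4))        # 0.049
print(round(required_sample_size(0.05), 2))  # 384.16
```

So 400 interviews give a touch under +/-5%, and the 384 figure is where the error reaches 5% almost exactly.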

What that means is that if we repeated this poll over and over, we would expect to find Clinton receiving between 48% and 58% of the intended vote, 95% of the time. We’d expect Trump to receive between 42% and 52% of the intended vote, 95% of the time. If we kept doing poll after poll, though, our best guess is that Clinton’s results would average out to 53%.

In the coin flipping example, if we repeatedly flipped the coin 400 times, we should get between 45% and 55% heads 95% of the time. But, our average and most common result will be 50% heads.
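The coin claim is easy to try on a computer. The sketch below is a simple simulation (the function name, the number of repetitions, and the random seed are arbitrary choices of ours): it flips a fair coin 400 times, repeats that many times, and counts how often the share of heads lands between 45% and 55%.

```python
import random

def share_within_band(trials, flips=400, low=0.45, high=0.55, seed=2016):
    """Fraction of simulated runs whose share of heads falls in [low, high]."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # One run: count heads in `flips` tosses of a fair coin.
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        if low <= heads / flips <= high:
            hits += 1
    return hits / trials

print(share_within_band(2000))
```

The result should come out close to 95%, and usually a little above it, because 45% to 55% is a slightly rounded-up version of the +/-4.9% band that 400 flips actually give.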

Because the ranges of the election poll (48%-58% for Clinton and 42%-52% for Trump) overlap, you will often see reporters (and the candidate who is in second place) say that the poll is a “statistical dead heat.” There is no such thing as a statistical dead heat in polling unless exactly the same number of respondents prefer each candidate, which may never have actually happened in the history of polling.

There is a much better way to report the findings of the poll. We can statistically determine the “odds” that the 53% for Clinton is actually higher than the 47% for Trump. If we repeated the poll many times, what is the probability that the percentage we found for Clinton would be higher than what we found for Trump? In other words, what is the probability that Clinton is going to win?

The answer in this case is 91%.  Based on our example poll, Clinton has a 91% chance of winning the election. Say that instead of 400 people we interviewed 1,000. The same finding would imply that Clinton has a 99% chance of winning. This is a much more powerful and interesting way to report polling results, and we are surprised we have never seen a news organization use polling data in this way.
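The post does not say how the 91% and 99% figures were calculated. One calculation that lands on them, offered only as a sketch under our own assumptions, is a two-sided normal-approximation comparison that treats the two candidates’ shares as if they came from independent samples. The function name is ours. Other reasonable setups (for example, a one-sided test on a single proportion) give somewhat different numbers.

```python
import math
from statistics import NormalDist

def lead_confidence(p_leader, n):
    """Two-sided confidence that the leader's share differs from the trailer's.

    Assumptions (ours, not stated in the post): a two-candidate race,
    a normal approximation, and the two shares treated as independent.
    """
    p_trailer = 1 - p_leader
    variance = (p_leader * (1 - p_leader) + p_trailer * (1 - p_trailer)) / n
    z = (p_leader - p_trailer) / math.sqrt(variance)
    return 2 * NormalDist().cdf(z) - 1

print(round(lead_confidence(0.53, 400), 2))   # 0.91
print(round(lead_confidence(0.53, 1000), 2))  # 0.99
```

The same 53% to 47% split becomes more convincing as the number of interviews grows, which is the pattern the figures above describe.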

Returning to our coin flipping example, if we flip a coin 400 times and get heads 53% of the time, there is a 91% chance that we have a coin that is unfair, and biased towards heads. If we did it 1,000 times and got heads 53% of the time, there would be a 99% chance that the coin is unfair.

Of course, a poll is a snapshot in time. The closer it is to the election, the more likely it is that the numbers will not change. And, polling predictions assume many things that are rarely true: that we have a perfect random sample, that all subgroups respond at the same rate, that questions are clear, that people won’t change their mind on Election Day, etc.

So, I guess the correct answer to “how can we predict the election from surveying 400 people” is “we can’t, but we can make a pretty good guess.”