It is now September of an election year. Get ready for a two-month deluge of polls and commentary on them. One thing you can count on is reporters and pundits misinterpreting the meaning behind “margin of error.” This post is meant to simplify the concept.

Margin of error refers to sampling error and is present on every poll or market research survey. It can be mathematically calculated. All polls seek to figure out what everybody thinks by asking a small sample of people. There is always some degree of error in this.

The formula for margin of error is fairly simple and depends mostly on two things: how many people are surveyed and the variability of their responses. The more people you interview, the lower (better) the margin of error. The more the people you interview give the same response (lower variability), the better the margin of error. If a poll interviews a lot of people and they all seem to be saying the same thing, the margin of error of the poll is low. If the poll interviews a small number of people and they disagree a lot, the margin of error is high.
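As a minimal sketch of the standard calculation (assuming a simple random sample and the usual 95% confidence level, i.e. z ≈ 1.96; the function name and the example numbers are mine, for illustration):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Margin of error for a sample proportion p from n respondents,
    using the 95% confidence level (z = 1.96) by default."""
    return z * math.sqrt(p * (1 - p) / n)

# More respondents -> lower (better) margin of error
print(margin_of_error(0.5, 400))    # ~0.049, i.e. about 5 points
print(margin_of_error(0.5, 1600))   # ~0.025, about 2.5 points

# Lower variability (responses further from a 50/50 split) -> lower margin of error
print(margin_of_error(0.9, 400))    # ~0.029
```

Note that quadrupling the sample size only halves the margin of error, which is why ever-bigger polls hit diminishing returns quickly.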

Most reporters understand that a poll with a lot of respondents is better than one with fewer respondents. But most don’t understand the variability component.

*There is another assumption used in the calculation for sampling error as well: the confidence level desired. Almost every pollster will use a 95% confidence level, so for this explanation we don’t have to worry too much about that.*

What does it mean for a result to be outside the margin of error on a poll? It simply means that the two percentages being compared can be deemed different from one another with 95% confidence. Put another way, if the poll were repeated a zillion times, we’d expect the resulting intervals to capture the true difference at least 19 out of 20 times.

If Biden is leading Trump in a poll by 8 points and the margin of error is 5 points, we can be confident he is really ahead because this lead is outside the margin of error. Not perfectly confident, but more than 95% confident.

Here is where reporters and pundits mess it up. Say they are reporting on a poll with a 5-point margin of error and Biden is leading Trump by 4 points. Because this lead is within the margin of error, they will often call it a “statistical dead heat” or say something that implies that the race is tied.

Neither is true. The only way for a poll to show a true dead heat is for the exact same number of people to choose each candidate. In this example the race isn’t tied at all; we just have less than 95% confidence that Biden is leading. We might, say, be 90% sure that Biden is leading Trump. So why would anyone call that a statistical dead heat? It would be far better to report the level of confidence we have that Biden is winning, or the p-value of the result. I have never seen a reporter do that, but some of the election prediction websites do.
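To make that concrete, here is a rough sketch of how such a confidence level could be backed out of a reported lead and margin of error, using nothing but the standard library. It assumes a two-candidate race, so the standard error of the *lead* is twice the standard error implied by the reported margin of error; the exact figure depends heavily on that assumption, which is one reason published estimates vary.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def confidence_leading(lead, moe, z_conf=1.96):
    """Approximate confidence that the leader is really ahead.

    Assumes a two-candidate race, so the standard error of the lead
    is twice the standard error behind the reported margin of error
    (moe = z_conf * SE of one candidate's share). A rough sketch,
    not a substitute for a pollster's full methodology.
    """
    se_lead = 2 * (moe / z_conf)
    return normal_cdf(lead / se_lead)

# A 4-point lead with a 5-point margin of error: well short of 95%
# confidence, but nothing like a tie either.
print(confidence_leading(0.04, 0.05))
```

Under these particular assumptions the answer comes out to roughly three chances in four that the leader is really ahead — a far more informative statement than “statistical dead heat.”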

Pollsters themselves sometimes misinterpret the concept. They will deem their poll “accurate” as long as the election result falls within the margin of error. In close elections this isn’t helpful, as what really matters is making a correct prediction of what will happen.

Most of the 2016 final polls were accurate if you define accuracy as coming within the margin of error. But since almost all of them predicted the wrong winner, I don’t think future textbooks will hold 2016 up as a zenith of polling accuracy.

Another mistake reporters (and researchers) make is not recognizing that the margin of error refers only to sampling error, which is just one of many errors that can occur in a poll. The poor performance of the 2016 presidential polls really had nothing to do with sampling error at all.

I’ve always questioned why there is so much emphasis on sampling error, for a couple of reasons. First, the calculation of sampling error assumes you are working with a random sample, which in today’s polling world is almost never the case. Second, there are many other types of errors in survey research that are likely more relevant to a poll’s accuracy than sampling error. The focus on sampling error persists largely because it is the easiest error to calculate mathematically. Margin of error is useful to consider, but it needs to be put in the context of all the other types of errors that can happen in a poll.
