Archive for November, 2020

Oops, the polls did it again

Many people had trouble sleeping last night wondering if their candidate was going to be President. I couldn’t sleep because as the night wore on it was becoming clear that this wasn’t going to be a good night for the polls.

Four years ago on the day after the election I wrote about the “epic fail” of the 2016 polls. I couldn’t sleep last night because I realized I was going to have to write another post about another polling failure. While the final vote totals may not be in for some time, it is clear that the 2020 polls are going to be off on the national vote even more than the 2016 polls were.

Yesterday, on election day I received an email from a fellow market researcher and business owner. We are involved in a project together and he was lamenting how poor the data quality has been in his studies recently and was wondering if we were having the same problems.

In 2014 we wrote a blog post that cautioned our clients that we were detecting poor quality interviews that needed to be discarded about 10% of the time. We were having to throw away about 1 in 10 of the interviews we collected.

Six years later that percentage has moved to be between 33% and 45% and we tend to be conservative in the interviews we toss. It is fair to say that for most market research studies today, between a third and a half of the interviews being collected are, for a lack of a better term, junk.  

It has gotten so bad that new firms have sprung up that serve as a go-between from sample providers and online questionnaires in order to protect against junk interviews. They protect against bots, survey farms, duplicate interviews, etc. Just the fact that these firms and terms like “survey farms” exist should give researchers pause regarding data quality.

When I started in market research in the late 80s/early 90’s we had a spreadsheet program that was used to help us cost out projects. One parameter in this spreadsheet was “refusal rate” – the percent of respondents who would outright refuse to take part in a study. While the refusal rate varied by study, the beginning assumption in this program was 40%, meaning that on average we expected 60% of the time respondents would cooperate. 

According to Pew and AAPOR in 2018 the cooperation rate for telephone surveys was 6% and falling rapidly.

Cooperation rates in online surveys are much harder to calculate in a standardized way, but most estimates I have seen and my own experience suggest that typical cooperation rates are about 5%. That means for a 1,000-respondent study, at least 20,000 emails are sent, which is about four times the population of the town I live in.

This is all background to try to explain why the 2020 polls appear to be headed to a historic failure. Election polls are the public face of the market research industry. Relative to most research projects, they are very simple. The problems pollsters have faced in the last few cycles is emblematic of something those working in research know but rarely like to discuss: the quality of data collected for research and polls has been declining, and should be alarming to researchers.

I could go on about the causes of this. We’ve tortured our respondents for a long time. Despite claims to the contrary, we haven’t been able to generate anything close to a probability sample in years. Our methodologists have gotten cocky and feel like they can weight any sampling anomalies away. Clients are forcing us to conduct projects on timelines that make it impossible to guard against poor quality data. We focus on sampling error and ignore more consequential errors. The panels we use have become inbred and gather the same respondents across sources. Suppliers are happy to cash the check and move on to the next project.

This is the research conundrum of our times: in a world where we collect more data on people’s behavior and attitudes than ever before, the quality of the insights we glean from these data is in decline.

Post 2016 the polling industry brain trust rationalized and claimed that the polls actually did a good job, convened some conferences to discuss the polls, and made modest methodological changes. Almost all of these changes related to sampling and weighting. But, as it appears that the 2020 polling miss is going to be way beyond what can be explained by sampling (last night I remarked to my wife that “I bet the p-value of this being due to sampling is about 1 in 1,000”), I feel that pollsters have addressed the wrong problem.

None of the changes pollsters made addressed the long-term problems researchers face with data quality. When you have a response rate of 5% and up to half of those are interviews you need to throw away, errors that can arise are orders of magnitude greater than the errors that are generated by sampling and weighting mistakes.

I don’t want to sound like I have the answers.  Just a few days ago I posted that I thought that on balance there were more reasons to conclude that the polls would do a good job this time than to conclude that they would fail. When I look through my list of potential reasons the polls might fail, nothing leaps to me as an obvious cause, so perhaps the problem is multi-faceted.

What I do know is the market research industry has not done enough to address data quality issues. And every four years the polls seem to bring that into full view.

Will the polls be right this time?

The 2016 election was damaging to the market research industry. The popular perception has been that in 2016 the pollsters missed the mark and miscalled the winner. In reality, the 2016 polls were largely predictive of the national popular vote. But, 2016 was largely seen by non-researchers as disastrous. Pollsters and market researchers have a lot riding on the perceived accuracy of 2020 polls.

The 2016 polls did a good job of predicting the national vote total but in a large majority of cases final national polls were off in the direction of overpredicting the vote for Clinton and underpredicting the vote for Trump. That is pretty much a textbook definition of bias. Before the books are closed on the 2016 pollster’s performance, it is important to note that the 2012 polls were off even further and mostly in the direction of overpredicting the vote for Romney and underpredicting the vote for Obama. The “bias,” although small, has swung back and forth between parties.

Election Day 2020 is in a few days and we may not know the final results for a while. It won’t be possible to truly know how the polls did for some weeks or months.

That said, there are reasons to believe that the 2020 polls will do an excellent job of predicting voter behavior and there are reasons to believe they may miss the mark.  

There are specific reasons why it is reasonable to expect that the 2020 polls will be accurate. So, what is different in 2020? 

  • There have been fewer undecided voters at all stages of the process. Most voters have had their minds made up well in advance of election Tuesday. This makes things simpler from a pollster’s perspective. A polarized and engaged electorate is one whose behavior is predictable. Figuring out how to partition undecided voters moves polling more in a direction of “art” than “science.”
  • Perhaps because of this, polls have been remarkably stable for months. In 2016, there was movement in the polls throughout and particularly over the last two weeks of the campaign. This time, the polls look about like they did weeks and even months ago.
  • Turnout will be very high. The art in polling is in predicting who will turn out and a high turnout election is much easier to forecast than a low turnout election.
  • There has been considerable early voting. There is always less error in asking about what someone has recently done than what they intend to do in the future. Later polls could ask many respondents how they voted instead of how they intended to vote.
  • There have been more polls this time. As our sample size of polls increases so does the accuracy. Of course, there are also more bad polls out there this cycle as well.
  • There have been more and better polls in the swing states this time. The true problem pollsters had in 2016 was with state-level polls. There was less attention paid to them, and because the national pollsters and media didn’t invest much in them, the state-level polling is where it all went wrong. This time, there has been more investment in swing-state polling.
  • The media invested more in polls this time. A hidden secret in polling is that election polls rarely make money for the pollster. This keeps many excellent research organizations from getting involved in them or dedicating resources to them. The ones that do tend to do so solely for reputational reasons. An increased investment this time has helped to get more researchers involved in election polling.
  • Response rates are upslightly. 2020 is the first year where we have seen a long-term trend towards declining response rates on survey stabilize and even kick up a little. This is likely a minor factor in the success of the 2020 polls, but it is in the right direction.
  • The race isn’t as close as it was in 2016. This one might only be appreciated by statisticians. Since variability is maximized in a 50/50 distribution the further away from an even race it is the more accurate a poll will be. This is another small factor in the direction of the polls being accurate in 2020.
  • There has not been late breaking news that could influence voter behavior. In 2016, the FBI director’s decision to announce a probe into Clinton’s emails came late in the campaign. There haven’t been any similar bombshells this time.
  • Pollsters started setting quotas and weighting on education. In the past, pollsters would balance samples on characteristics known to correlate highly with voting behavior – characteristics like age, gender, political party affiliation, race/ethnicity, and past voting behavior. In 2016, pollsters learned the hard way that educational attainment had become an additional characteristic to consider when crafting samples because voter preferences vary by education level. The good polls fixed that this go round.
  • In a similar vein, there has been a tighter scrutiny of polling methodology. While the media can still be a cavalier about digging into methodology, this time they were more likely to insist that pollsters outline their methods. This is the first time I can remember seeing news stories where pollsters were asked questions about methodology.
  • The notion that there are Trump supporters who intentionally lie to pollsters has largely been disproven by studies from very credible sources, such as Yale and Pew. Much more relevant is the pollster’s ability to predict turnout from both sides.

There are a few things going on that give the polls some potential to lay an egg.

  • The election will be decided by a small number of swing states. Swing state polls are not as accurate and are often funded by local media and universities that don’t have the funding or the expertise to do them correctly. The polls are close and less stable in these states. There is some indication that swing state polls have been tightening, and Biden’s lead in many of them isn’t much different than Clinton’s lead in 2020.
  • Biden may be making the same mistake Clinton made. This is a political and not a research-related reason, but in 2016 Clinton failed to aggressively campaign in the key states late in the campaign while Trump went all in. History could be repeating itself. Field work for final polls is largely over now, so the polls will not reflect things that happen the last few days.
  • If there is a wild-card that will affect polling accuracy in 2020, it is likely to center around how people are voting. Pollsters have been predicting election day voting for decades. In this cycle votes have been coming in for weeks and the methods and rules around early voting vary widely by state. Pollsters just don’t have past experience with early voting.
  • There is really no way for pollsters to account for potential disqualifications for mail-in votes (improper signatures, late receipts, legal challenges, etc.) that may skew to one candidate or another.
  • Similarly, any systematic voter suppression would likely cause the polls to underpredict Trump. These voters are available to poll, but may not be able to cast a valid vote.
  • There has been little mention of third-party candidates in polling results. The Libertarian candidate is on the ballot in all 50 states. The Green Party candidate is on the ballot in 31 states. Other parties have candidates on the ballot in some states but not others. These candidates aren’t expected to garner a lot of votes, but in a close election even a few percentage points could matter to the results. I have seen national polls from reputable organizations where they weren’t included.
  • While there is little credible data supporting that there are “shy” Trump voters that are intentionally lying to pollsters, there still might be a social desirability bias that would undercount Trump’s support. That social desirability bias could be larger than it was in 2016, and it is still likely in the direction of under predicting Trump’s vote count.
  • Polls (and research surveys) tend to underrepresent rural areas. Folks in rural areas are less likely to be in online panels and to cooperate on surveys. Few pollsters take this into account. (I have never seen a corporate research client correcting for this, and it has been a pet peeve of mine for years.) This is a sample coverage issue that will likely undercount the Trump vote.
  • Sampling has continued to get harder. Cell phone penetration has continued to grow, online panel quality has fallen, and our best option (ABS sampling) is still far from random and so expensive it is beyond the reach of most polls.
  • “Herding” is a rarely discussed, but very real polling problem. Herding refers to pollsters who conduct a poll that doesn’t conform to what other polls are finding. These polls tend to get scrutinized and reweighted until they fit to expectations, or even worse, buried and never released. Think about it – if you are a respected polling organization that conducted a recent poll that showed Trump would win the popular vote, you’d review this poll intensely before releasing it and you might choose not to release it at all because it might put your firm’s reputation at risk to release a poll that looks different than the others. The only polls I have seen that appear to be out of range are ones from smaller organizations who are likely willing to run the risk of being viewed as predicting against the tide or who clearly have a political bias to them.

Once the dust settles, we will compose a post that analyzes how the 2020 polls did. For now, we feel there are a more credible reasons to believe the polls will be seen as predictive than to feel that we are on the edge of a polling mistake.  From a researcher’s standpoint, the biggest worry is that the polls will indeed be accurate, but won’t match the vote totals because of technicalities in vote counting and legal challenges. That would reflect unfairly on the polling and research industries.


Visit the Crux Research Website www.cruxresearch.com

Enter your email address to follow this blog and receive notifications of new posts by email.