Archive for the 'Response Rates' Category

Polling’s Winners and Losers from the Midterms

The pollsters did well last night.

Right now (the morning after the election), it is hard to know if 2022 will go down as a watershed moment when pollsters once again found their footing or if it will merely be a stay of execution. The 2018 midterms were also quite good for pollsters, yet the 2020 election was not.

To be clear, there are still many votes to count, so it is unfair to judge the polls too quickly. In POLL-ARIZED, I criticize media members who do. Nonetheless, below is a list of what I see as the winners, the losers, and a few that land somewhere in the middle.

The Winners

  • Pre-election polling in general. For the most part, the polls did a good job of pointing out the close races, and exit polls suggest that they did an excellent job of highlighting the issues that concern voters most. I suspect the polling error rate will be far below the historical average of five+ points for midterm elections.
  • The “good” pollsters. The better-known polling brands, especially those with media partnerships, and some college polling centers had good results.
  • John King’s brain. Say what you want about CNN, but watching someone who knows the name of every county in America, the candidates in every election district, and the results of past elections perform without a net and stick the landing is impressive.
  • The CNN magic wall. I know other networks have them, but I can’t be the only data geek who marvels at the database systems and APIs behind CNN’s screen. It must have cost millions and involved dozens of people.
  • The Iowa Poll’s response rate. Their methodology statement says they contacted 1,118 Iowa residents for a final sample size of 801, with a response rate of 72%. This reminds me of the good old days. I would like to see pollsters spend more time benchmarking what Selzer & Co. are doing right with this poll.

The Losers

  • The partisan pollsters, particularly Trafalgar. These pollsters were way off this cycle, as they have been in most cycles. I hope that non-partisan media outlets will stop covering them. They provide a story that outlets and viewers seeking confirmation of their biases enjoy, but objective media should leave them behind for good.
  • The media who failed to see that there were so many less-reputable conservative polls released over the past two weeks. Most media were hoodwinked by this and ran a narrative that a red storm was brewing.
  • Response rates. I delved into the methodology of many final polls this cycle; most had net response rates of less than 2%. That is about half what response rates were just two years ago. The fact that the pollsters did so well with this low response is a testament to the brilliance of methodologists, but the data they have to work with is getting worse each cycle. They will not be able to keep pulling rabbits out of their hat.
  • The prediction markets. I have long hoped that the betting markets could emerge as a plausible alternative to polls for predicting elections, so that polls could focus on issues rather than the horse race. These markets did not have a good night.
  • FiveThirtyEight’s pollster ratings. It is too early to make a definitive statement, but some of their highly rated pollsters had poor results, while many with middling grades did well. These ratings are helpful when they are accurate and have a defensible method behind them. When these ratings are inaccurate, they ruin reputations and businesses, so FiveThirtyEight must embrace that producing objective and accurate ratings is a serious responsibility.

The “So-So”

  • The Iowa Poll. Even with the high response, this poll seemed to overstate the Republican vote this time. They did get all the winners correct. This poll has a strong history of success, so it might be fair to chalk the slight miss up to normal sampling fluctuation; no poll can be expected to get it exactly right every single time. I must admit I have a bias toward rooting for this poll.
  • The modelers, such as FiveThirtyEight and The Economist. On the one hand, the concept of a probabilistic forecast is spot on. On the other, it is not particularly informative in coin-toss races. In this cycle, their forecasts for Senate and House seats weren’t much different from what could have been produced by tossing a coin in the contested races. Their median predictions for House and Senate seats overstated where the Republicans will end up, possibly because they also fell prey to the release of so many conservative-leaning polls in the campaign’s final stages.
  • Polling error direction. In the past few cycles, the polling error has been in the direction of overcounting Democrats. In 2022, this error seemed to move in the other direction. Historically, these errors have been uncorrelated from election to election, so I must admit that I’ve probably jumped the gun by suggesting in POLL-ARIZED the pro-Democrat error direction was structural and here to stay.
  • The media’s coverage of the polls on election day. In 2016 and 2020, the press reveled in bashing the pollsters. This time, they hardly talked about them at all. That seemed a bit unfair – if pollsters are going to be criticized when they do poorly, they should be celebrated when they do well.

All in all, a good night for the pollsters. But, I don’t want to rush to the conclusion that the polls are now fixed because, in reality, the pollsters didn’t change much in their methods from 2020. I hope the industry will study what went right, as we tend to re-examine our methods when they fail, not when they succeed.

Your grid questions probably aren’t working

Convincing people to participate in surveys and polls has become so challenging that more attention is going toward preventing them from suspending once they choose to respond.

Most survey suspends occur in one of two places. The first is at the initial screen the respondent sees. Respondents click through an invitation, and many quickly decide that the survey isn’t for them and abandon the effort.

The second most common place is the first grid question respondents encounter. They see an imposing grid question and decide it isn’t worth their time to continue. It doesn’t matter where this question is placed – this happens whether the first grid question is early in the questionnaire, in the middle, or toward the end.

Respondents hate answering grid questions. Yet clients continue to ask them, and survey researchers include them without much thought. The quality of data they yield tends to be low.

A measurement error issue with grid questions is known as “response set bias.” When we present a list of, say, ten items, we want the respondent to make an independent judgment of each, unrelated to what they think of the others. But, with a long list of items, that is not what happens. Instead, when people respond to later items, they remember what they said earlier. If I indicated that feature A was “somewhat important” to me, then when I assess feature B it is natural to think about how it compares in importance to feature A. This introduces unwanted correlations into the data set.

Instead, we want a respondent to assess feature A, clear their mind entirely, and then assess feature B. That is a challenging task, and placing features on a long, intimidating list makes it nearly impossible. Some researchers think we can eliminate this error by randomizing the list order, but all that does is spread the error out. It is important to randomize the options so the error doesn’t concentrate on just a few items, but randomization does not solve the problem.

Errors you have probably heard of lurk in long grid questions. Things like fatigue biases (respondents attend less to the items late in the list), question order biases, priming effects, recency biases, etc. In short, grid questions are just asking for many measurement errors, and we end up crossing our fingers and hoping some of these cancel each other out.

This is admittedly a mundane topic, but it is the one questionnaire design issue I have the most difficulty convincing clients to do something about. Grid questions capture a lot of data in a short amount of questionnaire time, so they are enticing for clients.

I prefer a world where we seldom ask them. When we must, we recommend at most one or two per questionnaire, with no more than four to six items in each. I rarely succeed in convincing clients of this.

“Textbook” explanations of the problems with grid questions do not include the issue that bothers me most: the question respondents hear and respond to is often not the literal question we composed.

Consider a grid question like this, with a 5-point importance scale as the response options:

Q: How important were the following when you decided to buy the widget?

  1. The widget brand cares about sustainability
  2. The price of the widget
  3. The color of the widget is attractive to you
  4. The widget will last a long time

Think about the first item (“The widget brand cares about sustainability”). The client wants to understand how important sustainability is in the buying decision. How important of a buying criterion is sustainability?

But that is likely not what the respondent “hears” in the question. The respondent will probably read it as asking whether they care about sustainability, and who doesn’t? So sustainability tends to be overstated as a decision driver when the data set is analyzed. Respondents don’t leap to thinking about sustainability as a buying consideration; instead, they respond about sustainability in general.

Clients and suppliers must realize that respondents do not parse our words as we would like them to, and they do not always attend to our questions. We need to anticipate this.

How do we fix this issue? We should be more straightforward in how we ask questions. In this example, I would prefer to derive the importance of sustainability in the buying decision. I’d include a question asking how much they care about sustainability (phrased carefully so responses spread across the full scale). Then, in a second question, I would gather a dependent variable asking how likely they are to buy the widget in the future.

A regression or correlation analysis would provide coefficients across variables that indicate their relative importance. Yes, it would be based on correlations and not necessarily causation. In reality, research studies rarely set up the experiments necessary to give evidence of causation, and we should not get too hung up on that.
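
To make that concrete, here is a minimal sketch of the derived-importance idea in Python. The data are simulated and the variable names are hypothetical; the point is only that standardized regression coefficients of a “likelihood to buy” question against the attitude questions give a relative-importance reading without ever asking “how important is X?” directly.

    import numpy as np

    rng = np.random.default_rng(7)
    n = 500

    # Hypothetical 5-point attitude ratings (1-5) plus a dependent variable:
    # likelihood to buy the widget in the future.
    care_sustainability = rng.integers(1, 6, n)
    price_sensitivity = rng.integers(1, 6, n)
    likes_color = rng.integers(1, 6, n)
    # Simulated so that price matters most and sustainability matters a little.
    likely_to_buy = (0.15 * care_sustainability + 0.60 * price_sensitivity
                     + 0.05 * likes_color + rng.normal(0, 1, n))

    predictors = {"sustainability": care_sustainability,
                  "price": price_sensitivity,
                  "color": likes_color}

    def z(x):
        # Standardize so coefficients are comparable across predictors.
        return (x - x.mean()) / x.std()

    X = np.column_stack([np.ones(n)] + [z(v) for v in predictors.values()])
    y = z(likely_to_buy)

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares

    for name, coef in zip(predictors, beta[1:]):
        print(f"{name:15s} standardized coefficient = {coef:+.2f}")

In a real study you would also check for multicollinearity and apply any survey weights, but the relative sizes of those coefficients are the derived-importance reading.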

I would conclude that sustainability is an essential feature if it popped in the regression as having a high coefficient and if I saw something else in other questions or open-ends that indicated sustainability mattered from another angle. Always look for another data point or another data source that supports your conclusion.

Grid questions are the most overrated and overused type of survey question. Clients like them, but they tend to provide poor-quality data. Use them sparingly and look for alternatives.

Pre-Election Polling and Baseball Share a Lot in Common

The goal of a pre-election poll is to predict which candidate will win an election and by how much. Pollsters work towards this goal by 1) obtaining a representative sample of respondents, 2) determining which candidate a respondent will vote for, and 3) predicting the chances each respondent will take the time to vote.

All three of these steps involve error. It is the first one, obtaining a representative sample of respondents, which has changed the most in the past decade or so.

It is the third characteristic that separates pre-election polling from other forms of polling and survey research. Statisticians must predict how likely each person they interview will be to vote. This is called their “Likely Voter Model.”

As I state in POLL-ARIZED, this is perhaps the most subjective part of the polling process. The biggest irony in polling is that it becomes an art when we hand the data to the scientists (methodologists) to apply a Likely Voter Model.

It is challenging to understand what pollsters do in their Likely Voter Models and perhaps even more challenging to explain.  

An example from baseball might provide a sense of what pollsters are trying to do with these models.

Suppose Mike Trout (arguably the most underappreciated sports megastar in history) is stepping up to the plate. Your job is to predict Trout’s chances of getting a hit. What is your best guess?

You could take a random guess between 0 and 100%. But, since that would give you a 1% chance of being correct, there must be a better way.

A helpful approach comes from a subset of statistical theory called Bayesian statistics. This theory says we can start with a baseline of Trout’s hit probability based on past data.

For instance, we might see that so far this year, the overall major league batting average is .242. So, we might guess that Trout’s probability of getting a hit is 24%.

This is better than a random guess. But, we can do better, as Mike Trout is no ordinary hitter.

We might notice there is even better information out there. Year-to-date, Trout is batting .291. So, our guess for his chances might be 29%. Even better.

Or, we might see that Trout’s lifetime average is .301 and that he hit .333 last year. Since we believe in a concept called regression to the mean, that would lead us to think that his batting average should be better for the rest of the season than it is currently. So, we revise our estimate upward to 31%.

There is still more information we can use. The opposing pitcher is Justin Verlander. Verlander is a rare pitcher who has owned Trout in the past – Trout’s average is just .116 against Verlander. This causes us to revise our estimate downward a bit. Perhaps we take it to about 25%.

We can find even more information. The bases are loaded. Trout is a clutch hitter, and his career average with men on base is about 10 points higher than when the bases are empty. So, we move our estimate back up to about 28%.

But it is August. Trout has a history of batting well early and late in the season, but he tends to cool off during the dog days of summer. So, we decide to stop there and settle on a probability of 25%.
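
For readers who want to see the arithmetic, here is a toy Beta-Binomial version of this updating process in Python. The at-bat counts are hypothetical, chosen only to reproduce the averages mentioned above, and real likely voter models are far messier, but the shape of the calculation is the same: start with a prior and nudge it with each new piece of evidence.

    # A toy beta-binomial update, mirroring the narrative above.
    # The at-bat counts are hypothetical, chosen only to match the batting
    # averages cited in the text (.242 league, .291 season, .116 vs. Verlander).

    def update(alpha, beta, hits, outs):
        # Add observed hits/outs to a Beta(alpha, beta) belief about hit probability.
        return alpha + hits, beta + outs

    # Start with the league-wide average (.242) as a weakly held prior,
    # expressed as roughly 100 pseudo at-bats.
    alpha, beta = 24.2, 75.8
    print(f"prior (league average): {alpha / (alpha + beta):.3f}")

    # Update with Trout's season to date: say 90 hits in 309 at-bats (.291).
    alpha, beta = update(alpha, beta, 90, 309 - 90)
    print(f"after season data:      {alpha / (alpha + beta):.3f}")

    # Update with his history against Verlander: say 5 hits in 43 at-bats (.116).
    alpha, beta = update(alpha, beta, 5, 43 - 5)
    print(f"after matchup data:     {alpha / (alpha + beta):.3f}")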

This sort of analysis could go on forever. Every bit of information we gather about Trout can conceivably help make a better prediction for his chances. Is it raining? What is the score? What did he have for breakfast? Is he in his home ballpark? Did he shave this morning? How has Verlander pitched so far in this game? What is his pitch count?

There are pre-election polling analogies in this baseball example, particularly if you follow the probabilistic election models created by organizations like FiveThirtyEight and The Economist.

Just as we might use Trout’s lifetime average as our “prior” probability, these models will start with macro variables for their election predictions. They will look at the past implications of things like incumbency, approval ratings, past turnout, and economic indicators like inflation, unemployment, etc. In theory, these can adjust our assumptions of who will win the election before we even include polling data.

Of course, using Trout’s lifetime average or these macro variables in polling will only be helpful to the extent that the future behaves like the past. And therein lies the rub – overreliance on past experience makes these models inaccurate during dynamic times.

Part of why pollsters missed badly in 2020 is that unique things were going on – a global pandemic, changed methods of voting, increased turnout, etc. In baseball, perhaps this is a year with a juiced baseball, or Trout is dealing with an injury.

The point is that while unprecedented things are unpredictable, they happen with predictable regularity. There is always something unique about an election cycle or a Mike Trout at bat.

The most common question I am getting from readers of POLL-ARIZED is, “will the pollsters get it right in 2024?” My answer is that since pollsters are applying past assumptions in their models, they will get it right to the extent that the world in 2024 looks like the world did in 2020, and I would not put my own money on it.

I make a point in POLL-ARIZED that pollsters’ models have become too complex. While, in theory, the predictive value of a model never gets worse when you add more variables, in practice this has made these models uninterpretable. Pollsters include so many variables in their likely voter models that many of their adjustments cancel each other out. They are left with a model with no discernible underlying theory.

If you look closely, we started with a probability of 24% for Trout. Even after looking at a lot of other information and making reasonable adjustments, we still ended up with a prediction of 25%. The election models are the same way. They include so many variables that they can cancel out each other’s effects and end up with a prediction that looks much like the raw data did before the methodologists applied their wizardry.

This effort would be better spent on getting better input for the models: investing in the trust needed to increase response rates to our surveys and polls. Improving the quality of our data input will do more for the predictive quality of the polls than coming up with more complicated ways to weight the data.

Of course, in the end, one candidate wins, and the other loses, and Mike Trout either gets a hit, or he doesn’t, so the actual probability moves to 0% or 100%. Trout cannot get 25% of a hit, and a candidate cannot win 79% of an election.

As I write this, I looked up the last time Trout faced Verlander. It turns out Verlander struck him out!

The Insight that Insights Technology is Missing

The market research insights industry has long been characterized by a resistance to change. This likely results from the academic nature of what we do. We don’t like to adopt new ways of doing things until they have been proven and studied.

I would posit that the insights industry has not seen much change since the transition from telephone to online research occurred in the early 2000s. And even that transition created discord within the industry, with many traditional firms resistant to moving on from telephone studies because online data collection had not been thoroughly studied and vetted.

In the past few years, the insights industry has seen an influx of capital, mostly from private equity and venture capital firms. The conditions for this cash infusion have been ripe: a strong and growing demand for insights, a conservative industry that is slow to adapt, and new technologies arising that automate many parts of a research project have all come together simultaneously.

Investing organizations see this enormous business opportunity. Research revenues are growing, and new technologies are lowering costs and shortening project timeframes. It is a combustible business situation that needs a capital accelerant.

Old school researchers, such as myself, are becoming nervous. We worry that automation will harm our businesses and that the trend toward DIY projects will result in poor-quality studies. Technology is threatening the business models under which we operate.

The trends toward investment in automation in the insights industry are clear. Insights professionals need to embrace this and not fight it.

However, although the movement toward automation will result in faster and cheaper studies, this investment ignores the threats that declining data quality creates. In the long run, this automation will accelerate the decline in data quality rather than improve it.

It is great that we are finding ways to automate time-consuming research tasks, such as questionnaire authoring, sampling, weighting, and reporting. This frees up researchers to concentrate on drawing insights out of the data. But, we can apply all the automation in the world to the process; if we do not do something about data quality, it will not increase the value clients receive.

I argue in POLL-ARIZED that the elephant in the research room is the fact that very few people want to take our surveys anymore. When I began in this industry, I routinely fielded telephone projects with 70-80% response rates. Currently, telephone and online response rates are in the 3-4% range for most projects.

Response rates are not everything. You can make a compelling argument that they do not matter at all. There is no problem as long as the 3-4% response we get is representative. I would rather have a representative 3% answer a study than a biased 50%.

But, the fundamental problem is that this 3-4% is not representative. Only about 10% of the US population is currently willing to take surveys. What is happening is that this same 10% is being surveyed repeatedly. In the most recent project Crux fielded, respondents had taken an average of 8 surveys in the past two weeks. So, we have about 10% of the population taking surveys every other day, and our challenge is to make them represent the rest of the population.

Automate all you want, but the data that are the backbone of the insights we produce quickly and cheaply are of historically low quality.

The new investment flooding into research technology will contribute to this problem. More studies will be done that are poorly designed, with long, tortuous questionnaires. Many more surveys will be conducted, fewer people will be willing to take them, and response rates will continue to fall.

There are plenty of methodologists working on these problems. But, for the most part, they are working on new ways to weight the data we can obtain rather than on ways to compel more response. They are improving data quality, but only slightly, and the insights field continues to ignore the most fundamental problem we have: people do not want to take our surveys.

For the long-term health of our field, that is where the investment should go.

In POLL-ARIZED, I list ten potential solutions to this problem. I am not optimistic that any of them will be able to stem the trend toward poor data quality. But, I am continually frustrated that our industry has not come together to work towards expanding respondent trust and the base of people willing to take part in our projects.

The trend towards research technology and automation is inevitable. It will be profitable. But, unless we address data quality issues, it will ultimately hasten the decline of this field.

POLL-ARIZED available on May 10

I’m excited to announce that my book, POLL-ARIZED, will be available on May 10.
 
After the last two presidential elections, I was fearful my clients would ask a question I didn’t know how to answer: “If pollsters can’t predict something as simple as an election, why should I believe my market research surveys are accurate?”
 
POLL-ARIZED results from a year-long rabbit hole that question led me down! In the process, I learned a lot about why polls matter, how today’s pollsters are struggling, and what the insights industry should do to improve data quality.
 
I am looking for a few more people to read an advance copy of the book and write an Amazon review on May 10. If you are interested, please send me a message at poll-arized@cruxresearch.com.

Questions You Are Not Asking Your Market Research Supplier That You Should Be Asking

It is no secret that providing representative samples for market research projects has become challenging. While clients are always focused on obtaining respondents quickly and efficiently, it is also important that they are concerned with the quality of their data. The reality is that quality is slipping.

While there are many causes of this, one that is not discussed much is that clients rarely ask their suppliers the tough questions they should. Clients are not putting pressure on suppliers to focus on data quality. Since clients ultimately control the purse strings of projects, suppliers will only improve quality if clients demand it.

I can often tell if I have an astute client by their questions when we are designing studies. Newer or inexperienced clients tend to start by talking about the questionnaire topics. Experienced clients tend to start by talking about the sample and its representativeness.

Below is a list of a few questions that I believe clients should be asking their suppliers on every study. The answers to these are not always easy to come by, but as a client, you want to see that your supplier has contemplated these questions and pays close attention to the issues they highlight.

For each, I have also provided a correct or acceptable answer to expect from your supplier.

  • What was the response rate to my study? While it was once commonplace to report response rates, suppliers now try to dodge this issue. Most data quality issues stem from low response rates. Correct answer: For most studies, under 5%. Unless the survey is being fielded among a highly engaged audience, such as your customers, you should be suspicious of any answer over 15%. “I don’t know” is an unacceptable answer. Suppliers may also try to convince you that response rates do not matter, even though nearly every data quality issue we experience stems from inadequate response to our surveys.
  • How many respondents did you remove in fielding for quality issues? This is an emerging issue. The number of bad-quality respondents in studies has grown substantially in just the last few years. Correct answer: at least 10%, but preferably between 25% and 40%. If your supplier says 0%, you should question whether they are properly paying attention to data quality issues. I would guide you to find a different supplier if they cannot describe a process to remove poor-quality respondents. There is no standard way of doing this, but each supplier should have an established process.
  • How were my respondents sourced? This is an essential question seldom asked unless our client is an academic researcher. It is a tricky question to answer. Correct answer: This is so complicated that I have difficulty providing a cogent response to our clients. Here, the hope is that your supplier has at least some clue as to how the panel companies get their respondents and knows who to go to if a detailed explanation is needed. They should be able to connect you with someone who can explain this in detail.
  • What are you doing to protect against bots? Market research samples are subject to the ugly things that happen online – hackers, bots, cheaters, etc. Correct answer: Something proactive. They might respond that they are working with the panel companies to prevent bots or a third-party firm to address this. If they are not doing anything or don’t seem to know that bots are a big issue for surveys, you should be concerned.
  • What is in place to ensure that my respondents are not being used for competitors, or vice-versa? Clients should often care that the people answering their surveys have not done another project in their product category recently. I have had cases where two suppliers working for the same client (one being us) used the same sample source and polluted the sample base for both projects because we did not know the other study was fielding. Correct answer: Something, if this is important to you. If your research covers brand or advertising awareness, you should account for this. If you are commissioning work with several suppliers, this takes considerable coordination.
  • Did you run simulated data through my survey before fielding? This is an essential, behind-the-scenes step that any supplier who knows what they are doing takes (see the sketch after this list). Running thousands of simulated surveys through the questionnaire tests survey logic and ensures that the right people get to the right questions. While it doesn’t prevent all errors, it catches many of them. Correct answer: Yes. If the supplier does not know what simulated data is, it is time to consider a new supplier.
  • How many days will my study be in the field? Many errors in data quality stem from conducting studies too quickly. Correct answer: Varies, but this should be 10-21 days for a typical project. If your study requires difficult-to-find respondents, this could be 3-4 weeks. If the data collection period is shorter than ten days, you WILL have data quality errors, so be sure you understand the tradeoffs for speed. Don’t insist on field speed unless you need to.
  • Can I have a copy of the panel company’s answers to the ESOMAR questions? ESOMAR has put out a list of questions to help buyers of online samples. Every sample supplier worth using will have created a document that answers these questions. Correct answer: Yes. Do not work with a company that has not put together a document answering these questions, as all the good ones have. However, after reading this document, don’t expect to understand how your respondents are being sourced.
  • How do you handle requests down the road when the study is over? It is a longstanding pet peeve of most clients that suppliers charge for basic customer support after the project is over. Make sure you have set expectations properly upfront and put these expectations into the contract. Correct answer: Forever. Many suppliers will provide support for only three or six months post-study and will charge for it. I have never understood this; I am flattered when a client calls to discuss a study that was done years ago, as it means the study is continuing to make an impact. Our company only charges for follow-up requests if they require substantial time.
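
Regarding the simulated-data question above, here is a minimal sketch of what that kind of test looks like; the questionnaire, skip rule, and respondent generator are all hypothetical. The idea is simply to push thousands of random answer patterns through the routing logic and confirm that nobody reaches a question they should have skipped.

    import random

    def simulate_respondent(rng):
        # Generate one random answer pattern for a hypothetical widget survey.
        resp = {"owns_widget": rng.choice(["yes", "no"])}
        # Routing rule: satisfaction is only asked of widget owners.
        if resp["owns_widget"] == "yes":
            resp["satisfaction"] = rng.randint(1, 5)
        return resp

    def routing_error(resp):
        # Return a description of any routing problem, or None if the logic held.
        if resp["owns_widget"] == "no" and "satisfaction" in resp:
            return "non-owner was asked the satisfaction question"
        if resp["owns_widget"] == "yes" and "satisfaction" not in resp:
            return "owner skipped the satisfaction question"
        return None

    rng = random.Random(42)
    errors = [e for e in (routing_error(simulate_respondent(rng)) for _ in range(10_000)) if e]
    print(f"routing errors found: {len(errors)} out of 10,000 simulated interviews")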

There are probably many other questions clients should be asking suppliers. Clients need to get tougher on insisting on data quality. It is slipping, and suppliers are not investing enough to improve response rates and develop trust with respondents. If clients pressure them, the economic incentives will be there to create better techniques to obtain quality research data.

Which quality control questions should you use in your surveys?

While it is no secret that the quality of market research data has declined, how to address poor data quality is rarely discussed among clients and suppliers. When I started in market research more than 30 years ago, telephone response rates were about 60%. Six in 10 people contacted for a market research study would choose to cooperate and take our polls. Currently, telephone response rates are under 5%. If we are lucky, 1 in 20 people will take part. Online research is no better, as even from verified customer lists response rates are commonly under 10% and even the best research panels can have response rates under 5%.

Even worse, once someone does respond, a researcher has to guard against “bogus” interviews that come from scripts and bots, as well as individuals who are cheating on the survey to claim the incentives offered. Poor-quality data is clearly on the rise and is an existential threat to the market research industry that is not being taken seriously enough.

Maximizing response requires a broad approach with tactics deployed throughout the process. One important step is to cleanse each project of bad quality respondents. Another hidden secret in market research is that researchers routinely have to remove anywhere from 10% to 50% of respondents from their database due to poor quality.

Unfortunately, there is no industry-standard way of identifying poor-quality respondents; every supplier sets its own policies. This is likely because there is considerable variability in how respondents are sourced for studies, a one-size-fits-all approach might not be possible, and some quality checks depend on the specific topic of the study. As a result, researchers are largely left to fend for themselves when coming up with a process for removing poor-quality respondents from their data.

One of the most important ways to guard against poor quality respondents is to design a compelling questionnaire to begin with. Respondents will attend to a short, relevant survey. Unfortunately, we rarely provide them with this experience.

We have been researching this issue recently in an effort to come up with a workable process for our projects. Below, we share our thoughts. The market research industry needs to work together on this issue, as when one of us removes a bad respondent from a database, it helps the next firm with its future studies.

There is a practical concern for most studies – we rarely have room for more than a handful of questions that relate to quality control. In addition to speeder and straight-line checks, studies tend to have room for about 4-5 quality control questions. We use a “three strikes and you’re out” rule: with the exception of “severe speeders,” as described below, respondents are automatically removed only if they fail three or more of the checks. If anything, this is probably too conservative, but we’d rather err on the side of retaining some bad-quality respondents than inadvertently removing some good-quality ones.
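
As a minimal sketch, assuming each check described below has been scored as a true/false flag per respondent (the flag names here are placeholders), the removal rule looks like this:

    def should_remove(flags, severe_speeder):
        # Severe speeders are removed outright; otherwise three or more strikes means removal.
        if severe_speeder:
            return True
        return sum(flags.values()) >= 3

    # Hypothetical respondent with two strikes - retained.
    flags = {"speeder": True, "straight_liner": False, "inconsistent_age": True,
             "inconsistent_attitude": False, "low_incidence": False, "weak_open_end": False}
    print(should_remove(flags, severe_speeder=False))   # False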

When possible, we favor checks that can be done programmatically, without human intervention, as that keeps fielding and quota management more efficient. To the degree possible, all quality check questions should have a base of “all respondents” and not be asked of subgroups.

Speeder Checks

We aim to set up two criteria: “severe” speeders are those that complete the survey in less than one-third of the median time. These respondents are automatically tossed. “Speeders” are those that take between one-third and one-half of the median time, and these respondents are flagged.

We also consider setting up timers within the survey – for example, we may place timers on a particularly long grid question or a question that requires substantial reading on the part of the respondent. Note that when establishing speeder checks it is important to use the median length as a benchmark and not the mean. In online surveys, some respondents will start a survey and then get distracted for a few hours and come back to it, and this really skews the average survey length. Using the median gets around that.
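
A minimal sketch of that classification, benchmarked against the median as noted above; the interview durations are made up for illustration:

    import statistics

    def speeder_status(duration_sec, median_sec):
        # Classify a completed interview against the median completion time.
        if duration_sec < median_sec / 3:
            return "severe speeder"   # removed automatically
        if duration_sec < median_sec / 2:
            return "speeder"          # flagged - counts as one strike
        return "ok"

    durations = [312, 290, 845, 45, 130, 410, 365, 298]   # hypothetical seconds per interview
    median_sec = statistics.median(durations)
    for d in durations:
        print(d, speeder_status(d, median_sec))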

Straight Line Checks

Hopefully, we have designed our study well and do not have long grid-type questions. However, more often than not these questions find their way into questionnaires. For grids with more than about six items, we place a straight-lining check – if a respondent chooses the same response for all items in the grid, they are flagged.
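
Scored in code, the check is simple; the grid answers below are hypothetical scale points:

    def straight_lined(grid_answers):
        # Flag a respondent who gave the identical answer to every item in a grid
        # (only applied to grids with more than about six items).
        return len(grid_answers) > 6 and len(set(grid_answers)) == 1

    print(straight_lined([3, 3, 3, 3, 3, 3, 3, 3]))   # True - flagged
    print(straight_lined([3, 4, 3, 2, 5, 3, 4, 3]))   # False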

Inconsistent Answers

We consider adding two questions that check for inconsistent answers. First, we re-ask a demographic question from the screener near the end of the survey; we typically use “age.” If the respondent doesn’t choose the same age in both places, they are flagged.

In addition, we try to find an attitudinal question that is asked that we can re-ask in the exact opposite way. For instance, if earlier we asked “I like to go to the mall” on a 5-point agreement scale, we will also ask the opposite: “I do not like to go to the mall” on the same scale. Those that answer the same for both are flagged. We try to place these two questions a few minutes apart in the questionnaire.
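
A sketch of both consistency flags as we would score them; the example values are hypothetical:

    def inconsistent_age(age_screener, age_recheck):
        # Flag respondents whose re-asked age does not match the screener.
        return age_screener != age_recheck

    def inconsistent_attitude(rating_statement, rating_reversed):
        # Flag respondents who give the same rating to a statement and its opposite
        # ("I like to go to the mall" vs. "I do not like to go to the mall"),
        # both asked on the same 5-point agreement scale.
        return rating_statement == rating_reversed

    print(inconsistent_age(34, 36))        # True - flagged
    print(inconsistent_attitude(4, 4))     # True - flagged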

Low Incidence Items

This is a low attentiveness flag. It is meant to catch people who say they do really unlikely things and also catch people who say they don’t do likely things because they are not really paying attention to the questions we pose. We design this question specific to each survey and tend to ask what respondents have done over the past weekend. We like to have two high incidence items (such as “watched TV,” or “rode in a car”), 4 to 5 low incidence items (such as “flew in an airplane,” “read an entire book,” “played poker”) and one incredibly low incidence item (such as “visited Argentina”).  Respondents are flagged if they didn’t do at least one of our high incidence items, if they said they did more than two of our low incidence items, or if they say they did our incredibly low incidence item.
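
Here is a rough sketch of how that flag could be scored, using the example items above:

    HIGH_INCIDENCE = {"watched TV", "rode in a car"}
    LOW_INCIDENCE = {"flew in an airplane", "read an entire book", "played poker"}
    VERY_LOW_INCIDENCE = {"visited Argentina"}

    def low_incidence_flag(activities_claimed):
        # Flag respondents whose claimed weekend activities look implausible.
        claimed = set(activities_claimed)
        did_no_high_incidence = not (claimed & HIGH_INCIDENCE)
        too_many_low_incidence = len(claimed & LOW_INCIDENCE) > 2
        did_very_low_incidence = bool(claimed & VERY_LOW_INCIDENCE)
        return did_no_high_incidence or too_many_low_incidence or did_very_low_incidence

    print(low_incidence_flag(["watched TV", "played poker"]))                    # False
    print(low_incidence_flag(["flew in an airplane", "read an entire book",
                              "played poker", "visited Argentina"]))             # True - flagged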

Open-ended check

We try to include this one in all studies, but sometimes have to skip it if the study is fielding on a tight timeframe because it involves a manual process. Here, we are seeing if a respondent provides a meaningful response to an open-ended question. Hopefully, we can use a question that is already in the study for this, but when we cannot we tend to use one like this: “Now I’d like to hear your opinions about some other things. Tell me about a social issue or cause that you really care about.  What is this cause and why do you care about it?” We are manually looking to see if they provide an articulate answer and they are flagged if they do not.

Admission of inattentiveness

We don’t use this one as a standard check, but we are starting to experiment with it. As the last question of the survey, we directly ask respondents how attentive they were while taking it. This will suffer from a large social desirability bias, but we flag those who say they did not pay attention at all.

Traps and misdirects

I don’t really like the idea of “trick questions” – there is research that indicates that these types of questions tend to trap too many “good” respondents. Some researchers feel that these questions lower respondent trust and thus answer quality. That seems to be enough to recommend against this style of question. The most common types I have seen ask a respondent to select the “third choice” below no matter what, or to “pick the color from the list below,” or “select none of the above.” We counsel against using these.

Comprehension

This was recommended by a research colleague and was also mentioned by an expert in a questionnaire design seminar we attended. We don’t use it as a quality check per se, but we like to include it during a soft-launch period. The question looks like this: “Thanks again for taking this survey. Were there any questions on this survey you had difficulty with or trouble answering? If so, it will be helpful to us if you let us know what those problems were in the space below.”

Preamble

I have mixed feelings on this type of quality check, but we use it when we can phrase it positively. A typical wording is like this: “By clicking yes, you agree to continue to our survey and give your best effort to answer 10-15 minutes of questions. If you speed through the survey or otherwise don’t give a good effort, you will not receive credit for taking the survey.”

This is usually one of the first questions in the survey. The argument I see against this is it sets the respondent up to think we’ll be watching them and that could potentially affect their answers. Then again, it might affect them in a good way if it makes them attend more.

I prefer a question that takes a gentler, more positive approach – telling respondents we are conducting this for an important organization, assuring them that their opinions will really matter, promising them confidentiality, and then asking them to agree to give their best effort, as opposed to lightly threatening them as this one does.

Guarding against bad respondents has become an important part of questionnaire design, and it is unfortunate that there is no industry standard on how to go about it. We try to build in some quality checks that will at least spot the most egregious cases of poor quality. This is an evolving issue, and it is likely that what we are doing today will change over time, as the nature of market research changes.

Oops, the polls did it again

Many people had trouble sleeping last night wondering if their candidate was going to be President. I couldn’t sleep because as the night wore on it was becoming clear that this wasn’t going to be a good night for the polls.

Four years ago on the day after the election I wrote about the “epic fail” of the 2016 polls. I couldn’t sleep last night because I realized I was going to have to write another post about another polling failure. While the final vote totals may not be in for some time, it is clear that the 2020 polls are going to be off on the national vote even more than the 2016 polls were.

Yesterday, on election day I received an email from a fellow market researcher and business owner. We are involved in a project together and he was lamenting how poor the data quality has been in his studies recently and was wondering if we were having the same problems.

In 2014 we wrote a blog post that cautioned our clients that we were detecting poor quality interviews that needed to be discarded about 10% of the time. We were having to throw away about 1 in 10 of the interviews we collected.

Six years later, that percentage has moved to between 33% and 45%, and we tend to be conservative in the interviews we toss. It is fair to say that for most market research studies today, between a third and a half of the interviews being collected are, for lack of a better term, junk.

It has gotten so bad that new firms have sprung up to sit between sample providers and online questionnaires in order to protect against junk interviews. They protect against bots, survey farms, duplicate interviews, etc. Just the fact that these firms and terms like “survey farms” exist should give researchers pause regarding data quality.

When I started in market research in the late ’80s/early ’90s, we had a spreadsheet program that was used to help us cost out projects. One parameter in this spreadsheet was “refusal rate” – the percentage of respondents who would outright refuse to take part in a study. While the refusal rate varied by study, the default assumption in this program was 40%, meaning that on average we expected respondents to cooperate 60% of the time.

According to Pew and AAPOR in 2018 the cooperation rate for telephone surveys was 6% and falling rapidly.

Cooperation rates in online surveys are much harder to calculate in a standardized way, but most estimates I have seen and my own experience suggest that typical cooperation rates are about 5%. That means for a 1,000-respondent study, at least 20,000 emails are sent, which is about four times the population of the town I live in.

This is all background to try to explain why the 2020 polls appear to be headed for a historic failure. Election polls are the public face of the market research industry. Relative to most research projects, they are very simple. The problems pollsters have faced in the last few cycles are emblematic of something those working in research know but rarely like to discuss: the quality of data collected for research and polls has been declining, and that should alarm researchers.

I could go on about the causes of this. We’ve tortured our respondents for a long time. Despite claims to the contrary, we haven’t been able to generate anything close to a probability sample in years. Our methodologists have gotten cocky and feel like they can weight any sampling anomalies away. Clients are forcing us to conduct projects on timelines that make it impossible to guard against poor quality data. We focus on sampling error and ignore more consequential errors. The panels we use have become inbred and gather the same respondents across sources. Suppliers are happy to cash the check and move on to the next project.

This is the research conundrum of our times: in a world where we collect more data on people’s behavior and attitudes than ever before, the quality of the insights we glean from these data is in decline.

Post-2016, the polling industry brain trust rationalized and claimed that the polls actually did a good job, convened some conferences to discuss the polls, and made modest methodological changes. Almost all of these changes related to sampling and weighting. But, as it appears that the 2020 polling miss is going to be way beyond what can be explained by sampling (last night I remarked to my wife that “I bet the p-value of this being due to sampling is about 1 in 1,000”), I feel that pollsters have addressed the wrong problem.

None of the changes pollsters made addressed the long-term problems researchers face with data quality. When you have a response rate of 5% and up to half of those are interviews you need to throw away, errors that can arise are orders of magnitude greater than the errors that are generated by sampling and weighting mistakes.

I don’t want to sound like I have the answers. Just a few days ago, I posted that, on balance, there were more reasons to conclude that the polls would do a good job this time than to conclude that they would fail. When I look through my list of potential reasons the polls might fail, nothing leaps out at me as an obvious cause, so perhaps the problem is multi-faceted.

What I do know is the market research industry has not done enough to address data quality issues. And every four years the polls seem to bring that into full view.


