Archive for the 'Response Rates' Category

Which quality control questions should you use in your surveys?

While it is no secret that the quality of market research data has declined, how to address poor data quality is rarely discussed among clients and suppliers. When I started in market research more than 30 years ago, telephone response rates were about 60%. Six in 10 people contacted for a market research study would choose to cooperate and take our polls. Currently, telephone response rates are under 5%. If we are lucky, 1 in 20 people will take part. Online research is no better: response rates are commonly under 10% even from verified customer lists, and even the best research panels can have response rates under 5%.

Even worse, once someone does respond, a researcher has to guard against “bogus” interviews that come from scripts and bots, as well as individuals who are cheating on the survey to claim the incentives offered. Poor-quality data is clearly on the rise and is an existential threat to the market research industry that is not being taken seriously enough.

Maximizing response requires a broad approach with tactics deployed throughout the process. One important step is to cleanse each project of poor-quality respondents. Another hidden secret in market research is that researchers routinely have to remove anywhere from 10% to 50% of respondents from their databases due to poor quality.

Unfortunately, there is no industry-standard way of identifying poor-quality respondents; every supplier sets its own policies. This is likely because there is considerable variability in how respondents are sourced for studies, because a one-size-fits-all approach might not be possible, and because some quality checks depend on the specific topic of the study. As a result, researchers are largely left to fend for themselves when devising a process for removing poor-quality respondents from their data.

One of the most important ways to guard against poor-quality respondents is to design a compelling questionnaire to begin with. Respondents will pay attention to a short, relevant survey. Unfortunately, we rarely provide them with that experience.

We have been researching this issue recently in an effort to come up with a workable process for our projects. Below, we share our thoughts. The market research industry needs to work together on this issue, as when one of us removes a bad respondent from a database, it helps the next firm with its future studies.

There is a practical concern for most studies – we rarely have room for more than a handful of questions that relate to quality control. In addition to speeder and straight-line checks, studies tend to have room for about 4-5 quality control questions. With the exception of “severe speeders” as described below, respondents are removed automatically only if they fail three or more of the checks – a “three strikes and you’re out” rule. If anything, this is probably too conservative, but we’d rather err on the side of keeping some poor-quality respondents in than inadvertently removing some good-quality ones.

When possible, we favor checks that can be done programmatically, without human intervention, as that makes fielding and quota management more efficient. To the degree possible, all quality check questions should have a base of “all respondents” rather than being asked only of subgroups.
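
Since the removal rule itself is easy to automate, here is a minimal sketch of the “three strikes” logic in Python with pandas. It assumes the survey data sits in a DataFrame with one 0/1 column per check plus a boolean severe-speeder column; all of the column names are hypothetical stand-ins for whatever checks a given study actually includes.

```python
import pandas as pd

# Hypothetical flag columns -- one per quality check, 1 = failed that check.
FLAG_COLS = [
    "flag_speeder", "flag_straightline", "flag_age_mismatch",
    "flag_reversed_item", "flag_low_incidence", "flag_open_end",
]

def apply_three_strikes(df: pd.DataFrame, max_flags: int = 2) -> pd.DataFrame:
    """Remove severe speeders outright, then anyone failing three or more checks."""
    df = df.copy()
    df["n_flags"] = df[FLAG_COLS].sum(axis=1)
    keep = ~df["severe_speeder"].astype(bool) & (df["n_flags"] <= max_flags)
    return df.loc[keep]
```

In practice the threshold and the list of checks would be adjusted for each study; the point is simply that the rule can run without a human in the loop.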

Speeder Checks

We set up two criteria. “Severe speeders” are those who complete the survey in less than one-third of the median time; these respondents are automatically tossed. “Speeders” are those who take between one-third and one-half of the median time; these respondents are flagged.

We also consider setting up timers within the survey – for example, we may place timers on a particularly long grid question or a question that requires substantial reading on the part of the respondent. Note that when establishing speeder checks it is important to use the median length as a benchmark and not the mean. In online surveys, some respondents will start a survey and then get distracted for a few hours and come back to it, and this really skews the average survey length. Using the median gets around that.
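
As a rough illustration, the speeder rules above could be scored like this. It is a sketch that assumes completion times are stored as a pandas Series of seconds; the one-third and one-half cut-offs come straight from the rule described above.

```python
import pandas as pd

def speeder_flags(durations: pd.Series) -> pd.DataFrame:
    """Classify respondents by completion time relative to the median.

    Severe speeders (under 1/3 of the median) are removed automatically;
    speeders (between 1/3 and 1/2 of the median) are only flagged.
    """
    median = durations.median()  # median, not mean, so long idle sessions don't skew the benchmark
    severe = durations < median / 3
    flagged = (durations >= median / 3) & (durations < median / 2)
    return pd.DataFrame({
        "severe_speeder": severe,
        "flag_speeder": flagged.astype(int),
    })
```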

Straight Line Checks

Hopefully, we have designed our study well and do not have long grid-type questions. However, more often than not these questions find their way into questionnaires. For grids with more than about six items, we place a straight-lining check: if a respondent chooses the same response for every item in the grid, they are flagged.
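
A straight-lining check like this is simple to automate. Below is a sketch that assumes the grid’s items live in a set of DataFrame columns passed in as grid_cols; it only makes sense for grids with more than about six items.

```python
import pandas as pd

def straightline_flag(df: pd.DataFrame, grid_cols: list[str]) -> pd.Series:
    """Flag respondents who give the identical answer to every item in a grid."""
    grid = df[grid_cols]
    answered_all = grid.notna().all(axis=1)   # only judge fully answered grids
    same_answer = grid.nunique(axis=1) == 1   # one distinct value across all items
    return (answered_all & same_answer).astype(int)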

Inconsistent Answers

We consider adding two questions that check for inconsistent answers. First, we re-ask a demographic question from the screener near the end of the survey. We typically use “age” as this question. If the respondent doesn’t choose the same age in both questions, they are flagged.

In addition, we try to find an attitudinal question that is asked that we can re-ask in the exact opposite way. For instance, if earlier we asked “I like to go to the mall” on a 5-point agreement scale, we will also ask the opposite: “I do not like to go to the mall” on the same scale. Those that answer the same for both are flagged. We try to place these two questions a few minutes apart in the questionnaire.
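
These two consistency checks could be scored as follows. This is a sketch; the column names age_screener, age_recheck, likes_mall, and dislikes_mall are hypothetical stand-ins for whichever re-asked and reversed items a particular study uses.

```python
import pandas as pd

def inconsistency_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Two consistency checks, one 0/1 flag column each.

    flag_age_mismatch: the screener age (age_screener) does not match the
    same question re-asked near the end of the survey (age_recheck).
    flag_reversed_item: identical answers to an attitudinal item (likes_mall)
    and its reversed twin (dislikes_mall) on the same 5-point scale.
    """
    return pd.DataFrame({
        "flag_age_mismatch": (df["age_screener"] != df["age_recheck"]).astype(int),
        "flag_reversed_item": (df["likes_mall"] == df["dislikes_mall"]).astype(int),
    })
```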

Low Incidence Items

This is a low-attentiveness flag. It is meant to catch people who claim to do really unlikely things, as well as people who claim not to do likely things, because they are not really paying attention to the questions we pose. We design this question specific to each survey and tend to ask what respondents have done over the past weekend. We like to have two high incidence items (such as “watched TV” or “rode in a car”), 4 to 5 low incidence items (such as “flew in an airplane,” “read an entire book,” “played poker”) and one incredibly low incidence item (such as “visited Argentina”). Respondents are flagged if they say they did none of our high incidence items, if they say they did more than two of our low incidence items, or if they say they did our incredibly low incidence item.
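
Scoring this question is just a matter of counting checked items against the three rules above. Here is a sketch using 0/1 columns named after the example activities; the item lists are hypothetical and would be rebuilt for each survey.

```python
import pandas as pd

# Hypothetical 0/1 columns for the "what did you do last weekend?" checklist.
HIGH_INCIDENCE = ["did_watch_tv", "did_ride_in_car"]
LOW_INCIDENCE = ["did_fly", "did_read_book", "did_play_poker"]
VERY_LOW_INCIDENCE = "did_visit_argentina"

def low_incidence_flag(df: pd.DataFrame) -> pd.Series:
    """Flag respondents whose claimed weekend activities look implausible."""
    no_high = df[HIGH_INCIDENCE].sum(axis=1) == 0     # none of the common activities
    too_many_low = df[LOW_INCIDENCE].sum(axis=1) > 2  # more than two of the rare ones
    did_very_low = df[VERY_LOW_INCIDENCE] == 1        # the near-impossible one
    return (no_high | too_many_low | did_very_low).astype(int)
```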

Open-ended check

We try to include this one in all studies, but sometimes have to skip it if the study is fielding on a tight timeframe, because it involves a manual process. Here, we are seeing if a respondent provides a meaningful response to an open-ended question. Hopefully, we can use a question that is already in the study for this, but when we cannot we tend to use one like this: “Now I’d like to hear your opinions about some other things. Tell me about a social issue or cause that you really care about. What is this cause and why do you care about it?” We manually review the responses to see whether they are articulate, and respondents are flagged if they are not.
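
The judgment call here is manual, but a crude automated pre-screen can shrink the pile a reviewer has to read. The sketch below is our assumption of what such a pre-screen could look like, not a substitute for the manual read: it simply marks blank, very short, or keyboard-mash answers for review.

```python
import pandas as pd

def open_end_needs_review(responses: pd.Series, min_words: int = 4) -> pd.Series:
    """Crude first pass over open-ended answers before a human review.

    Marks blank answers, answers under a minimum word count, and answers that
    are a single repeated character (e.g. "aaaaaa"). Anything marked still goes
    to a person; this does not judge whether an answer is articulate.
    """
    text = responses.fillna("").str.strip()
    too_short = text.str.split().str.len() < min_words
    repeated = text.str.match(r"^(.)\1+$", na=False)
    return (too_short | repeated).astype(int)
```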

Admission of inattentiveness

We don’t use this one as standard, but we are starting to experiment with it. As the last question of the survey, we directly ask respondents how attentive they were while taking it. The question suffers from a large social desirability bias, but we flag those who say they did not pay attention at all.

Traps and misdirects

I don’t really like the idea of “trick questions” – there is research that indicates that these types of questions tend to trap too many “good” respondents. Some researchers feel that these questions lower respondent trust and thus answer quality. That seems to be enough to recommend against this style of question. The most common types I have seen ask a respondent to select the “third choice” below no matter what, or to “pick the color from the list below,” or “select none of the above.” We counsel against using these.

Comprehension

This was recommended by a research colleague and was also mentioned by an expert in a questionnaire design seminar we attended. We don’t use it as a quality check per se, but we like to include it during a soft-launch period. The question looks like this: “Thanks again for taking this survey. Were there any questions on this survey you had difficulty with or trouble answering? If so, it will be helpful to us if you let us know what those problems were in the space below.”

Preamble

I have mixed feelings on this type of quality check, but we use it when we can phrase it positively. A typical wording is like this: “By clicking yes, you agree to continue to our survey and give your best effort to answer 10-15 minutes of questions. If you speed through the survey or otherwise don’t give a good effort, you will not receive credit for taking the survey.”

This is usually one of the first questions in the survey. The argument I see against it is that it sets respondents up to think we will be watching them, which could affect their answers. Then again, it might affect them in a good way if it makes them pay closer attention.

I prefer a question that takes a gentler, more positive approach – telling respondents we are conducting the study for an important organization, that their opinions will really matter, promising them confidentiality, and then asking them to agree to give their best effort, rather than lightly threatening them as the wording above does.

Guarding against bad respondents has become an important part of questionnaire design, and it is unfortunate that there is no industry standard on how to go about it. We try to build in some quality checks that will at least spot the most egregious cases of poor quality. This is an evolving issue, and it is likely that what we are doing today will change over time, as the nature of market research changes.

Oops, the polls did it again

Many people had trouble sleeping last night wondering if their candidate was going to be President. I couldn’t sleep because as the night wore on it was becoming clear that this wasn’t going to be a good night for the polls.

Four years ago on the day after the election I wrote about the “epic fail” of the 2016 polls. I couldn’t sleep last night because I realized I was going to have to write another post about another polling failure. While the final vote totals may not be in for some time, it is clear that the 2020 polls are going to be off on the national vote even more than the 2016 polls were.

Yesterday, on election day, I received an email from a fellow market researcher and business owner. We are involved in a project together, and he was lamenting how poor the data quality has been in his studies recently and wondering if we were having the same problems.

In 2014, we wrote a blog post cautioning our clients that about 10% of the interviews we were collecting were poor quality and needed to be discarded. We were having to throw away about 1 in 10 of the interviews we collected.

Six years later, that percentage has risen to between 33% and 45%, and we tend to be conservative in the interviews we toss. It is fair to say that for most market research studies today, between a third and a half of the interviews being collected are, for lack of a better term, junk.

It has gotten so bad that new firms have sprung up to serve as go-betweens between sample providers and online questionnaires in order to protect against junk interviews. They protect against bots, survey farms, duplicate interviews, etc. Just the fact that these firms and terms like “survey farms” exist should give researchers pause regarding data quality.

When I started in market research in the late ’80s/early ’90s, we had a spreadsheet program that was used to help us cost out projects. One parameter in this spreadsheet was “refusal rate” – the percentage of respondents who would outright refuse to take part in a study. While the refusal rate varied by study, the default assumption in this program was 40%, meaning that on average we expected respondents to cooperate 60% of the time.

According to Pew and AAPOR, the cooperation rate for telephone surveys in 2018 was 6% and falling rapidly.

Cooperation rates in online surveys are much harder to calculate in a standardized way, but most estimates I have seen and my own experience suggest that typical cooperation rates are about 5%. That means for a 1,000-respondent study, at least 20,000 emails are sent, which is about four times the population of the town I live in.

This is all background to try to explain why the 2020 polls appear to be headed to a historic failure. Election polls are the public face of the market research industry. Relative to most research projects, they are very simple. The problems pollsters have faced in the last few cycles are emblematic of something those working in research know but rarely like to discuss: the quality of data collected for research and polls has been declining, and that decline should alarm researchers.

I could go on about the causes of this. We’ve tortured our respondents for a long time. Despite claims to the contrary, we haven’t been able to generate anything close to a probability sample in years. Our methodologists have gotten cocky and feel like they can weight any sampling anomalies away. Clients are forcing us to conduct projects on timelines that make it impossible to guard against poor quality data. We focus on sampling error and ignore more consequential errors. The panels we use have become inbred and gather the same respondents across sources. Suppliers are happy to cash the check and move on to the next project.

This is the research conundrum of our times: in a world where we collect more data on people’s behavior and attitudes than ever before, the quality of the insights we glean from these data is in decline.

Post-2016, the polling industry brain trust rationalized, claiming that the polls actually did a good job, convened some conferences to discuss the polls, and made modest methodological changes. Almost all of these changes related to sampling and weighting. But, as it appears that the 2020 polling miss is going to be way beyond what can be explained by sampling (last night I remarked to my wife that “I bet the p-value of this being due to sampling is about 1 in 1,000”), I feel that pollsters have addressed the wrong problem.

None of the changes pollsters made addressed the long-term problems researchers face with data quality. When you have a response rate of 5%, and up to half of the interviews you do get have to be thrown away, the errors that can arise are orders of magnitude greater than the errors generated by sampling and weighting mistakes.

I don’t want to sound like I have the answers. Just a few days ago I posted that, on balance, I thought there were more reasons to conclude that the polls would do a good job this time than to conclude that they would fail. When I look through my list of potential reasons the polls might fail, nothing leaps out at me as an obvious cause, so perhaps the problem is multi-faceted.

What I do know is the market research industry has not done enough to address data quality issues. And every four years the polls seem to bring that into full view.

The myth of the random sample

Sampling is at the heart of market research. We ask a few people questions and then assume everyone else would have answered the same way.

Sampling works in all types of contexts. Your doctor doesn’t need to test all of your blood to determine your cholesterol level – a few ounces will do. Chefs taste a spoonful of their creations and then assume the rest of the pot will taste the same. And, we can predict an election by interviewing a fairly small number of people.

The mathematical procedures applied to samples that enable us to project to a broader population all assume that we have a random sample. Or, as I tell research analysts: everything they taught you in statistics assumes you have a random sample. T-tests, hypothesis tests, regressions, etc. all have a random sample as a requirement.

Here is the problem: We almost never have a random sample in market research studies. I say “almost” because I suppose it is possible to do, but over 30 years and 3,500 projects I don’t think I have been involved in even one project that can honestly claim a random sample. A random sample is sort of a Holy Grail of market research.

A random sample might be possible if you have a captive audience. You can randomly sample some of the passengers on a flight, a few students in a classroom, or prisoners in a detention facility. As long as you are not trying to project beyond that flight or that classroom or that jail, the math behind random sampling will apply.

Here is the bigger problem: Most researchers don’t recognize this, disclose this, or think through how to deal with it. Even worse, many purport that their samples are indeed random, when they are not.

For a bit of research history: once the market research industry really got going, the telephone random digit dial (RDD) sample became standard. Telephone researchers could randomly call land line phones. When land line telephone penetration and response rates were both high, this provided excellent data. However, RDD still wasn’t providing a true random, or probability, sample. Some households had more than one phone line (and few researchers corrected for this), many people lived in group situations (colleges, medical facilities) where they couldn’t be reached, some did not have a land line, and even at its peak, telephone response rates were only about 70%. Not bad. But, also, not random.

Once the Internet came of age, researchers were presented with new sampling opportunities and challenges. Telephone response rates plummeted (to 5-10%) making telephone research prohibitively expensive and of poor quality. Online, there was no national directory of email addresses or cell phone numbers and there were legal prohibitions against spamming, so researchers had to find new ways to contact people for surveys.

Initially, and this is still a dominant method today, research firms created opt-in panels of respondents. Potential research participants were asked to join a panel, filled out an extensive demographic survey, and were paid small incentives to take part in projects. These panels suffer from three response issues: 1) not everyone is online or online at the same frequency, 2) not everyone who is online wants to be in a panel, and 3) not everyone in the panel will take part in a study. The result is a convenience sample. Good researchers figured out sophisticated ways to handle the sampling challenges that result from panel-based samples, and they work well for most studies. But, in no way are they a random sample.

River sampling is a term often used to describe respondents who are “intercepted” on the Internet and asked to fill out a survey. Potential respondents are invited via online ads and offers placed on a range of websites. If interested, they are typically pre-screened and sent along to the online questionnaire.

Because so much is known about what people are doing online these days, sampling firms have some excellent science behind how they obtain respondents efficiently with river sampling. It can work well, but response rates are low and the nature of the online world is changing fast, so it is hard to get a consistent river sample over time. Nobody being honest would ever use the term “random sampling” when describing river samples.

Panel-based samples and river samples represent how the lion’s share of primary market research is being conducted today. They are fast and inexpensive and when conducted intelligently can approximate the findings of a random sample. They are far from perfect, but I like that the companies providing them don’t promote them as being random samples. They involve some biases and we deal with these biases as best we can methodologically. But, too often we forget that they violate a key assumption that the statistical tests we run require: that the sample is random. For most studies, they are truly “close enough,” but the problem is we usually fail to state the obvious – that we are using statistical tests that are technically not appropriate for the data sets we have gathered.

Which brings us to a newer, shiny object in the research sampling world: ABS samples. ABS (address-based) samples are purer from a methodological standpoint. While ABS samples have been around for quite some time, they are just now being used extensively in market research.

ABS samples are based on US Postal Service lists. Because USPS has a list of all US households, this list is an excellent sampling frame. (The Census Bureau also has an excellent list, but it is not available for researchers to use.) The USPS list is the starting point for ABS samples.

Research firms will take the USPS list and recruit respondents from it, either to join a panel or to take part in an individual study. This recruitment can be done by mail, phone, or even online. They often append publicly available information onto the list.

As you might expect, an ABS approach suffers from some of the same issues as other approaches. Cooperation rates are low and incentives (sometimes large) are necessary. Most surveys are conducted online, and not everyone in the USPS list is online or has the same level of online access. There are some groups (undocumented immigrants, homeless) that may not be in the USPS list at all. Some (RVers, college students, frequent travelers) are hard to reach. There is evidence that ABS approaches do not cover rural areas as well as urban areas. Some households use post office boxes and not residential addresses for their mail. Some use more than one address. So, although ABS lists cover about 97% of US households, the 3% that they do not cover are not randomly distributed.

The good news is, if done correctly, the biases that result from an ABS sample are more “correctable” than those from other types of samples because they are measurable.

A recent Pew study indicates that survey bias and the number of bogus respondents is a bit smaller for ABS samples than opt-in panel samples.

But ABS samples are not random samples either. I have seen articles that suggest that of all those approached to take part in a study based on an ABS sample, less than 10% end up in the survey data set.

The problem is not necessarily with ABS samples, as most researchers would concur that they are the best option we have and come the closest to a random sample. The problem is that many firms that provide ABS samples are selling them as “random samples,” and that is disingenuous at best. Just because the sampling frame used to recruit a survey panel can claim to be “random” does not imply that the respondents you end up with in a research database constitute a random sample.

Does this matter? In many ways, it likely does not. There are biases and errors in all market research surveys. These biases and errors vary not just by how the study was sampled, but also by the topic of the question, its tone, the length of the survey, etc. Many times, survey errors are not the same throughout an individual survey. Biases in surveys tend to be “known unknowns” – we know they are there, but aren’t sure what they are.

There are many potential sources of error in survey research. I am always reminded of a quote from Humphrey Taylor, the past Chairman of the Harris Poll, who said: “On almost every occasion when we release a new survey, someone in the media will ask, ‘What is the margin of error for this survey?’ There is only one honest and accurate answer to this question — which I sometimes use to the great confusion of my audience — and that is, ‘The possible margin of error is infinite.’” A few years ago, I wrote a post on biases and errors in research, and I was able to quickly name 15 of them before I even had to do an Internet search to learn more about them.

The reality is, the improvement in bias achieved by an ABS sample over a panel-based sample is small and likely inconsequential when considered next to the other sources of error that can creep into a research project. Because of this, and the fact that ABS sampling is really expensive, we tend to recommend ABS panels in only two cases: 1) if the study will result in academic publication, as academics are more accepting of data that comes from an ABS approach, and 2) if we are working in a small geography, where panel-based samples are not feasible.

Again, ABS samples are likely the best samples we have at this moment. But firms that provide them often inappropriately portray them as yielding random samples. For most projects, the small improvement in bias they provide is not worth the considerably larger budget and longer study timeframe, which is why ABS samples are currently used in only a small proportion of research studies. I consider ABS to be “state of the art,” with the emphasis on “art,” as sampling is often less of a science than people think.

