Archive for August, 2013

The Top 5 Errors and Biases in Survey Research

Halloween Pollster

At its core, market research is simple. We pose questions to a sample of respondents. We take the results and infer what a broader population likely thinks from this sample. So simple, yet why is it that it goes wrong so often?

Because there are many potential sources of errors and biases in surveys, some of which are measureable and many others of which creep into our projects without anyone noticing.

Years ago, Humphrey Taylor (Chairman of the Harris Poll) offered a particularly shocking quote to our industry:

On almost every occasion when we release a new survey, someone in the media will ask, “What is the margin of error for this survey?” There is only one honest and accurate answer to this question — which I sometimes use to the great confusion of my audience — and that is, “The possible margin of error is infinite.”

Infinite errors?

When organizing this post, I jotted down every type of error and bias in surveys that I could remember. In 10 minutes, I could name 20 potential sources of error. After toying around with an Internet search, this list grew to 40. Any one of these errors could have “infinite” consequences to the accuracy of a poll or research project. Or, they might not matter at all.

I thought I would organize errors and biases into a “top 5.” These are based on about 25 years’ experience in the research and polling industry and seem to be the types of errors and biases we see the most often and are most consequential.

The Top 5

1.  Researcher Bias.

The most important error that creeps into surveys about isn’t statistical at all and is not measurable. The viewpoint of the researcher has a way of creeping into question design and analysis. Some times this is purposeful, and other times it is more subtle. All research designers are human, and have points-of-view. Even the most practiced and professional researchers can have subtle biases in the way they word questions or interpret results. How we frame questions and report results is always affected by our experiences and viewpoints – which can be a good thing, but can also affect the purity of the study.

2. Poor match of the sample to the population.

This is the source of some of the most famous errors in polling. Our industry once predicted the elections of future Presidents Alf Landon and Thomas Dewey based on this mistake. It is almost never the case that the sampling frame you use is a perfect match to the population you are trying to understand, so this error is present on most studies. You can sometimes recover from asking the wrong questions, but you can never recover from asking them of the wrong people

Most clients (and suppliers) like to focus on questionnaire development when a new project is awarded. The reality is the sampling and weighting plan is every bit as consequential to the success of the project, and rarely gets the attention it deserves. We can tell when we have a client that really knows what they are doing if they begin the project by focusing on sampling issues and not jumping to questionnaire design.

3. Lack of randomness/response bias.

Many surveys proceed without random samples. In fact, it is rare that a survey being done today can accurately claim to be using a random sample. Remember those statistics courses you took in college and graduate school? The one thing they have in common is pretty much everything they taught you statistically is only relevant if you have a random sample. And, odds are great that you don’t.

A big source of “non-randomness” in a sample is response bias. A typical RDD phone survey being conducted today has a cooperation rate of less than 20%. 10% is considered a good response rate from an online panel. When we report results of these studies, we are assuming that the vast majority of people who didn’t respond would have responded in the same way as those who did. Often, this is a reasonable assumption. But, sometimes it is not. Response bias is routinely ignored in market research and polls because it is expensive to correct (the fix involves surveying the non-responders).

4.  Failure to quota sample or weight data.

This is a bit technical. Even if we sample randomly, it is typical for some subgroups to be more willing to cooperate than others. For example, females are typically less likely to refuse a survey invitation than males, and minorities are less likely to participate than whites. So, a good researcher will quota sample and weight data to compensate for this. In short, if you know something about your population before you survey them, you should use this knowledge to your advantage. If you are conducting an online poll and you are not doing something to quota sample or weight the data, odds are very good that you are making an important mistake.

5.  Overdoing it.

I have worked with methodologists who have more degrees than a thermometer, think about the world in Greek letters, and understand every type of bias we can comprehend. I have also seen them concentrate so much on correcting for every type of error they can imagine that they “overcook” the data. I remember once passing off a data set to a statistician, who corrected for 10 types of errors, and the resulting data set didn’t even have the gender distribution it the proper proportion.

Remember — you don’t have to correct for an error or bias unless it has an effect on what you are asking.  For example, if men and women answer a question identically, weighting by gender will have no effect on the study results. Instead, you should know enough about the issues you are studying to know what types of errors are likely to be relevant to your study.

So that is our top 5. Note that I did not put sampling error in the top 5. I am not sure it would make my top 20. Sampling error is the “+/- 5%” that you see attached to many polls. We will do a subsequent blog post on why this isn’t a particularly relevant error for most studies. It just happens to be the one type of error that can be easily calculated mathematically, which is why we see it cited so often. I am more concerned about the errors that are harder to calculate, or, more importantly, the ones that go unnoticed.

With 40+ sources of errors, one could wonder how our industry ever gets it right. Yet we do. More than $10 Billion is spent on research and polling in the US each year, and if this money was not being spent effectively, the industry would implode. So, how do we get it right?

In one sense, many of the errors in surveys tend to be randomly distributed. For instance, there can be a fatigue bias in a question involving a long list of items to be assessed. By presenting long lists in a randomized order we can “randomize” this error – we don’t remove it.

In some sense, errors and biases also seem to have a tendency to cancel each other out, rather than magnify each other. And, as stated above, not all errors matter to every project. The key is to consider which ones might before the study is fielded.