
The Top 5 Errors and Biases in Survey Research


At its core, market research is simple. We pose questions to a sample of respondents. We take the results and infer from this sample what a broader population likely thinks. So simple. So why does it go wrong so often?

Because surveys have many potential sources of error and bias. Some of these are measurable; many others creep into our projects without anyone noticing.

Years ago, Humphrey Taylor (Chairman of the Harris Poll) offered a particularly shocking quote to our industry:

On almost every occasion when we release a new survey, someone in the media will ask, “What is the margin of error for this survey?” There is only one honest and accurate answer to this question — which I sometimes use to the great confusion of my audience — and that is, “The possible margin of error is infinite.”

Infinite errors?

When organizing this post, I jotted down every type of error and bias in surveys that I could remember. In 10 minutes, I could name 20 potential sources of error. After a bit of Internet searching, the list grew to 40. Any one of these errors could have “infinite” consequences for the accuracy of a poll or research project. Or it might not matter at all.

I thought I would organize these errors and biases into a “top 5.” The list is based on about 25 years’ experience in the research and polling industry; these seem to be the errors and biases we see most often and that are most consequential.

The Top 5

1. Researcher bias.

The most important error that creeps into surveys isn’t statistical at all, and it is not measurable: the viewpoint of the researcher has a way of creeping into question design and analysis. Sometimes this is purposeful; other times it is more subtle. All research designers are human and have points of view. Even the most practiced and professional researchers can have subtle biases in the way they word questions or interpret results. How we frame questions and report results is always affected by our experiences and viewpoints, which can be a good thing but can also affect the purity of the study.

2. Poor match of the sample to the population.

This is the source of some of the most famous errors in polling. Our industry once predicted the elections of “future Presidents” Alf Landon and Thomas Dewey based on this mistake. It is almost never the case that the sampling frame you use is a perfect match to the population you are trying to understand, so this error is present in most studies. You can sometimes recover from asking the wrong questions, but you can never recover from asking them of the wrong people.

Most clients (and suppliers) like to focus on questionnaire development when a new project is awarded. The reality is that the sampling and weighting plan is every bit as consequential to the success of the project, yet it rarely gets the attention it deserves. We can tell that a client really knows what they are doing when they begin the project by focusing on sampling issues rather than jumping to questionnaire design.

3. Lack of randomness/response bias.

Many surveys proceed without random samples. In fact, it is rare that a survey being done today can accurately claim to be using a random sample. Remember those statistics courses you took in college and graduate school? The one thing they have in common is that nearly everything they taught you is relevant only if you have a random sample. And odds are good that you don’t.

A big source of “non-randomness” in a sample is response bias. A typical random-digit-dial (RDD) phone survey conducted today has a cooperation rate of less than 20%, and for an online panel, 10% is considered a good response rate. When we report results of these studies, we are assuming that the vast majority of people who didn’t respond would have answered the same way as those who did. Often, this is a reasonable assumption. But sometimes it is not. Response bias is routinely ignored in market research and polls because it is expensive to correct (the fix involves surveying the non-responders).
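The assumption in that paragraph can be written down exactly. If rr is the response rate, the true population mean is a mix of what responders and non-responders would say, so the bias of a respondents-only estimate is (1 - rr) × (respondent mean - non-respondent mean). The sketch below uses a hypothetical function name and made-up numbers; it only illustrates why a low response rate is harmless when, and only when, that assumption holds.

```python
def nonresponse_bias(response_rate, respondent_mean, nonrespondent_mean):
    """Bias of a respondents-only estimate relative to the full-population value.

    Population mean = rr * respondent_mean + (1 - rr) * nonrespondent_mean, so
    respondent_mean - population mean = (1 - rr) * (respondent_mean - nonrespondent_mean).
    """
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must be in (0, 1]")
    return (1.0 - response_rate) * (respondent_mean - nonrespondent_mean)


# Hypothetical figures: a 10% response rate; 60% of respondents agree with a
# statement, but only 50% of the non-responders would have.
print(f"{nonresponse_bias(0.10, 0.60, 0.50):+.3f}")  # +0.090, i.e. 9 points too high

# If non-responders would have answered exactly like responders, there is no
# bias, no matter how low the response rate is.
print(nonresponse_bias(0.10, 0.60, 0.60))  # 0.0
```

With a 10% response rate, 90% of any difference between responders and non-responders flows straight into the estimate, which is why the "fix" has to involve learning something about the non-responders.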

4. Failure to quota sample or weight data.

This is a bit technical. Even if we sample randomly, it is typical for some subgroups to be more willing to cooperate than others. For example, females are typically less likely to refuse a survey invitation than males, and minorities are less likely to participate than whites. So, a good researcher will quota sample and weight data to compensate for this. In short, if you know something about your population before you survey them, you should use this knowledge to your advantage. If you are conducting an online poll and you are not doing something to quota sample or weight the data, odds are very good that you are making an important mistake.
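Post-stratification (cell) weighting is one simple way to put what you know about the population to work. Each respondent in a group gets the weight population share ÷ sample share, so over-represented groups count for less and under-represented groups count for more. This is a minimal sketch assuming a single weighting variable and known population targets; the function name and the 50/50 gender target are hypothetical.

```python
from collections import Counter


def cell_weights(sample_groups, population_shares):
    """Post-stratification weights: population share / sample share for each cell.

    sample_groups: one group label per respondent, e.g. ["F", "M", "F", ...].
    population_shares: known population proportions per group (summing to 1).
    Returns one weight per respondent, in the same order as sample_groups.
    """
    n = len(sample_groups)
    counts = Counter(sample_groups)
    missing = set(population_shares) - set(counts)
    unknown = set(counts) - set(population_shares)
    if missing:
        # A cell with no respondents cannot be weighted up from nothing.
        raise ValueError(f"no respondents in cells: {sorted(missing)}")
    if unknown:
        raise ValueError(f"no population target for cells: {sorted(unknown)}")
    factor = {g: population_shares[g] / (counts[g] / n) for g in population_shares}
    return [factor[g] for g in sample_groups]


# Hypothetical sample: 60 women and 40 men, for a population assumed to be 50/50.
sample = ["F"] * 60 + ["M"] * 40
weights = cell_weights(sample, {"F": 0.5, "M": 0.5})
print(round(weights[0], 3), round(weights[-1], 3))  # 0.833 1.25
```

The weights still sum to the number of interviews (100 here), so weighted counts remain on the same scale as the raw sample.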

5. Overdoing it.

I have worked with methodologists who have more degrees than a thermometer, think about the world in Greek letters, and understand every type of bias we can comprehend. I have also seen them concentrate so much on correcting for every type of error they can imagine that they “overcook” the data. I remember once passing off a data set to a statistician who corrected for 10 types of errors, and the resulting data set didn’t even have the proper gender distribution.

Remember: you don’t have to correct for an error or bias unless it has an effect on what you are asking. For example, if men and women answer a question identically, weighting by gender will have no effect on the study results. Rather than correcting for everything, you should know enough about the issues you are studying to know which types of errors are likely to be relevant to your study.
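A small hypothetical example makes the gender point concrete. The sample below is 60% women and 40% men, and the weights rebalance it to an assumed 50/50 split (population share divided by sample share). When both groups answer the same way, the weighted and unweighted results are identical; only when the groups differ does weighting move the number. All data here are made up for illustration.

```python
def weighted_share(answers, weights):
    """Weighted proportion of respondents whose answer is True ("yes")."""
    return sum(w for a, w in zip(answers, weights) if a) / sum(weights)


genders = ["F"] * 60 + ["M"] * 40
# Weights that rebalance the 60/40 sample to an assumed 50/50 gender split.
weights = [0.5 / 0.6 if g == "F" else 0.5 / 0.4 for g in genders]
unweighted = [1.0] * len(genders)

# Case 1: exactly half of each group says "yes", so weighting changes nothing.
same = [i % 2 == 0 for i in range(60)] + [i % 2 == 0 for i in range(40)]
print(round(weighted_share(same, unweighted), 6))  # 0.5
print(round(weighted_share(same, weights), 6))     # 0.5

# Case 2: 70% of women but only 30% of men say "yes", so weighting matters.
differ = [i < 42 for i in range(60)] + [i < 12 for i in range(40)]
print(round(weighted_share(differ, unweighted), 6))  # 0.54
print(round(weighted_share(differ, weights), 6))     # 0.5
```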

So that is our top 5. Note that I did not put sampling error in the top 5. I am not sure it would make my top 20. Sampling error is the “+/- 5%” that you see attached to many polls. We will do a subsequent blog post on why this isn’t a particularly relevant error for most studies. It just happens to be the one type of error that can be easily calculated mathematically, which is why we see it cited so often. I am more concerned about the errors that are harder to calculate, or, more importantly, the ones that go unnoticed.
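For reference, the familiar “+/- 5%” comes from a simple formula. Assuming a simple random sample and a proportion near 50% (the worst case), the 95% margin of error is 1.96 × sqrt(p(1 - p)/n), so roughly 400 interviews yields about +/- 5%. The minimal sketch below (the function name is hypothetical) also shows why the figure is so easy to quote: it depends only on p and n and captures none of the errors listed above.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion.

    Valid only for a simple random sample, which, as argued above, is rare.
    z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


for n in (100, 400, 1000):
    print(n, f"+/- {margin_of_error(0.5, n):.1%}")
# 100 +/- 9.8%
# 400 +/- 4.9%
# 1000 +/- 3.1%
```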

With 40+ sources of error, one might wonder how our industry ever gets it right. Yet we do. More than $10 billion is spent on research and polling in the US each year, and if this money were not being spent effectively, the industry would implode. So, how do we get it right?

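In one sense, many of the errors in surveys tend to be randomly distributed, and where they are not, we can often make them so. For instance, there can be a fatigue bias in a question involving a long list of items to be assessed. By presenting long lists in a randomized order, we can “randomize” this error: we don’t remove it, but we spread it across all of the items instead of letting it fall on the ones at the end of the list.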
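A minimal sketch of that rotation, assuming the survey platform lets you control item order for each respondent. The function name, item labels, and per-respondent seeding scheme are all hypothetical. Each respondent sees the list in a different order, so whatever fatigue affects the final positions lands on every item about equally.

```python
import random
from collections import Counter


def randomized_order(items, respondent_id, seed="list-v1"):
    """Return a shuffled copy of items, different for each respondent.

    Seeding with the respondent ID (a hypothetical scheme) keeps each person's
    order reproducible, so it can be stored alongside their answers.
    """
    rng = random.Random(f"{seed}:{respondent_id}")
    order = list(items)
    rng.shuffle(order)
    return order


items = ["Item A", "Item B", "Item C", "Item D", "Item E"]

# Across many respondents, each item should land in the last (most fatigued)
# position about one time in five, so no single item absorbs the fatigue.
last_position = Counter(randomized_order(items, rid)[-1] for rid in range(10_000))
for item in items:
    print(item, last_position[item])
```

The fatigue effect is still in the data; it has simply become noise spread across all items rather than a systematic penalty on a few.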

In some sense, errors and biases also seem to have a tendency to cancel each other out rather than magnify each other. And, as stated above, not all errors matter to every project. The key is to consider which ones might matter before the study is fielded.

How weighting data is like playing with Silly Putty


Remember Silly Putty?  It was a really popular toy back in the day.  It could do all sorts of things.  It bounced.  It could be used to glue two things together.  It was actually used by astronauts to secure their tools while in space.  If you left it alone on a warm day it would melt.  And, it was darn near impossible to get out of your clothing and hair.

The coolest property of Silly Putty was that if you flattened it and pressed it against a newspaper, it would transfer the image to the Silly Putty.  For those of you who remember that, we have an analogy for weighting survey data.  Bear with us; this is a bit of a stretch. 🙂

Imagine your survey data set is a flattened handful of Silly Putty.  Your task is to faithfully represent a one-panel comic from the newspaper with it.  If your survey sample is plentiful and covers the image perfectly, this just requires you to be careful as you press it against the comic.  Voila, you’ve represented your universe perfectly!  (Okay, we know it will be a mirror image, but ignore that!)

However, this isn’t really how it worked with Silly Putty or how it works with survey data.  What tended to happen was that you didn’t have quite enough putty to flatten onto the newspaper, or you didn’t quite cover the entire comic with it.  So, you spread it out as best you could.  Then, once you had lifted the image, you stretched the putty a bit to try to make it look like the original.  The problem was that if you stretched the putty in one direction, it tended to contract in another.

That is analogous to what we are doing when we try to make a non-random sample match a universe.  We may not have enough putty (not enough sample size) or might not be able to get it to perfectly cover the picture (we under-represent some groups).  Through careful weighting (stretching the putty), we can usually get an imperfect but accurate-enough representation of the universe (the image).  If we weight (stretch the putty) too much, we distort the universe (the image).  (That can be really funny with Silly Putty, but it isn’t so funny with research data.)

As “silly” as this sounds, we have found it to be a useful analogy for clients.  Clients often push us to weight data too much.  This is like stretching the Silly Putty so much that you can’t recognize the picture anymore.  Well-thought-out adjustments can make sense if we know what we are shooting for.  We need to know what the universe looks like, just as the Silly Putty user needs to know what the image he or she is trying to represent looks like.  Stretching it in the dark doesn’t produce good results.  And when we weight one group (stretching the putty in one direction), it has the effect of distorting another (contracting the putty in another direction).
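That last point is exactly what happens in raking (iterative proportional fitting), a common way to weight to several known margins at once. The sketch below is a minimal, hypothetical implementation with made-up data and assumed 50/50 targets, not the procedure any particular firm uses. Weighting on gender alone pulls the age distribution further from its target; raking then alternates between the two variables until both margins line up.

```python
def weighted_totals(respondents, weights, var):
    """Sum of weights in each category of one variable."""
    totals = {}
    for r, w in zip(respondents, weights):
        totals[r[var]] = totals.get(r[var], 0.0) + w
    return totals


def rake(respondents, targets, iterations=50, tol=1e-9):
    """Raking (iterative proportional fitting) to several marginal targets.

    respondents: list of dicts such as {"gender": "F", "age": "18-34"}.
    targets: {variable: {category: population share}}, each summing to 1.
    Assumes every target category has at least one respondent.
    """
    weights = [1.0] * len(respondents)
    for _ in range(iterations):
        max_change = 0.0
        for var, target_shares in targets.items():
            total = sum(weights)
            current = weighted_totals(respondents, weights, var)
            for i, r in enumerate(respondents):
                # Stretch or shrink this respondent's cell toward its target share.
                factor = target_shares[r[var]] * total / current[r[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return weights


def margin(respondents, weights, var):
    """Weighted share of each category, rounded for display."""
    totals = weighted_totals(respondents, weights, var)
    grand = sum(totals.values())
    return {k: round(v / grand, 3) for k, v in sorted(totals.items())}


# Hypothetical sample of 100: too many women (60%) and too few 18-34s (40%),
# with young men especially scarce.
cells = ([("F", "18-34")] * 30 + [("F", "35+")] * 30
         + [("M", "18-34")] * 10 + [("M", "35+")] * 30)
sample = [{"gender": g, "age": a} for g, a in cells]
targets = {"gender": {"F": 0.5, "M": 0.5}, "age": {"18-34": 0.5, "35+": 0.5}}

# Stretching on gender alone pushes age further off target (it was 0.40 young).
gender_only = rake(sample, {"gender": targets["gender"]}, iterations=1)
print(margin(sample, gender_only, "age"))  # {'18-34': 0.375, '35+': 0.625}

# Raking on both variables converges to weights that hit both margins.
raked = rake(sample, targets)
print(margin(sample, raked, "gender"), margin(sample, raked, "age"))
# {'F': 0.5, 'M': 0.5} {'18-34': 0.5, '35+': 0.5}
```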

Weighting is best when we are making subtle adjustments that improve the picture.  Because we almost never have a random sample, it is necessary.  But it can be overdone, and we have to be careful not to stretch the Silly Putty too far.
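One common way to put a number on how far the putty has been stretched is Kish’s approximate effective sample size, n_eff = (Σw)² / Σw². The more unequal the weights, the smaller n_eff becomes relative to the actual number of interviews, and the less precise the results really are. A minimal sketch with two hypothetical sets of weights, each summing to 100:

```python
def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum of w)^2 / sum of w^2."""
    s = sum(weights)
    s2 = sum(w * w for w in weights)
    return s * s / s2


mild = [0.9] * 50 + [1.1] * 50    # hypothetical: gentle, subtle adjustments
heavy = [0.2] * 90 + [8.2] * 10   # hypothetical: a few respondents stretched hard

for name, w in (("mild", mild), ("heavy", heavy)):
    n_eff = kish_effective_n(w)
    print(f"{name}: n = {len(w)}, effective n = {n_eff:.1f}, "
          f"design effect = {len(w) / n_eff:.2f}")
# mild: n = 100, effective n = 99.0, design effect = 1.01
# heavy: n = 100, effective n = 14.8, design effect = 6.76
```

Both sets of weights produce a sample of 100 on paper, but the heavily stretched one carries roughly the information of 15 interviews, which is one way to see why subtle adjustments work and aggressive ones distort.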

Visit the Crux Research Website
