Archive for the 'Address Based Samples' Category

The myth of the random sample

Sampling is at the heart of market research. We ask a few people questions and then assume everyone else would have answered the same way.

Sampling works in all types of contexts. Your doctor doesn’t need to test all of your blood to determine your cholesterol level – a few ounces will do. Chefs taste a spoonful of their creations and then assume the rest of the pot will taste the same. And, we can predict an election by interviewing a fairly small number of people.

The mathematical procedures that are applied to samples that enable us to project to a broader population all assume that we have a random sample. Or, as I tell research analysts: everything they taught you in statistics assumes you have a random sample. T-tests, hypotheses tests, regressions, etc. all have a random sample as a requirement.

Here is the problem: We almost never have a random sample in market research studies. I say “almost” because I suppose it is possible to do, but over 30 years and 3,500 projects I don’t think I have been involved in even one project that can honestly claim a random sample. A random sample is sort of a Holy Grail of market research.

A random sample might be possible if you have a captive audience. You can random sample some the passengers on a flight or a few students in a classroom or prisoners in a detention facility. As long as you are not trying to project beyond that flight or that classroom or that jail, the math behind random sampling will apply.

Here is the bigger problem: Most researchers don’t recognize this, disclose this, or think through how to deal with it. Even worse, many purport that their samples are indeed random, when they are not.

For a bit of research history, once the market research industry really got going the telephone random digit dial (RDD) sample became standard. Telephone researchers could randomly call land line phones. When land line telephone penetration and response rates were both high, this provided excellent data. However, RDD still wasn’t providing a true random, or probability sample. Some households had more than one phone line (and few researchers corrected for this), many people lived in group situations (colleges, medical facilities) where they couldn’t be reached, some did not have a land line, and even at its peak, telephone response rates were only about 70%. Not bad. But, also, not random.

Once the Internet came of age, researchers were presented with new sampling opportunities and challenges. Telephone response rates plummeted (to 5-10%) making telephone research prohibitively expensive and of poor quality. Online, there was no national directory of email addresses or cell phone numbers and there were legal prohibitions against spamming, so researchers had to find new ways to contact people for surveys.

Initially, and this is still a dominant method today, research firms created opt-in panels of respondents. Potential research participants were asked to join a panel, filled out an extensive demographic survey, and were paid small incentives to take part in projects. These panels suffer from three response issues: 1) not everyone is online or online at the same frequency, 2) not everyone who is online wants to be in a panel, and 3) not everyone in the panel will take part in a study. The result is a convenience sample. Good researchers figured out sophisticated ways to handle the sampling challenges that result from panel-based samples, and they work well for most studies. But, in no way are they a random sample.

River sampling is a term often used to describe respondents who are “intercepted” on the Internet and asked to fill out a survey. Potential respondents are invited via online ads and offers placed on a range of websites. If interested, they are typically pre-screened and sent along to the online questionnaire.

Because so much is known about what people are doing online these days, sampling firms have some excellent science behind how they obtain respondents efficiently with river sampling. It can work well, but response rates are low and the nature of the online world is changing fast, so it is hard to get a consistent river sample over time. Nobody being honest would ever use the term “random sampling” when describing river samples.

Panel-based samples and river samples represent how the lion’s share of primary market research is being conducted today. They are fast and inexpensive and when conducted intelligently can approximate the findings of a random sample. They are far from perfect, but I like that the companies providing them don’t promote them as being random samples. They involve some biases and we deal with these biases as best we can methodologically. But, too often we forget that they violate a key assumption that the statistical tests we run require: that the sample is random. For most studies, they are truly “close enough,” but the problem is we usually fail to state the obvious – that we are using statistical tests that are technically not appropriate for the data sets we have gathered.

Which brings us to a newer, shiny object in the research sampling world: ABS samples. ABS (addressed-based samples) are purer from a methodological standpoint. While ABS samples have been around for quite some time, they are just now being used extensively in market research.

ABS samples are based on US Postal Service lists. Because USPS has a list of all US households, this list is an excellent sampling frame. (The Census Bureau also has an excellent list, but it is not available for researchers to use.) The USPS list is the starting point for ABS samples.

Research firms will take the USPS list and recruit respondents from it, either to be in a panel or to take part in an individual study. This recruitment can be done by mail, phone, or even online. They often append publicly-known information onto the list.

As you might expect, an ABS approach suffers from some of the same issues as other approaches. Cooperation rates are low and incentives (sometimes large) are necessary. Most surveys are conducted online, and not everyone in the USPS list is online or has the same level of online access. There are some groups (undocumented immigrants, homeless) that may not be in the USPS list at all. Some (RVers, college students, frequent travelers) are hard to reach. There is evidence that ABS approaches do not cover rural areas as well as urban areas. Some households use post office boxes and not residential addresses for their mail. Some use more than one address. So, although ABS lists cover about 97% of US households, the 3% that they do not cover are not randomly distributed.

The good news is, if done correctly, the biases that result from an ABS sample are more “correctable” than those from other types of samples because they are measurable.

A recent Pew study indicates that survey bias and the number of bogus respondents is a bit smaller for ABS samples than opt-in panel samples.

But ABS samples are not random samples either. I have seen articles that suggest that of all those approached to take part in a study based on an ABS sample, less than 10% end up in the survey data set.

The problem is not necessarily with ABS samples, as most researchers would concur that they are the best option we have and come the closest to a random sample. The problem is that many firms that are providing ABS samples are selling them as “random samples” and that is disingenuous at best. Just because the sampling frame used to recruit a survey panel can claim to be “random” does not imply that the respondents you end up in a research database constitute a random sample.

Does this matter? In many ways, it likely does not. There are biases and errors in all market research surveys. These biases and errors vary not just by how the study was sampled, but also by the topic of the question, its tone, the length of the survey, etc. Many times, survey errors are not the same throughout an individual survey. Biases in surveys tend to be “unknown knowns” – we know they are there, but aren’t sure what they are.

There are many potential sources of errors in survey research. I am always reminded of a quote from Humphrey Taylor, the past Chairman of the Harris Poll who said “On almost every occasion when we release a new survey, someone in the media will ask, “What is the margin of error for this survey?” There is only one honest and accurate answer to this question — which I sometimes use to the great confusion of my audience — and that is, “The possible margin of error is infinite.”  A few years ago, I wrote a post on biases and errors in research, and I was able to quickly name 15 of them before I even had to do an Internet search to learn more about them.

The reality is, the improvement in bias that is achieved by an ABS sample over a panel-based sample is small and likely inconsequential when considered next to the other sources of error that can creep into a research project. Because of this, and the fact that ABS sampling is really expensive, we tend to only recommend ABS panels in two cases: 1) if the study will result in academic publication, as academics are more accepting of data that comes from and ABS approach, and 2) if we are working in a small geography, where panel-based samples are not feasible.

Again, ABS samples are likely the best samples we have at this moment. But firms that provide them are often inappropriately portraying them as yielding random samples. For most projects, the small improvements in bias they provide is not worth the considerable increased budget and increased study time frame, which is why, for the moment, ABS samples are currently used in a small proportion of research studies. I consider ABS to be “state of the art” with the emphasis on “art” as sampling is often less of a science than people think.


Visit the Crux Research Website www.cruxresearch.com

Enter your email address to follow this blog and receive notifications of new posts by email.