Posts Tagged 'Biases and errors'



10 Tips for Writing an Outstanding Questionnaire

I have written somewhere between a zillion and a gazillion survey questions in my career. I am approaching 3,000 projects managed or overseen, and I have been the primary questionnaire author on at least 1,000 of them. Doing the math, if an average questionnaire is 35 questions long, that means I have written or overseen 35,000+ survey questions. That is about 25 questions a week for 26 years!

More importantly, I’ve had to analyze the results of these questions, which is where one really starts to understand if they worked or not.

I started in the (landline) telephone research days. Back then, it was common practice for questionnaire authors to step into the phone center to conduct interviews during the pre-test or on the first interview day. While I disliked doing this, the experience served as the single best education on how to write a survey question I could have had. I could immediately tell whether a question was working and whether respondents understood it. It was a trial by fire, and in addition to discovering that I don’t have what it takes to be a telephone interviewer, I quickly learned what was and wasn’t working with the questions I was writing.

Something in this learning process is lost in today’s online research world. We never really experience first-hand the struggles a respondent has with our questions and thus don’t get to apply this to the next study.  For this reason I am thankful I started in the halcyon days of telephone research. Today’s young researchers don’t have the opportunity to develop these skills in the same way.

There are many guides to writing survey questions out there that cover the basics. Here I thought I’d take a broader view and list some of the top things to keep in mind when writing survey questions.  These are things I wish I had discovered far earlier!

  1. Begin with the end in mind. This concept is straight out of the 7 Habits of Highly Effective People and is central to questionnaire design.  Good questionnaire writers are thinking ahead to how they will analyze the resulting data.  In fact, I have found that if this is done well, writing the research report becomes straightforward.  I have also discovered that when training junior research staff, it is always better to help them develop their report writing skills first and then move to questionnaire development.  Once you are an apt report writer, questionnaire writing flows naturally because it begins with the end in mind.  It is also a reason why most good analysts/writers run from situations where they have to write a report from a questionnaire someone else has written.
  2. Start with an objective list. We start with a clear objective list the client has signed off on. Every question should be tied to the objective list or it doesn’t make it into the questionnaire. This is an excellent way to manage clients who might have multiple people providing input. It helps them prioritize. Most projects that end up not fully satisfying clients are ones where the objectives weren’t clear or agreed upon at the outset.
  3. Keep it simple – ridiculously simple. One of the most fortuitous things that happened to me in my career is that for a few years I exclusively wrote questionnaires that were intended for young respondents.  When I went back to writing “adult” survey questions I didn’t change a thing, as I realized that what works for a 3rd grader is short, clear, unambiguous questions with one possible interpretation.  The same thing is true for adults.
  4. Begin with a questionnaire outline. Outlines are easier to work through with clients than questionnaires. The outlines keep the focus on the types of questions we are asking and keep us from dwelling on the precise wording or scales. Writing the outline is actually more difficult than writing the questionnaire.
  5. Use consistent scales. Try not to use more than two or three scale types on the same questionnaire, as switching scales is confusing to respondents.
  6. Don’t write long questions. There is evidence that respondents don’t read them. You are better off being wordier in the answer choices than in the question itself; online, many respondents just look at the answer choices and never read the question you spent hours tweaking.
  7. Don’t get cute. We have a software system that allows us to do all sorts of sexy things, like drag-and-drop, slider scales, etc.  We rarely use them, as there is evidence that the bells and whistles are distracting and good old-fashioned pick lists and radio buttons provide more reliable measures.
  8. Consider mobile. Across the major research panels, the percentage of respondents answering on mobile devices is just 15% or so currently, but that is rapidly changing. Not only does your questionnaire have to work on the limited screen real estate of a mobile device, but it is also increasingly unlikely to be answered by someone tethered to a desktop or laptop screen in a situation where you have their full attention.  Your questionnaires will soon be answered by people multitasking, walking the dog, hanging out with friends, etc.  This context needs to be appreciated.
  9. Ask the question you are getting paid to ask. Too many times I see questionnaires that dance around the main issue of the study without ever directly asking the respondent the central question. While it is nice to back into some issues with good data analysis skills, there is no substitute for simply asking direct questions. We also see questionnaires that allow too many “not sure/no opinion” type options. You are getting paid to find out what the target audience’s opinion is, so if this seems like a frequent response you have probably not phrased the question well.
  10. Think like a respondent and not a client. This is perhaps the most important advice I can give. The respondent doesn’t live and breathe the product or service you are researching the way your client does. Survey writers must appreciate this context and ask questions that can be answered. There is a saying that if you “ask a question you will get an answer” – but that is no indication that the respondent understood your question or viewed it in the same context as your client.

Anecdotally, I have found that staff with the strongest data analytics skills and training can be some of the poorest questionnaire writers. I think that is because they can deploy their statistical skills on the back end to make up for their questionnaire writing deficiencies. But, across 3,000 projects I would say less than 100 of them truly required statistical skills beyond what you might learn in the second stats course you take in college. It really isn’t about statistical skills; it is more about translating study objectives into language a target audience can embrace.

Good questionnaire writing is not rocket science (but it is brain surgery). Above all, seek to simplify and not to complicate.

At least 10% of your online respondents are faking it!

Online surveying has become the dominant method of quantitative data collection in the market research industry in a short time. Concerns over the viability of online sampling are valid, yet on the whole the MR industry has done an excellent job of addressing them.

However, there is a data quality issue that is rarely discussed that researchers haven’t found a standard way of dealing with just yet:  many of your online respondents are faking it.

They may be doing so for a number of reasons. Your questionnaire may be poorly designed, difficult to understand, or just too long. Your incentive system might be working at cross purposes by encouraging respondents to answer screening questions dishonestly or to speed through the questionnaire to get the incentive you offer. Your respondents could be distracted by other things going on in their homes or by other windows open on their desktops. If they are answering on a mobile device, all sorts of things might be competing with your survey for their attention.

These respondents go by various names:  speeders, satisficers, straightliners. Our experience is that they constitute at least 10% of all respondents to a study and sometimes as much as 20%.

So, what can we do about this?

One way to deal with this issue is to assume it away. This is likely what most researchers do – we know we have issues with respondents who are not considering their answers carefully, so we assume the errors associated with their responses are randomly distributed. In many projects that might actually work, but why wouldn’t we try to fix data quality issues that we know exist?

At Crux, we try to address issues of data quality at two stages:  1) when we design the questionnaires and 2) after data are collected.

Questionnaire design is a root cause of the issue. Questions must be relevant to the respondent, easy to understand, and answer choices need to be unambiguous. Grid questions should be kept to a minimum. Questionnaires should have a progress bar indicating how much of the questionnaire is left to complete and the respondent should be given an indication of how much time the questionnaire is expected to take. The incentive system should be well thought out, and not so “rich” as to cause unintended behaviors.

The survey research industry eventually must come to a realization that we are torturing our respondents. Our questionnaires are too long, too complicated, and ask a lot of questions that are simply not answerable.  It may sound oversimplified, but we try hard to think like a respondent when we compose a questionnaire. We try to keep questions short, with unambiguous answers, and to keep scales consistent throughout the questionnaire.

Another issue to contend with is where to get your respondents. This issue is particularly pronounced with online intercepts or “river” sample, and is less prevalent with standing respondent panels or with client customer lists.  But, even with large and respected online panels the percent of “faking” respondents can vary dramatically.  When evaluating panels, ask the supplier what they do about this issue, and if they don’t have a ready answer, be prepared to walk away.

Beyond panel recruitment and questionnaire design, there are other adjustments that should be made to the resulting sample.  Begin by over-recruiting every study by at least 10%, as you should anticipate having to remove at least that many respondents.

“Speeders” are fairly easy to identify. We tend to remove from the database anyone who completed the survey in less than half of the median time.  It is important to use the median time as the benchmark and not the mean, as some respondents can be logged as taking a very long time to complete a survey (if, for instance, they start the survey, decide to eat dinner, and then come back to it).
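
As a rough illustration, here is a minimal pandas sketch of that speeder rule. The file name and the duration_sec column are hypothetical placeholders, not fields from any particular survey platform.

```python
import pandas as pd

# One row per respondent; 'duration_sec' is a hypothetical column holding
# total completion time in seconds, as exported from the survey platform.
df = pd.read_csv("survey_export.csv")

median_time = df["duration_sec"].median()  # benchmark on the median, not the mean
speeders = df[df["duration_sec"] < 0.5 * median_time]

print(f"Flagged {len(speeders)} of {len(df)} respondents as speeders")
clean = df.drop(speeders.index)
```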

Straightline checks are more challenging. Some suppliers will remove respondents who complete a large grid by providing the same answer for all items.  We feel it is better to take a more sophisticated approach.  We look at the variability in answers (the standard deviation) across all grid items in the study.  A respondent who demonstrates little variability compared to the rest of the sample is targeted for further review.
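
A variability check of that sort might look like the sketch below; the grid_ column-naming convention and the 10th-percentile cutoff are illustrative assumptions, not a recommended standard.

```python
import pandas as pd

df = pd.read_csv("survey_export.csv")  # hypothetical export, one row per respondent

# Treat every column named like 'grid_*' as a grid item rated on the same scale.
grid_cols = [c for c in df.columns if c.startswith("grid_")]

# Per-respondent standard deviation across all grid items in the study.
df["grid_sd"] = df[grid_cols].std(axis=1)

# Respondents in the least-variable tail of the sample get flagged for review.
cutoff = df["grid_sd"].quantile(0.10)
review = df[df["grid_sd"] <= cutoff]
print(f"{len(review)} respondents flagged for straightlining review")
```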

Recently, we have been adding in some questions that help demonstrate that the respondent is reading the questions. So, for instance, a question might ask a respondent to choose the third item in a long list, or to choose a particular word from a list.

Another effective technique is to include, within a grid, a couple of items that are worded negatively relative to the others. For instance, if you have a long grid using a Likert scale, you might ask respondents to react to opposite statements such as “This company provides world-class service” and “This company provides the worst service ever” in the same grid. If you get the same answer to each, it is evidence to look further, as clearly the respondent is being inconsistent.
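
Here is a small sketch combining the two checks just described (the instructed-response question and the reversed-item pair). All question names and answer codes are hypothetical.

```python
import pandas as pd

df = pd.read_csv("survey_export.csv")  # hypothetical export, one row per respondent

# Instructed-response check: q12 asked respondents to select the third option.
failed_trap = df["q12"] != 3

# Reversed-item check: q20_a ("world-class service") and q20_k ("worst service
# ever") sit in the same Likert grid; identical ratings are a red flag.
inconsistent = df["q20_a"] == df["q20_k"]

df["review_flag"] = failed_trap | inconsistent
print(f"{df['review_flag'].sum()} respondents flagged for further review")
```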

All of this takes time and effort, and, to be honest, it will probably only catch the most egregious offenders. Our experience is you will remove about 10% of your respondent base.

The MR industry has efforts in place (mainly via our trade associations) to develop standard ways of dealing with these issues. Until these standards are developed, tested, and accepted, it is important for clients to recognize these issues exist, and for suppliers to take the time to address them.

Ask the right questions, but of the right people!

Market research and polling is a $10 billion+ industry in the US alone. It employs some of the sharpest statisticians and methodological minds that our universities produce. Yet it is clearly an industry that still makes many high-profile mistakes.

Here is something we’ve gleaned over the years: When you ask the wrong question on a survey or forget to ask a question, you actually have a reasonable chance to recover from the mistake, provided you notice it. You can change your interpretation of the result based on your imperfect question, you can look to other questions you asked to see if they shed light on the result, and you can look for a secondary source for the information. Not an ideal situation for sure, but often asking a poor question can be a recoverable error. We’ve yet to see the perfect questionnaire or even the perfect question, so in some sense we deal with this issue on every project.

However, if you ask the question of the wrong people, there is usually nothing you can do to correct the error, short of repeating the study. Failure to spot this type of mistake can lead to poor guidance to clients. Posing questions to the wrong audience might well be the most disastrous thing market researchers can do.

And it happens. Early in my career, I remember our firm conducting a phone study in which an improper randomization technique caused us to call one half of a town and not the other. We literally missed everyone living on the “other side of the tracks,” who had different opinions on the issues we were covering. The only solution was to repeat the entire project.

More common than actual sample coverage errors is failing to interview the correct individual. This is an ongoing challenge for business-to-business studies. Clients want to talk to a “decision maker,” yet often the decision maker and the user of the product are not the same individual. So, studies end up gathering detailed information on how to improve a product from people who don’t actually use it.

We conduct quite a bit of research with youth audiences. It is common for our clients to ask us to conduct what we call “hearsay” studies. They will ask us, for example, to interview parents and ask them what their kids think about an issue.  Why do that when we can ask the child directly?  We should always be asking questions that our respondents are screened to be the best individuals to answer.

We often see that seasoned researchers and astute up-and-comers realize the importance of sampling and screening, which is why, up front, they dwell more on the sampling aspects of a project than on questionnaire topics and wording.  We always know we are dealing with a client who knows what they are doing if they concentrate the initial conversations on who we want to reach and how we want to reach them, and wait to discuss what we specifically want to ask.

Writing a good questionnaire is just like brain surgery

In the 1990s I was working on a project for a consumer packaged goods client in a “low-involvement” category. The product was inexpensive (at the time it sold for less than $2) and ubiquitous – the category was owned by more than 90% of U.S. households. But, it was also mundane and sold mainly on a price basis. More than two-thirds of the category volume was private label. For all intents and purposes, the category was a commodity.

Our client wanted to delve deeply into the consumer’s mindset when buying and using the product. They had devised a list of 36 product attributes and our task was to discover which of these differentiated their product from the competition and drove sales. This is a common project type, but it was entirely unworkable for this particular study.

The reason? The product was so low-involvement and inexpensive that customers really never thought about it, let alone its performance on 36 nuanced characteristics. I personally hadn’t ever heard of at least half of the attributes despite using the product my entire life. We were asking consumers to differentiate traits they had never considered in advance.

We proceeded to build a questionnaire and conduct a study, and predictably found that the 36 items were all highly correlated with each other.  We applied some statistical wizardry (factor analysis) to demonstrate that, essentially, consumer opinion on the category came down to whether they recognized the brand and how much it cost. In effect, there were really only two questions to ask, yet we had asked 36.
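
To see how 36 highly correlated ratings can collapse into a couple of underlying dimensions, here is a toy simulation. It uses PCA as a simple stand-in for the factor analysis mentioned above, and the sample size, loadings, and noise level are all made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Pretend 500 respondents rate 36 attributes, but their answers are really
# driven by only two latent opinions (say, brand recognition and price).
latent = rng.normal(size=(500, 2))
loadings = rng.uniform(0.7, 1.0, size=(2, 36))
ratings = latent @ loadings + rng.normal(scale=0.3, size=(500, 36))

explained = PCA().fit(ratings).explained_variance_ratio_
print(f"First two components explain {explained[:2].sum():.0%} of the item variance")
```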

It took me quite a while to understand why our study had failed. It really came down to understanding that writing a good question is brain surgery without the mess.  One way to view the questionnaire writing task is recognizing we are trying to get inside the respondent’s brain and retrieve an opinion.  This works very well for high involvement decisions and for issues where a respondent is likely to have already formulated an opinion before we survey them.  Suppose we want to find out how much someone likes their job, or how they view their local school district, or what color they feel the sky is.  For the most part, we are conducting simple brain surgery – going inside their brain and via a carefully worded question, plucking out an established opinion.  It works very well.

But, with low involvement items, there is nothing there to retrieve.  People just haven’t thought about 36 different buying attributes for low involvement products.  We are asking them to figure out what the attribute is, formulate an opinion, and express it to us in about 10 seconds.  They will provide an answer, but there will be an enormous amount of error involved.

In short, when you ask a question, you will get an answer. That doesn’t mean that answer will be meaningful or even accurate.  Low-involvement products are low-risk and involve little consequence of making a “wrong” decision.  Consumers apply more heuristic approaches in these situations.

For the most part, the more we try to retrieve already established thoughts on surveys, the more accurate and useful our data are.  This doesn’t mean we cannot research low-involvement products, but it does imply you have to pose questions a respondent can actually answer. Sometimes this means the questioning has to be simpler, or expressed in a clear choice task, or that we need to move to experimental designs.

As researchers, we have to understand that consumers’ lives are hectic and settling on a limited number of easily comprehensible decision criteria for low-involvement items is how the consumer world works.  In the end, I think if consumers really contemplated 36 attributes in the real world, the product would have sold for a much higher price. It just wasn’t worth their time to consider a $2 purchase in this much detail.


Is this study significant?


In all of our dealings with clients, there is one question that we wince at:  “is this study statistically significant?”

This question is cringeworthy because it is challenging to understand what is meant by the question and the answer is never a simple “yes” or “no.”

The term “significant” has a specific meaning in statistics which is not necessarily its meaning in everyday English. When we say something is significant in our daily lives, we tend to imply that it is meaningful, important, or worthy of attention. In a statistical context, the meaning of the term is narrow:  it just means that there is a high probability our findings are not due to chance.

For instance, suppose we conduct a study and find that 55% of women and 50% of men prefer Coke over Pepsi. Researchers will tend to say that there is a statistically significant difference in the Coke/Pepsi gender preference as long as there is a 95% probability or better that these two numbers are different.

I won’t bore you with the statistical calculation, but in this case we would have had to interview about 1,000 women and 1,000 men in order to highlight this difference as being significant.
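
For readers who do want the arithmetic, here is a minimal sketch of a two-proportion z-test consistent with that claim. It uses a simplified, unpooled standard error for illustration, not the exact calculation run on any particular study.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two independent proportions (unpooled SE)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# 55% of ~1,000 women vs. 50% of ~1,000 men prefer Coke (figures from the example above)
print(round(two_proportion_z(0.55, 1000, 0.50, 1000), 2))  # ~2.24, above the 1.96 cutoff for 95%

# The same 5-point gap with only 400 per group would not clear the bar
print(round(two_proportion_z(0.55, 400, 0.50, 400), 2))    # ~1.42
```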

But, just because these two findings are statistically significant, doesn’t necessarily imply that they are practically important.  Whether or not a 5 point difference between men and women is something worth noting is really a more qualitative issue. Is that a big enough difference to matter? All we can really say as researchers is that yep, odds are pretty good the two numbers are different.

And that is where the challenge lies. Statistical significance and practical importance are not necessarily the same thing. Statistical significance is calculated mainly by knowing the sample size and the variance of response.  The more people you interview and the more they tend to have the same answers, the easier it is to find statistically significant differences.

The custom is to highlight only differences with a 95% or greater probability of not being due to chance. But this is nothing more than a tradition. There is no reason not to highlight differences with a greater or lesser probability.  In a sense, every study that is implemented well provides statistically significant results – it just depends on how much of a chance you are willing to take of making an error.

I recently had a client ask me what it would take to have a 0% chance of being wrong. The short answer is you would have to interview everybody in the population. And do it in a perfect, no-biasing way.

So, the correct answer to “is this study significant” is “it depends on how certain you want to be.” That is rarely a satisfying response, which is why we don’t like the question to begin with!

The Best Graph Ever

Marketers tend to be obsessed with graphs. A challenge for many research projects is determining how to best distill statistics gathered from hundreds of respondents into a simple picture that makes a convincing point. A good graph balances a need for simplicity with an appreciation for the underlying complexity of the data.

Recently, as part of a year-end series, the Washington Post has been unveiling its “Graphs of the Year.” The Post has been inviting its contributing “wonks” to choose one graph that best encapsulates 2013 for them. I found myself spending way too much time looking through them. Some of the graphs are truly outstanding summaries of a key issue – and their conclusions are striking. Others show the political biases of the wonks themselves, and show how data can indeed be manipulated to make a point. If I were to teach a class in market research, an entire lesson would be devoted to these graphs.

I judge graphs by a simple criterion:  If you were carrying a deck of graphs down the hallway and one fell onto the floor, would someone who picked it up be able to understand its main point, without any other context? We try our best to draw graphs that meet this threshold.

In the end, the good graphs from the Post are those that spur thought and are ideologically independent. In particular, I like Bill Gates’ graph which shows the causes of death in the world as well as how each is increasing or decreasing. This isn’t a simple graph, but it clearly shows the progress the world is making and priorities for the future.

Some graphs didn’t do it for me. Senator Wyden’s graph reminded me of a David Ogilvy quotation: “They use [research] as a drunkard uses a lamp post — for support, rather than for illumination.” Wyden’s graph came off as a platform to make a political point. It confused me, and I didn’t see how the conclusions he suggests flow from the graph at all.

Senator Patty Murray’s graph may very well make a valid point about what drives the federal deficit, but it is a shocking example of correlation and causation not being the same thing. Just because two lines are displayed next to each other does not mean one leads to the other, or, in this case, that the two are even correlated. Her explanation of the graph is political, and as far as I can tell the graph not only fails to illuminate her point, it doesn’t seem to make any point at all.

Perhaps the most misleading graph comes from Peter Thiel. His graph indicates that as student loan debt has increased, the median income of households with a bachelor’s degree has declined. The problem? The two lines on the graph are on different scales! The median income line is shown per household. The student debt load line is across the entire population. They aren’t comparable.

To make sense, the student debt load should instead be shown on a per household basis. College enrollments have increased steadily over the time frame of the graph, so of course debt load in total will be increasing. And, as a wider base of students pursue degrees, it would be expected that median income might be impacted downward. This is not to say that student debt is not an important issue – as it clearly is. But, this graph seems to take the focus off student debt and indicate that college education does not pay off.

So, what would I have picked as my graph of the year? Actually, I can do it one better. I have a graph that I consider to be the best graph of all time.  It comes from Gallup and is shown in the graph below. (It is best to go to Gallup’s site and click on “historical trend” to view this graph, which shows up-to-date tracking through Obama.)

[Graph: Gallup’s historical presidential job approval ratings, Truman through Obama]

This graph shows the Presidential approval rating tracked since modern polling began. It begins with Harry Truman and on Gallup’s site it runs right up to Obama’s current numbers.

I like it because it is a clear and consistent measure over a long period of time. To me, it is fascinating to look at with a mind towards what was going on historically as the polls were taken. It shows how memories can change as events move to the past. For instance, G.W. Bush’s approval rating was nearly the highest ever measured early in his Presidency (just after 9/11) and moved to one of the lowest ever measured by the time he left office. Clinton is the only President whose approval rating displays a positive trend throughout his term in office. Kennedy’s approval rating was moderate by historical standards near the end of his time in office.

To me, it is fascinating to think of a historical event and then look at the chart to see what happened to approval ratings. Watergate preceded a large drop in Nixon’s ratings, and Ford’s pardon of Nixon did the same. Once WWII was over and the US became mired in Korea, Truman’s popularity took a huge hit. Eisenhower’s ratings were very stable compared to others.

All Presidents start office with their approval ratings at their highest. It seems that the first day on the job is the best day, which may be why the first 100 days are always considered key to any Presidential agenda.

Although graphs may be best judged by their ability to convey one thought, I find that I can stare at this one for hours.

The Top 5 Errors and Biases in Survey Research


At its core, market research is simple. We pose questions to a sample of respondents. We take the results and infer what a broader population likely thinks from this sample. So simple, yet why is it that it goes wrong so often?

Because there are many potential sources of errors and biases in surveys, some of which are measurable and many others of which creep into our projects without anyone noticing.

Years ago, Humphrey Taylor (Chairman of the Harris Poll) offered a particularly shocking quote to our industry:

On almost every occasion when we release a new survey, someone in the media will ask, “What is the margin of error for this survey?” There is only one honest and accurate answer to this question — which I sometimes use to the great confusion of my audience — and that is, “The possible margin of error is infinite.”

Infinite errors?

When organizing this post, I jotted down every type of error and bias in surveys that I could remember. In 10 minutes, I could name 20 potential sources of error. After a bit of Internet searching, the list grew to 40. Any one of these errors could have “infinite” consequences for the accuracy of a poll or research project. Or, they might not matter at all.

I thought I would organize errors and biases into a “top 5.” These are based on about 25 years’ experience in the research and polling industry and seem to be the types of errors and biases we see the most often and are most consequential.

The Top 5

1.  Researcher Bias.

The most important error that creeps into surveys isn’t statistical at all and is not measurable. The viewpoint of the researcher has a way of creeping into question design and analysis. Sometimes this is purposeful, and other times it is more subtle. All research designers are human and have points of view. Even the most practiced and professional researchers can have subtle biases in the way they word questions or interpret results. How we frame questions and report results is always affected by our experiences and viewpoints – which can be a good thing, but can also affect the purity of the study.

2. Poor match of the sample to the population.

This is the source of some of the most famous errors in polling. Our industry once predicted the elections of future Presidents Alf Landon and Thomas Dewey based on this mistake. It is almost never the case that the sampling frame you use is a perfect match to the population you are trying to understand, so this error is present in most studies. You can sometimes recover from asking the wrong questions, but you can never recover from asking them of the wrong people.

Most clients (and suppliers) like to focus on questionnaire development when a new project is awarded. The reality is the sampling and weighting plan is every bit as consequential to the success of the project, and rarely gets the attention it deserves. We can tell when we have a client that really knows what they are doing if they begin the project by focusing on sampling issues and not jumping to questionnaire design.

3. Lack of randomness/response bias.

Many surveys proceed without random samples. In fact, it is rare that a survey being done today can accurately claim to be using a random sample. Remember those statistics courses you took in college and graduate school? The one thing they have in common is that pretty much everything they taught you statistically is only relevant if you have a random sample. And, odds are great that you don’t.

A big source of “non-randomness” in a sample is response bias. A typical RDD phone survey being conducted today has a cooperation rate of less than 20%. 10% is considered a good response rate from an online panel. When we report results of these studies, we are assuming that the vast majority of people who didn’t respond would have responded in the same way as those who did. Often, this is a reasonable assumption. But, sometimes it is not. Response bias is routinely ignored in market research and polls because it is expensive to correct (the fix involves surveying the non-responders).

4.  Failure to quota sample or weight data.

This is a bit technical. Even if we sample randomly, it is typical for some subgroups to be more willing to cooperate than others. For example, females are typically less likely to refuse a survey invitation than males, and minorities are less likely to participate than whites. So, a good researcher will quota sample and weight data to compensate for this. In short, if you know something about your population before you survey them, you should use this knowledge to your advantage. If you are conducting an online poll and you are not doing something to quota sample or weight the data, odds are very good that you are making an important mistake. (A small weighting sketch appears just after this list.)

5.  Overdoing it.

I have worked with methodologists who have more degrees than a thermometer, think about the world in Greek letters, and understand every type of bias we can comprehend. I have also seen them concentrate so much on correcting for every type of error they can imagine that they “overcook” the data. I remember once passing off a data set to a statistician who corrected for 10 types of errors, and the resulting data set didn’t even have the gender distribution in the proper proportion.

Remember — you don’t have to correct for an error or bias unless it has an effect on what you are asking.  For example, if men and women answer a question identically, weighting by gender will have no effect on the study results. Instead, you should know enough about the issues you are studying to know what types of errors are likely to be relevant to your study.
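
To make item 4 above a bit more concrete, here is a minimal post-stratification sketch in pandas. The column names, the 51/49 gender split, and the rating variable are all illustrative assumptions rather than figures from any real study.

```python
import pandas as pd

df = pd.read_csv("survey_export.csv")  # hypothetical export with 'gender' and 'rating' columns

# Known population shares (assumed here) versus what the sample actually delivered
population_share = {"female": 0.51, "male": 0.49}
sample_share = df["gender"].value_counts(normalize=True)

# Classic post-stratification weight: population share divided by sample share
df["weight"] = df["gender"].map(lambda g: population_share[g] / sample_share[g])

# Weighted estimates then use the weights, e.g. a weighted mean of a 1-5 rating
weighted_mean = (df["rating"] * df["weight"]).sum() / df["weight"].sum()
print(round(weighted_mean, 2))
```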

So that is our top 5. Note that I did not put sampling error in the top 5. I am not sure it would make my top 20. Sampling error is the “+/- 5%” that you see attached to many polls. We will do a subsequent blog post on why this isn’t a particularly relevant error for most studies. It just happens to be the one type of error that can be easily calculated mathematically, which is why we see it cited so often. I am more concerned about the errors that are harder to calculate, or, more importantly, the ones that go unnoticed.

With 40+ sources of errors, one could wonder how our industry ever gets it right. Yet we do. More than $10 billion is spent on research and polling in the US each year, and if this money were not being spent effectively, the industry would implode. So, how do we get it right?

In one sense, many of the errors in surveys tend to be randomly distributed. For instance, there can be a fatigue bias in a question involving a long list of items to be assessed. By presenting long lists in a randomized order we can “randomize” this error – we don’t remove it.
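
As a small illustration of that randomization, a survey script might shuffle the item order independently for each respondent, along these lines (the item names and the seeding scheme are hypothetical):

```python
import random

items = [f"attribute_{i:02d}" for i in range(1, 37)]  # a hypothetical 36-item grid

def presentation_order(respondent_id):
    """Return a per-respondent random ordering of the grid items."""
    order = items.copy()
    random.Random(respondent_id).shuffle(order)  # seeded so each respondent's order is reproducible
    return order

print(presentation_order(42)[:5])
```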

In some sense, errors and biases also seem to have a tendency to cancel each other out, rather than magnify each other. And, as stated above, not all errors matter to every project. The key is to consider which ones might before the study is fielded.

