
Your grid questions probably aren’t working

Convincing people to participate in surveys and polls has become so challenging that more attention is now going toward preventing respondents from abandoning (suspending) a survey once they choose to respond.

Most survey suspends occur in one of two places. The first is at the initial screen the respondent sees. Respondents click through an invitation, and many quickly decide that the survey isn’t for them and abandon the effort.

The second most common place is the first grid question respondents encounter. They see an imposing grid question and decide it isn’t worth their time to continue. It doesn’t matter where this question is placed – this happens whether the first grid question is early in the questionnaire, in the middle, or toward the end.

Respondents hate answering grid questions. Yet clients continue to ask them, and survey researchers include them without much thought. The quality of data they yield tends to be low.

A measurement error issue with grid questions is known as “response set bias.” When we present a list of, say, ten items, we want the respondent to make an independent judgment of each one, unrelated to what they think of the others. With a long list of items, that is not what happens. Instead, when people respond to later items, they remember what they said earlier. If I indicated that feature A was “somewhat important” to me, then when I assess feature B it is natural to think about how it compares in importance to feature A. This introduces unwanted correlations into the data set.

Instead, we want a respondent to assess feature A, clear their mind entirely, and then assess feature B. That is a challenging task, and placing features on a long, intimidating list makes it nearly impossible. Some researchers think we eliminate this error by randomizing the list order, but all that does is spread the error out. It is important to randomize the options so this error doesn’t concentrate on just a few items, but randomization does not solve the problem.
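To make the randomization point concrete, here is a minimal Python sketch of per-respondent item rotation. The item labels and the idea of seeding on a respondent ID are illustrative assumptions, not a description of any particular survey platform:

```python
# A minimal sketch (not any specific platform's API): shuffle grid items
# per respondent so that order-related error spreads across items.
import random

ITEMS = ["Feature A", "Feature B", "Feature C", "Feature D"]  # placeholder item labels

def item_order_for(respondent_id: int) -> list[str]:
    """Return a reproducible, respondent-specific ordering of the grid items."""
    rng = random.Random(respondent_id)  # seed on the respondent so each person gets a stable order
    order = ITEMS.copy()
    rng.shuffle(order)
    return order

print(item_order_for(101))  # the order differs from respondent to respondent
```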

Errors you have probably heard of lurk in long grid questions. Things like fatigue biases (respondents attend less to the items late in the list), question order biases, priming effects, recency biases, etc. In short, grid questions are just asking for many measurement errors, and we end up crossing our fingers and hoping some of these cancel each other out.

This is admittedly a mundane topic, but it is the one questionnaire design issue I have the most difficulty convincing clients to do something about. Grid questions capture a lot of data in a short amount of questionnaire time, so they are enticing for clients.

I prefer a world where we seldom ask them. If we must use them, we recommend at most one or two per questionnaire, with never more than four to six items in each. I rarely succeed in convincing clients of this.

“Textbook” explanations of problems with grid questions do not include the issue that bothers me most: with grid questions, the question respondents hear and respond to is often not the literal question that was written.

Consider a grid question like this, with a 5-point importance scale as the response options:

Q: How important were the following when you decided to buy the widget?

  1. The widget brand cares about sustainability
  2. The price of the widget
  3. The color of the widget is attractive to you
  4. The widget will last a long time

Think about the first item (“The widget brand cares about sustainability”). The client wants to understand how important sustainability is in the buying decision. How important of a buying criterion is sustainability?

But that is likely not what the respondent “hears.” The respondent will probably read the item as asking whether they care about sustainability – and who doesn’t? As a result, sustainability tends to be overstated as a decision driver when the data set is analyzed. Respondents don’t leap to thinking about sustainability as a buying consideration; instead, they respond about sustainability in general.

Clients and suppliers must realize that respondents do not parse our words as we would like them to, and they do not always attend to our questions. We need to anticipate this.

How do we fix this issue? We should be more straightforward in how we ask questions. In this example, I would prefer to derive the importance of sustainability in the buying decision. I’d include a question asking how much the respondent cares about sustainability (phrased carefully so that responses spread across the answer choices). Then, in a second question, I would gather a dependent variable asking how likely they are to buy the widget in the future.

A regression or correlation analysis would provide coefficients across variables that indicate their relative importance. Yes, it would be based on correlations and not necessarily causation. In reality, research studies rarely set up the experiments necessary to give evidence of causation, and we should not get too hung up on that.
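To make the derived-importance idea concrete, here is a minimal Python sketch under assumed inputs. The file name, column names, and rating scales are hypothetical, and statsmodels is just one convenient way to run the regression:

```python
# A minimal sketch of derived importance: regress stated purchase likelihood
# on attitude questions and read relative importance from the coefficients.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("survey_responses.csv")  # one row per respondent (hypothetical file)

predictors = ["cares_about_sustainability", "price_sensitivity", "values_durability"]
X = sm.add_constant(df[predictors])       # attitude ratings, e.g., 1-5 scales
y = df["likelihood_to_buy"]               # dependent variable, e.g., 1-5 purchase likelihood

model = sm.OLS(y, X, missing="drop").fit()
# Larger coefficients suggest greater derived importance; standardizing the
# predictors first would make the coefficients more directly comparable.
print(model.params.drop("const").sort_values(ascending=False))

# Simple correlations serve as a quick cross-check on the regression story.
print(df[predictors + ["likelihood_to_buy"]].corr()["likelihood_to_buy"])
```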

I would conclude that sustainability is an essential feature if it popped in the regression as having a high coefficient and if I saw something else in other questions or open-ends that indicated sustainability mattered from another angle. Always look for another data point or another data source that supports your conclusion.

Grid questions are the most over-rated and overused types of survey questions. Clients like them, but they tend to provide poor-quality data. Use them sparingly and look for alternatives.

Pre-Election Polling and Baseball Share a Lot in Common

The goal of a pre-election poll is to predict which candidate will win an election and by how much. Pollsters work towards this goal by 1) obtaining a representative sample of respondents, 2) determining which candidate a respondent will vote for, and 3) predicting the chances each respondent will take the time to vote.

All three of these steps involve error. It is the first one, obtaining a representative sample of respondents, which has changed the most in the past decade or so.

It is the third step that separates pre-election polling from other forms of polling and survey research. Statisticians must predict how likely each person they interview is to vote. This is called their “Likely Voter Model.”

As I state in POLL-ARIZED, this is perhaps the most subjective part of the polling process. The biggest irony in polling is that it becomes an art when we hand the data to the scientists (methodologists) to apply a Likely Voter Model.

It is challenging to understand what pollsters do in their Likely Voter Models and perhaps even more challenging to explain.  

An example from baseball might provide a sense of what pollsters are trying to do with these models.

Suppose Mike Trout (arguably the most underappreciated sports megastar in history) is stepping up to the plate. Your job is to predict Trout’s chances of getting a hit. What is your best guess?

You could take a random guess between 0 and 100%. But, since that would give you a 1% chance of being correct, there must be a better way.

A helpful approach comes from a branch of statistics called Bayesian statistics. It says we can start with a baseline (a “prior”) estimate of Trout’s hit probability based on past data.

For instance, we might see that so far this year, the overall major league batting average is .242. So, we might guess that Trout’s probability of getting a hit is 24%.

This is better than a random guess. But, we can do better, as Mike Trout is no ordinary hitter.

We might notice there is even better information out there. Year-to-date, Trout is batting .291. So, our guess for his chances might be 29%. Even better.

Or, we might see that Trout’s lifetime average is .301 and that he hit .333 last year. Since we believe in a concept called regression to the mean, that would lead us to think that his batting average should be better for the rest of the season than it is currently. So, we revise our estimate upward to 31%.

There is still more information we can use. The opposing pitcher is Justin Verlander. Verlander is a rare pitcher who has owned Trout in the past – Trout’s average is just .116 against Verlander. This causes us to revise our estimate downward a bit. Perhaps we take it to about 25%.

We can find even more information. The bases are loaded. Trout is a clutch hitter, and his career average with men on base is about 10 points higher than when the bases are empty. So, we move our estimate back up to about 28%.

But it is August. Trout has a history of batting well early in and late in the season, but he tends to cool off during the dog days of summer. So, we decide to end this and settle on a probability of 25%.

This sort of analysis could go on forever. Every bit of information we gather about Trout can conceivably help make a better prediction for his chances. Is it raining? What is the score? What did he have for breakfast? Is he in his home ballpark? Did he shave this morning? How has Verlander pitched so far in this game? What is his pitch count?
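For readers who want to see the mechanics, below is a toy Beta-Binomial version of the “start with a prior, then update with data” idea. The at-bat counts and prior weight are invented, and the situational adjustments described above (the Verlander matchup, the bases-loaded split, the August slump) would each need their own data:

```python
# A toy Beta-Binomial update in the spirit of the example: start from a
# league-wide prior, then update with the hitter's own (hypothetical) numbers.
LEAGUE_AVG = 0.242
PRIOR_WEIGHT = 100  # treat the league average as worth 100 "pseudo at-bats"

alpha = LEAGUE_AVG * PRIOR_WEIGHT        # prior "hits"
beta = (1 - LEAGUE_AVG) * PRIOR_WEIGHT   # prior "outs"

season_hits, season_at_bats = 93, 320    # hypothetical year-to-date line (~.291)
posterior_mean = (alpha + season_hits) / (alpha + beta + season_at_bats)
print(f"Updated hit probability: {posterior_mean:.3f}")  # ~.279, pulled from .291 toward .242
```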

There are pre-election polling analogies in this baseball example, particularly if you follow the probabilistic election models created by organizations like FiveThirtyEight and The Economist.

Just as we might use Trout’s lifetime average as our “prior” probability, these models will start with macro variables for their election predictions. They will look at the past implications of things like incumbency, approval ratings, past turnout, and economic indicators like inflation, unemployment, etc. In theory, these can adjust our assumptions of who will win the election before we even include polling data.

Of course, using Trout’s lifetime average or these macro variables in polling will only be helpful to the extent that the future behaves like the past. And therein lies the rub – overreliance on past experience makes these models inaccurate during dynamic times.

Part of why pollsters missed badly in 2020 is that unique things were going on – a global pandemic, changed methods of voting, increased turnout, etc. In baseball, perhaps this is a year with a juiced baseball, or Trout is dealing with an injury.

The point is that while unprecedented things are unpredictable, they happen with predictable regularity. There is always something unique about an election cycle or a Mike Trout at bat.

The most common question I am getting from readers of POLL-ARIZED is, “Will the pollsters get it right in 2024?” My answer is that since pollsters apply past assumptions in their models, they will get it right to the extent that the world in 2024 looks like the world did in 2020 – and I would not put my own money on that.

I make a point in POLL-ARIZED that pollsters’ models have become too complex. While, in theory, the predictive value of a model never gets worse when you add more variables, in practice this complexity has made the models uninterpretable. Pollsters include so many variables in their likely voter models that many of their adjustments cancel each other out. They are left with a model with no discernible underlying theory.

If you look closely, we started with a probability of 24% for Trout. Even after looking at a lot of other information and making reasonable adjustments, we still ended up with a prediction of 25%. The election models are the same way. They include so many variables that they can cancel out each other’s effects and end up with a prediction that looks much like the raw data did before the methodologists applied their wizardry.

This effort would be better spent improving the input to the models – investing in the trust needed to increase response rates to our surveys and polls. Improving the quality of the data going in will do more for the predictive quality of the polls than coming up with more complicated ways to weight the data.

Of course, in the end, one candidate wins, and the other loses, and Mike Trout either gets a hit, or he doesn’t, so the actual probability moves to 0% or 100%. Trout cannot get 25% of a hit, and a candidate cannot win 79% of an election.

While writing this, I looked up the last time Trout faced Verlander. It turns out Verlander struck him out!

Things That Surprised Me When Writing a Book

I recently published a book outlining the challenges election pollsters face and the implications of those challenges for survey researchers.

This book was improbable. I am neither an author nor a pollster, yet I wrote a book on polling. It is the result of a curiosity that got away from me.

Because I am a new author, I thought it might be interesting to list unexpected things that happened along the way. I had a lot of surprises:

  • How quickly I wrote the first draft. Many authors toil for years on a manuscript. The bulk of POLL-ARIZED was composed in about three weeks, working a couple of hours daily. The book covers topics central to my career, and it was a matter of getting my thoughts typed and organized. I completed the entire first draft before telling my wife I had started it.
  • How long it took to turn that first draft into a final draft. After I had all my thoughts organized, I felt a need to review everything I could find on the topic. I read about 20 books on polling and dozens of academic papers, listened to many hours of podcasts, interviewed polling experts, and spent weeks researching online. I convinced a few fellow researchers to read the draft and incorporated their feedback. The result was a refinement of my initial draft and arguments and the inclusion of other material. This took almost a year!
  • How long it took to get the book from a final draft until it was published. I thought I was done at this point. Instead, it took another five months to get it in shape to publish – to select a title, get it edited, commission cover art, set it up on Amazon and other outlets, etc. I used Scribe Media, which was expensive, but this process would have taken me a year or more if I had done it without them.
  • That going for a long walk is the most productive writing tactic ever. Every good idea in the book came to me when I trekked in nature. Little of value came to me when sitting in front of a computer. I would go for long hikes, work out arguments in my head, and brew a strong cup of coffee. For some reason, ideas flowed from my caffeinated state of mind.
  • That writing a book is not a way to make money. I suspected this going in, but it became clear early on that this would be a money-losing project. POLL-ARIZED has exceeded my sales expectations, but it cost more to publish than it will ever make back in royalties. I suspect publishing this book will pay back in our research work, as it establishes credibility for us and may lead to some projects.
  • Marketing a book is as challenging as writing one. I guide large organizations on their marketing strategy, yet I found I didn’t have the first clue about how to promote this book. I would estimate that the top 10% of non-fiction books make up 90% of the sales, and the other 90% of books are fighting for the remaining 10%.
  • Because the commission on a book is a few dollars per copy, it proved challenging to find marketing tactics that pay back. For instance, I thought about doing sponsored ads on LinkedIn. It turns out that the per-click charge for those ads was more than the book’s list price. The best money I spent to promote the book was sponsored Amazon searches. But even those failed to break even.
  • Deciding to keep the book at a low price proved wise. So many people told me I was nuts to hold the eBook at 99 cents for so long or keep the paperback affordable. I did this because it was more important to me to get as many people to read it as possible than to generate revenue. Plus, a few college professors have been interested in adopting the book for their survey research courses. I have been studying the impact of book prices on college students for about 20 years, and I thought it was right not to contribute to the problem.
  • BookBub is incredible if you are lucky enough to be selected. BookBub is a community of voracious readers. I highly recommend joining if you read a lot. Once a week, they email their community about new releases they have vetted and like. They curate a handful of titles out of thousands of submissions. I was fortunate that my book got selected. Some authors angle for a BookBub deal for years and never get chosen. The sales volume for POLL-ARIZED went up by a factor of 10 in one day after the promotion ran.
  • Most conferences and some podcasts are “pay to play.” Not all of them, but many conferences and podcasts will not support you unless you agree to a sponsorship deal. When you see a research supplier speaking at an event or hear them on a podcast, they may have paid the hosts something for the privilege. This bothers me. I understand why they do this, as they need financial support. Yet, I find it disingenuous that they do not disclose this – it is on the edge of being unethical. It harms their product. If a guest has to pay to give a conference presentation or talk on a podcast, it pressures them to promote their business rather than have an honest discussion of the issues. I will never view these events or podcasts the same. (If you see me at an event or hear me on a podcast, be assured that I did not pay anything to do so.)
  • That the industry associations didn’t want to give the book attention. If you have read POLL-ARIZED, you will know that it is critical (I believe appropriately and constructively) of the polling and survey research fields. The three most important associations rejected my proposals to present and discuss the book at their events. This floored me, as I cannot think of any topics more essential to this industry’s future than those I raise in the book. Even insights professionals who have read the book and disagree with my arguments have told me that I am bringing up points that merit discussion. This cold shoulder from the associations made me feel better about writing that “this is an industry that doesn’t seem poised to fix itself.”
  • That clients have loved the book. The most heartwarming part of the process is that it has reconnected me with former colleagues and clients from a long research career. Everyone I have spoken to who is on the client-side of the survey research field has appreciated the book. Many clients have bought it for their entire staff. I have had client-side research directors I have never worked with tell me they loved the book.
  • That some of my fellow suppliers want to kill me. The book lays our industry bare, and not everyone is happy about that. I had a competitor ask me, “Why are you telling clients to ask us what our response rates are?” I stand behind that!
  • How much I learned along the way. There is something about getting your thoughts on paper that creates a lot of learning. There is a saying that the best way to learn a subject is to teach it. I would add that trying to write a book about something can teach you what you don’t know. That was a thrill for me. But then again, I was the type of person who would attend lectures for classes I wasn’t even taking while in college. I started writing this book to educate myself, and it has been a great success in that sense.
  • How tough it was for me to decide to publish it. There was not a single point in the process when I did not consider not publishing this book. I found I wanted to write it a lot more than publish it. I suffered from typical author fears that it wouldn’t be good enough, that my peers would find my arguments weak, or that it would bring unwanted attention to me rather than the issues the book presents. I don’t regret publishing it, but it would never have happened without encouragement from the few people who read it in advance.
  • The respect I gained for non-fiction authors. I have always been a big reader. I now realize how much work goes into this process, with no guarantee of success. I have always told people that long-form journalism is the profession I respect the most. Add “non-fiction” writers to that now!

Almost everyone who has contacted me about the book has asked me if I will write another one. If I do, it will likely be on a different topic. If I learned anything, it is that this process requires selecting an issue you care about passionately. Journalists are people who can write good books about almost anything. The rest of us mortals must choose a topic we are super interested in, or our books will be awful.

I’ve got a few dancing around in my head, so who knows, maybe you’ll see another book in the future.

For now, it is time to get back to concentrating on our research business!

The Insight that Insights Technology is Missing

The market research insights industry has long been characterized by a resistance to change. This likely results from the academic nature of what we do. We don’t like to adopt new ways of doing things until they have been proven and studied.

I would posit that the insights industry has not seen much change since the transition from telephone to online research occurred in the early 2000s. And even that transition created discord within the industry, with many traditional firms resistant to moving on from telephone studies because online data collection had not been thoroughly studied and vetted.

In the past few years, the insights industry has seen an influx of capital, mostly from private equity and venture capital firms. The conditions for this cash infusion have been ripe: a strong and growing demand for insights, a conservative industry that is slow to adapt, and new technologies arising that automate many parts of a research project have all come together simultaneously.

Investing organizations see this enormous business opportunity. Research revenues are growing, and new technologies are lowering costs and shortening project timeframes. It is a combustible business situation that needs a capital accelerant.

Old school researchers, such as myself, are becoming nervous. We worry that automation will harm our businesses and that the trend toward DIY projects will result in poor-quality studies. Technology is threatening the business models under which we operate.

The trends toward investment in automation in the insights industry are clear. Insights professionals need to embrace this and not fight it.

However, although the movement toward automation will result in faster and cheaper studies, this investment ignores the threats that declining data quality creates. In the long run, this automation will accelerate the decline in data quality rather than improve it.

It is great that we are finding ways to automate time-consuming research tasks, such as questionnaire authoring, sampling, weighting, and reporting. This frees up researchers to concentrate on drawing insights out of the data. But if we do not do something about data quality, all the automation in the world will not increase the value clients receive.

I argue in POLL-ARIZED that the elephant in the research room is the fact that very few people want to take our surveys anymore. When I began in this industry, I routinely fielded telephone projects with 70-80% response rates. Currently, telephone and online response rates are between 3% and 4% for most projects.

Response rates are not everything. You can make a compelling argument that they do not matter at all. There is no problem as long as the 3-4% response we get is representative. I would rather have a representative 3% answer a study than a biased 50%.

But, the fundamental problem is that this 3-4% is not representative. Only about 10% of the US population is currently willing to take surveys. What is happening is that this same 10% is being surveyed repeatedly. In the most recent project Crux fielded, respondents had taken an average of 8 surveys in the past two weeks. So, we have about 10% of the population taking surveys every other day, and our challenge is to make them represent the rest of the population.

Automate all you want, but the data that form the backbone of the insights we produce quickly and cheaply are of historically low quality.

The new investment flooding into research technology will contribute to this problem. More studies will be done that are poorly designed, with long, tortuous questionnaires. Many more surveys will be conducted, fewer people will be willing to take them, and response rates will continue to fall.

There are plenty of methodologists working on these problems. But, for the most part, they are working on new ways to weight the data we can obtain rather than on ways to compel more response. They are improving data quality, but only slightly, and the insights field continues to ignore the most fundamental problem we have: people do not want to take our surveys.

For the long-term health of our field, that is where the investment should go.

In POLL-ARIZED, I list ten potential solutions to this problem. I am not optimistic that any of them will be able to stem the trend toward poor data quality. But, I am continually frustrated that our industry has not come together to work towards expanding respondent trust and the base of people willing to take part in our projects.

The trend towards research technology and automation is inevitable. It will be profitable. But, unless we address data quality issues, it will ultimately hasten the decline of this field.

POLL-ARIZED available on May 10

I’m excited to announce that my book, POLL-ARIZED, will be available on May 10.
 
After the last two presidential elections, I was fearful my clients would ask a question I didn’t know how to answer: “If pollsters can’t predict something as simple as an election, why should I believe my market research surveys are accurate?”
 
POLL-ARIZED results from a year-long rabbit hole that question led me down! In the process, I learned a lot about why polls matter, how today’s pollsters are struggling, and what the insights industry should do to improve data quality.
 
I am looking for a few more people to read an advance copy of the book and write an Amazon review on May 10. If you are interested, please send me a message at poll-arized@cruxresearch.com.

Questions You Are Not Asking Your Market Research Supplier That You Should Be Asking

It is no secret that providing representative samples for market research projects has become challenging. While clients are always focused on obtaining respondents quickly and efficiently, it is also important that they are concerned with the quality of their data. The reality is that quality is slipping.

While there are many causes of this, one that is not discussed much is that clients rarely ask their suppliers the tough questions they should. Clients are not putting pressure on suppliers to focus on data quality. Since clients ultimately control the purse strings of projects, suppliers will only improve quality if clients demand it.

I can often tell if I have an astute client by their questions when we are designing studies. Newer or inexperienced clients tend to start by talking about the questionnaire topics. Experienced clients tend to start by talking about the sample and its representativeness.

Below is a list of a few questions that I believe clients should be asking their suppliers on every study. The answers to these are not always easy to come by, but as a client, you want to see that your supplier has contemplated these questions and pays close attention to the issues they highlight.

For each, I have also provided a correct or acceptable answer to expect from your supplier.

  • What was the response rate to my study? While it was once commonplace to report response rates, suppliers now try to dodge this issue. Most data quality issues stem from low response rates. Correct answer: for most studies, under 5%. Unless the survey is being fielded among a highly engaged audience, such as your customers, you should be suspicious of any answer over 15%. “I don’t know” is an unacceptable answer, and so is the claim that response rates do not matter – every data quality issue we experience stems from inadequate response to our surveys.
  • How many respondents did you remove in fielding for quality issues? This is an emerging issue. The number of bad-quality respondents in studies has grown substantially in just the last few years. Correct answer: at least 10%, but preferably between 25% and 40%. If your supplier says 0%, you should question whether they are properly paying attention to data quality issues. I would guide you to find a different supplier if they cannot describe a process to remove poor-quality respondents. There is no standard way of doing this, but each supplier should have an established process.
  • How were my respondents sourced? This is an essential question that is seldom asked unless our client is an academic researcher. It is a tricky question to answer. Correct answer: this is so complicated that I have difficulty providing a cogent response to our clients. The hope is that your supplier has at least some clue as to how the panel companies get their respondents and knows whom to go to if a detailed explanation is needed. They should connect you with someone who can explain this in detail.
  • What are you doing to protect against bots? Market research samples are subject to the ugly things that happen online – hackers, bots, cheaters, etc. Correct answer: Something proactive. They might respond that they are working with the panel companies to prevent bots or a third-party firm to address this. If they are not doing anything or don’t seem to know that bots are a big issue for surveys, you should be concerned.
  • What is in place to ensure that my respondents are not being used for competitors, or vice-versa? Clients often should care that the people answering their surveys have not recently done another project in their product category. I have had cases where two suppliers working for the same client (one being us) used the same sample source and polluted the sample base for both projects because we did not know the other study was fielding. Correct answer: something, if this is important to you. If your research covers brand or advertising awareness, you should account for this. If you are commissioning work with several suppliers, this takes considerable coordination.
  • Did you run simulated data through my survey before fielding? This is an essential, behind-the-scenes step that all suppliers who know what they are doing take. Running thousands of simulated interviews through the questionnaire tests the survey logic and ensures that the right people get to the right questions (see the sketch after this list). While it doesn’t prevent all errors, it catches many of them. Correct answer: yes. If the supplier does not know what simulated data is, it is time to consider a new supplier.
  • How many days will my study be in the field? Many data quality errors stem from conducting studies too quickly. Correct answer: it varies, but this should be 10-21 days for a typical project. If your study requires difficult-to-find respondents, this could be 3-4 weeks. If the data collection period is shorter than ten days, you WILL have data quality errors, so be sure you understand the tradeoffs for speed. Don’t insist on field speed unless you need to.
  • Can I have a copy of the panel company’s answers to the ESOMAR questions? ESOMAR has put out a list of questions to help buyers of online samples. Every sample supplier worth using will have created a document that answers these questions. Correct answer: Yes. Do not work with a company that has not put together a document answering these questions, as all the good ones have. However, after reading this document, don’t expect to understand how your respondents are being sourced.
  • How do you handle requests down the road when the study is over? It is a longstanding pet peeve of most clients that suppliers charge for basic customer support after the project is over. Make sure you have set expectations properly upfront and put them into the contract. Correct answer: indefinitely. Our company only charges if support requests become substantial. Many suppliers will provide support for three or six months post-study and will charge for it. I have never understood this; I am flattered when a client calls to discuss a study that was done years ago, as it means the study is continuing to make an impact. We do not charge for this follow-up unless the request requires so much time that we have to.
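As referenced in the simulated-data question above, here is a bare-bones illustration of what that testing looks like in principle. The questions, routing rules, and function names are invented; real survey platforms have their own tools for this:

```python
# Push randomly generated respondents through toy routing rules and check that
# nobody reaches a question they should have skipped (or skips one they shouldn't).
import random

def simulate_respondent(rng: random.Random) -> dict:
    """Generate one random respondent and route them through the (toy) survey logic."""
    answers = {"owns_widget": rng.choice(["yes", "no"])}
    if answers["owns_widget"] == "yes":          # Q2 is only asked of widget owners
        answers["widget_satisfaction"] = rng.randint(1, 5)
    return answers

def check_logic(answers: dict) -> list[str]:
    """Flag any respondent whose path violates the intended routing."""
    errors = []
    if answers["owns_widget"] == "no" and "widget_satisfaction" in answers:
        errors.append("non-owner reached the satisfaction question")
    if answers["owns_widget"] == "yes" and "widget_satisfaction" not in answers:
        errors.append("owner skipped the satisfaction question")
    return errors

rng = random.Random(1)
problems = [e for _ in range(10_000) for e in check_logic(simulate_respondent(rng))]
print(f"Logic errors in 10,000 simulated interviews: {len(problems)}")
```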

There are probably many other questions clients should be asking suppliers. Clients need to get tougher on insisting on data quality. It is slipping, and suppliers are not investing enough to improve response rates and develop trust with respondents. If clients pressure them, the economic incentives will be there to create better techniques to obtain quality research data.

Less useful research questions

Questionnaire “real estate” is limited and valuable. Most surveys fielded today are too long, which causes problems with respondent fatigue and trust. Researchers tend to start the questionnaire design process with good intent, aiming to keep the survey experience short and compelling for respondents. However, it is rare to see a questionnaire get shorter as it undergoes revision and review, and the result is often an impossibly long survey.

One way to guard against this is to be mindful. All questions included should have a clear purpose and tie back to study objectives. Many times, researchers include some questions and options simply out of habit, and not because these questions will add value to the project.

Below are examples of question types that, more often than not, add little to most questionnaires. These questions are common and used out of habit. There are certainly exceptions when it makes sense to include them, but for the most part we advise against using them unless there is a specific reason to do so.

Marital status

Somewhere along the way, asking a respondent’s marital status became standard on most consumer questionnaires. Across thousands of studies, I can recall only a few times when I have actually used it for anything. It is appropriate to ask when it is relevant – perhaps your client is a jewelry company or in the bridal industry, or maybe you are studying relationships. However, I would nominate marital status as the least used question in survey research history.

Other (specify)

Many multiple response questions ask a respondent to select all that apply from a list, and then as a final option will have “other.” Clients constantly pressure researchers to leave a space for respondents to type out what this “other” option is. We rarely look at what they type in. I tell clients that if we expect a lot of respondents to select the other option, it probably means that we have not done a good job at developing the list. It may also mean that we should be asking the question in an open-ended fashion. Even when it is included, most of the respondents who select other will not type anything into the little box anyway.

Don’t Know Options

We recently composed an entire post about when to include a Don’t Know option on a question. To sum it up, the incoming assumption should be that you will not use a Don’t Know option unless you have an explicit reason to do so. Including Don’t Know as an option can make a data set hard to analyze. However, there are exceptions to this rule, as Don’t Know can be an appropriate choice. That said, it is overused on surveys currently.

Open-Ends

The transition from telephone to online research has completely changed how researchers can ask open-ended questions. In the telephone days, we could pose questions that were very open-ended because we had trained interviewers who could probe for meaningful answers. With online surveys, open-ended questions that are too loose rarely produce useful information. Open-ends need to be specific and targeted. We favor the inclusion of just a handful of open-ends in each survey, and that they are a bit less “open-ended” than what has been traditionally asked.

Grid questions with long lists

We have all seen these. These are long lists of items that require a scaled response, perhaps a 5-point agree/disagree scale. The most common abandon point on a survey is the first time a respondent encounters a grid question with a long list. Ideally, these lists are about 4 to 6 items and there are no more than two or three of them on a questionnaire.

We are currently fielding a study that has a list like this with 28 items in it. There is no way we are getting good information from this question, and we are fatiguing the respondent for the remainder of the survey.

Specifying time frames

Survey research often seeks to find out about a behavior across a specified time frame. For instance, we might want to know if a consumer has used a product in the past day, past week, past month, etc. The issue here is not so much the time frame as the tendency to treat the responses literally. I have seen clients take past-day usage, multiply it by 365, and assume that will equate to past-year usage. Technically and mathematically, that might be true, but it isn’t how respondents react to questions.

In reality, responses are likely accurate when we ask whether a respondent has done something in the past day. But once the time frames get longer, we are really asking about “ever” usage. It depends a bit on the purchase cycle of the product and its cost, but for most products, asking whether they have used it in the past month, six months, year, etc. will yield similar responses.

Some researchers work around this by just asking “ever used” and “recently used.” There are times when that works, but we tend to set a reasonable time frame for recent use and go with that, typically within the past week.

Household income

Researchers have asked about household income for as long as the survey research field has been around. There are at least three serious problems with it. First, many respondents do not know what their household income is. Most households have a “family CFO” who takes the lead on financial issues, and even this person often will not know the family income.

Second, the categories chosen affect the response to the income question, indicating just how unstable it is. Asking household income in, say, ten categories versus five categories will not result in comparable data. Respondents tend to assume the middle of the range given is normal and respond using that as a reference point.

Third, and most importantly, household income is a lousy measure of socio-economic status (SES). Many young people have low annual incomes but a wealthy lifestyle because they are still supported by their parents. Many older people are retired and may have almost non-existent incomes, yet live a wealthy lifestyle off their savings. Household income tends to be a reasonable measure of SES only for respondents aged about 30 to 60.

There are better measures of SES. Education level can work, and a particularly good question is to ask the respondent about their mother’s level of education, which has been shown to correlate strongly with SES. We also ask about their attitudes towards their income – whether they have all the money they need, just enough, or if they struggle to meet basic expenses.

Attention spans are getting shorter, and as more and more surveys are completed on mobile devices, there are plenty of distractions as respondents answer questionnaires. Engage them, get their attention, and keep the questionnaire short. There may be no such thing as a dumb question, but there are certainly questions that, when asked on a survey, do not yield useful information.

Should you include a “Don’t Know” option on your survey question?

Questionnaire writers construct a bridge between client objectives and a line of questioning that a respondent can understand. This is an underappreciated skill.

The best questionnaire writers empathize with respondents and think deeply about tasks respondents are asked to perform. We want to strike a balance between the level of cognitive effort required and a need to efficiently gather large amounts of data. If the cognitive effort required is too low, the data captured is not of high quality. If it is too high, respondents get fatigued and stop attending to our questions.

One of the most common decisions researchers have to make is whether or not to allow for a Don’t Know (DK) option on a question. This is often a difficult choice, and the correct answer on whether to include a DK option might be the worst possible answer: “It depends.”

Researchers have genuine disagreements about the value of a DK option. I lean strongly towards not using DK’s unless there is a clear and considered reason for doing so.

Clients pay us to get answers from respondents and to find out what they know, not what they don’t know. Pragmatically, whenever you are considering adding a DK option, your first inclination should be that you perhaps have not designed the question well. If a large proportion of your respondent base could potentially choose “don’t know,” odds are high that you are not asking a good question to begin with – but there are exceptions.

If you get in a situation where you are not sure if you should include a DK option, the right thing to do is to think broadly and reconsider your goal: why are you asking the question in the first place? Here is an example which shows how the DK decision can actually be more complicated than it first appears.

We recently had a client that wanted us to ask a question similar to this: “Think about the last soft drink you consumed. Did this soft drink have any artificial ingredients?”

Our quandary was whether we should just ask this as a Yes/No question or to also give the respondent a DK option. There was some discussion back and forth, as we initially favored not including DK, but our client wanted it.

Then it dawned on us that whether or not to include DK depended on what the client wanted to get out of the question. On one hand, the client might want to truly understand if the last soft drink consumed had any artificial ingredients in it, which is ostensibly what the question asks. If this was the goal, we felt it was necessary to better educate the respondent on what an “artificial ingredient” was so they could provide an informed answer and so all respondents would be working from a common definition. Or, alternatively, we could ask for the exact brand and type of soft drink they consumed and then on the back-end code which ones have artificial ingredients and which do not, and thus get a good estimate for the client.
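If we had gone the back-end coding route, it might have looked something like this sketch. The brand names and their classifications are placeholders, not real product claims, and the mapping would come from the client’s own definition of “artificial”:

```python
# Ask for the exact brand consumed, then code an artificial-ingredient flag after fielding.
import pandas as pd

HAS_ARTIFICIAL = {                      # placeholder classifications
    "Brand A Cola": True,
    "Brand B Sparkling Water": False,
    "Brand C Diet Soda": True,
}

df = pd.DataFrame({"last_soft_drink": ["Brand A Cola", "Brand B Sparkling Water", "Brand X"]})
# Unmatched brands become NaN and can be routed to manual coding.
df["artificial_ingredients"] = df["last_soft_drink"].map(HAS_ARTIFICIAL)
print(df["artificial_ingredients"].value_counts(dropna=False))
```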

The other option was to realize that respondents might have their own definitions of “artificial ingredients” that may or may not match our client’s definition. Or, they may have no clue what is artificial and what is not.

In the end, we decided to use the DK option in this case because understanding how many people are unaware of artificial ingredients fit well with our objectives. When we pressed the client, we learned that they wanted to document this ambiguity. If a third of consumers don’t know whether their soft drinks have artificial ingredients in them, this would be useful information for our client to know.

This is a good example of how a seemingly simple question can have a lot of thinking behind it, and how important it is to contextualize this reasoning when reporting results. In this case, we are not really measuring whether people are drinking soft drinks with artificial ingredients. We are measuring what they think they are doing, which is not the same thing and is likely more relevant from a marketing point of view.

There are other times when a DK option makes sense to include. For instance, some researchers will conflate the lack of an opinion (a DK response) with a neutral opinion, and these are not the same thing. For example, we could ask, “How would you rate the job Joe Biden is doing as President?” Someone who answers in the middle of the response scale likely has a considered, neutral opinion of Joe Biden. Someone answering DK has not considered the issue and should not be assumed to have a neutral opinion of the president. This is another case where it can make sense to use DK.

However, there are probably more times when including a DK option is a result of lazy questionnaire design than any deep thought regarding objectives. In practice, I have found that it tends to be clients who are inexperienced in market research that press hardest to include DK options.

There are at least a couple of serious problems with including DK options on questionnaires. The first is “satisficing” – the tendency of respondents to put little effort into responding and instead choose the option that requires the least cognitive effort. A DK option encourages satisficing. It also allows respondents to disengage from the survey and can lead to inattention on subsequent items.

DK responses also create difficulties when analyzing data. We like to look at questions on a common base of respondents, and that becomes difficult when respondents choose DK on some questions but not others. Including DK makes it harder to compare results across questions. DK options also limit the ability to use multivariate statistics, as a DK response does not fit neatly on a scale.
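Here is a small, invented illustration of that common-base problem, treating DK responses as missing data:

```python
# With DK coded as missing, each added question shrinks the set of respondents
# who answered everything. Data and column names are invented.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "q1": [4, 5, np.nan, 3, 2],      # np.nan = respondent chose "Don't know"
    "q2": [np.nan, 4, 4, 5, np.nan],
    "q3": [3, np.nan, 2, 4, 5],
})

print("Respondents answering each question:", df.notna().sum().to_dict())
common_base = df.dropna()            # only respondents with no DK on any question
print(f"Common base for cross-question comparisons: {len(common_base)} of {len(df)}")
```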

Critics would say that researchers should not force respondents to express an opinion they do not have and therefore should provide DK options. I would counter that if you expect a substantial number of people not to have an opinion, odds are high you should reframe the question and ask them about something they do know about. It is usually (but not always) the case that we want to find out more about what people know than what they don’t know.

“Don’t know” can be a plausible response. But, more often than not, even when it is plausible, if we feel a lot of people will choose it, we should reconsider why we are asking the question. Yes, we don’t want to force people to express an opinion they don’t have. But rather than include DK, it is better to rewrite the question to be more inclusive of everybody.

As an extreme example, here is a scenario that shows how a DK can be designed out of a question:

We might start with a question the client provides us: “How many minutes does your child spend doing homework on a typical night?” For this question, it wouldn’t take much pretesting to realize that many parents don’t really know the answer to this, so our initial reaction might be to include a DK option. If we don’t, parents may give an uninformed answer.

However, upon further thought, we should realize that we may not really care about how many minutes the child spends on homework, and we don’t really need to know whether the parent knows this precisely. Thinking more deeply, some kids are much more efficient with their homework time than others, so measuring quantity isn’t really what we want at all. What we really want to know is whether the child’s homework load is appropriate and effective from the parent’s perspective.

This probing may lead us down a road to consider better questions, such as “in your opinion, does your child have too much, too little, or about the right amount of homework?” or “does the time your child spends on homework help enhance his/her understanding of the material?” This is another case when thinking more about why we are asking the question tends to result in better questions being posed.

This sort of scenario happens a lot when we start out thinking we want to ask about a behavior, when what we really want to do is ask about an attitude.

The academic research on this topic is fairly inconclusive and sometimes contradictory. I think that is because academic researchers don’t consider the most basic question, which is whether or not including DK will better serve the client’s needs. There are times that understanding that respondents don’t know is useful. But, in my experience, more often than not if a lot of respondents choose DK it means that the question wasn’t designed well. 

