
Researchers should be mindful of “regression toward the mean”

There is a concept in statistics known as regression toward the mean that is important for researchers to consider as we look at how the COVID-19 pandemic might change future consumer behavior. This concept is as challenging to understand as it is interesting.

Regression toward the mean implies that an extreme observation in a data set tends to be followed by one that is less extreme and closer to the "average" value of the population. A common example: if two parents who are well above average in height have a child, that child is more likely to be closer to average height than to the "extreme" height of the parents.

This is an important concept to keep in mind in the design of experiments and when analyzing market research data. I did a study once where we interviewed the "best" customers of a quick service restaurant, defined as those who had visited the restaurant 10 or more times in the past month. We gave each of them a coupon and interviewed them a month later to determine the effect of the coupon. We found that they actually went to the restaurant less often the month after receiving the coupon than the month before.

It would have been easy to conclude that the coupon caused customers to visit less frequently and that there was something wrong with it (which is what we initially thought). What really happened was a regression toward the mean. Surveying customers who had visited a large number of times in one month made it likely that these same customers would visit a more "average" amount in a following month whether they had a coupon or not. This was a poor research design because we couldn't really assess the impact of the coupon, which was our goal.

Personally, I've always had a hard time understanding and explaining regression toward the mean because the concept seems to run counter to another concept known as "independent trials". You have a 50% chance of flipping a fair coin and having it come up heads regardless of what has happened in previous flips. You can't guess whether the roulette wheel will come up red or black based on what has happened in previous spins. So, why would we expect a restaurant's best customers to visit less in the future?

This happens when we begin with a skewed, extreme group rather than a cross-section of the population. The most frequent customers are not "average" and have room to regress toward the mean in the future. The resolution of the apparent paradox is that an extreme month reflects both a customer's underlying habits and some random luck, and by selecting on the extreme month we also selected for the luck, which does not carry over to the next month. Had we surveyed customers across the full range of patronage, the sample would not have been skewed toward an extreme and we could have done a better job of isolating the effect of the coupon.
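To make this concrete, here is a minimal simulation sketch in Python with made-up numbers (the visit rates and the "10 or more visits" screen are illustrative assumptions, not data from the actual study). Every customer has a stable underlying visit rate and nothing changes between the two months, yet the group screened on an extreme first month still looks less impressive the next month.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: each customer has a stable "true" monthly visit rate,
# and observed visits in any month are noisy (Poisson) draws around that rate.
n_customers = 100_000
true_rate = rng.gamma(shape=2.0, scale=2.0, size=n_customers)  # averages ~4 visits/month

month_1 = rng.poisson(true_rate)
month_2 = rng.poisson(true_rate)  # no coupon, no change in behavior

# Screen for "best customers": 10 or more visits in month 1
best = month_1 >= 10

print(f"Population mean visits:       {month_1.mean():.1f}")
print(f"Best customers, month 1 mean: {month_1[best].mean():.1f}")
print(f"Best customers, month 2 mean: {month_2[best].mean():.1f}")  # falls back toward the mean
```

Nothing about the customers (or any coupon) changes between the two months; the drop for the screened group is produced entirely by selecting on an extreme month.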

Here is another example of regression toward the mean. Suppose the Buffalo Bills quarterback, Josh Allen, has a monster game when they play the New England Patriots. Allen, who has been averaging about 220 yards passing per game in his career, goes off and burns the Patriots for 450 yards. After we are done celebrating and breaking tables in western NY, what would be our best prediction for Allen's passing yards the second time the Bills play the Patriots?

Well, you could say the best prediction is 450 yards as that is what he did the first time. But regression toward the mean would imply that he's more likely to throw close to his historic average of 220 yards the second time around. So, when he throws for 220 yards in the second game, it is important not to give undue credit to Bill Belichick for figuring out how to stop Allen.

Here is another sports example. I have played (poorly) in a fantasy baseball league for almost 30 years. In 2004, Derek Jeter entered the season as a career .317 hitter. After the first 100 games or so he was hitting under .200. The person in my league who owned him was frustrated, so I traded for him. Jeter went on to hit well over .300 the rest of the season. This was predictable because there wasn't any underlying reason (like injury) for his slump. His underlying average was much better than his current performance, and because of regression toward the mean it was likely he would have a great second half of the season, which he did.

There are interesting HR examples of regression toward the mean. Say you have an employee who does a stellar job on an assignment – over and above what she normally does. You praise her and give her a bonus. Then, you notice that on the next assignment she doesn't perform at the same level. It would be easy to conclude that the praise and bonus caused the poor performance when in reality her performance was just regressing back toward the mean. I know sales managers who have had this exact problem – they reward their highest performers with elaborate bonuses and trips and then notice that the following year they don't perform as well. They then conclude that their incentives aren't working.

The concept is hard at work in other settings. Mutual funds that outperform the market tend to fall back in line the next year. You tend to feel better the day after you go to the doctor. Companies profiled in “Good to Great” tend to have hard times later on.

Regression toward the mean is important to consider when designing sampling plans. If you are sampling an extreme portion of a population, it is a relevant consideration. Sample size matters as well: when you have only a few cases, a single extreme response can pull your mean far from the true value.

The issue to be wary of is that when we fail to consider regression toward the mean, we tend to overstate the relationship between two things and read meaning into what may be noise. We think our mutual fund manager is a genius when he just got lucky, that our coupon isn't working, or that Josh Allen is becoming the next Drew Brees. All of these could be true, but be careful in how you interpret data that come from extreme subgroups or small samples.

How does this relate to COVID? Well, at the moment, I’d say we are still in an “inflated expectations” portion of a hype curve when we think of what permanent changes may take place resulting from the pandemic. There are a lot of examples. We hear that commercial real estate is dead because businesses will keep employees working from home. Higher education will move entirely online. In-person qualitative market research will never happen again. Business travel is gone forever. We will never again work in an office setting. Shaking hands is a thing of the past.

I'm not saying there won't be a new normal that results from COVID, but if we believe in regression toward the mean and the hype curve, we'd predict that the future will look more like the past than how it is currently being portrayed. We will naturally regress back toward the past rather than toward a more extreme version of current behaviors. The "mean" being regressed to has likely changed, but not as much as the current, extreme situation implies.

“Margin of error” sort of explained (+/-5%)

It is now September of an election year. Get ready for a two-month deluge of polls and commentary on them. One thing you can count on is reporters and pundits misinterpreting the meaning behind “margin of error.” This post is meant to simplify the concept.

Margin of error refers to sampling error and is present on every poll or market research survey. It can be mathematically calculated. All polls seek to figure out what everybody thinks by asking a small sample of people. There is always some degree of error in this.

The formula for margin of error is fairly simple and depends mostly on two things: how many people are surveyed and their variability of response. The more people you interview, the lower (better) the margin of error. The more the people you interview give the same response (lower variability), the better the margin of error. If a poll interviews a lot of people and they all seem to be saying the same thing, the margin of error of the poll is low. If the poll interviews a small number of people and they disagree a lot, the margin of error is high.
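For those who want to see the math behind this, the most common version of the calculation for a single percentage is margin of error = z × √(p(1−p)/n), where n is the number of respondents, p captures the variability of response, and z reflects the confidence level (1.96 for 95%). Here is a small sketch in Python; the numbers are illustrative, and the formula assumes a simple random sample, which real polls rarely have.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Sampling margin of error for a proportion, assuming a simple random sample.

    p: observed proportion (variability is greatest at p = 0.5)
    n: number of respondents
    z: z-score for the confidence level (1.96 is roughly 95%)
    """
    return z * math.sqrt(p * (1 - p) / n)

print(f"{margin_of_error(0.5, 1000):.3f}")  # ~0.031: the familiar "+/- 3 points"
print(f"{margin_of_error(0.5, 400):.3f}")   # ~0.049: fewer respondents, wider margin
print(f"{margin_of_error(0.9, 1000):.3f}")  # ~0.019: less variability, tighter margin
```

The three calls illustrate the two drivers described above: more respondents shrink the margin, and more agreement (lower variability) shrinks it too.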

Most reporters understand that a poll with a lot of respondents is better than one with fewer respondents. But most don’t understand the variability component.

There is another assumption used in the calculation for sampling error as well: the confidence level desired. Almost every pollster will use a 95% confidence level, so for this explanation we don’t have to worry too much about that.

What does it mean for a difference to be outside the margin of error on a poll? It means that the two percentages being compared can be deemed different from one another with 95% confidence. Put another way, if the poll were repeated many times, we'd expect the observed difference to fall in the same direction at least 19 times out of 20.

If Biden is leading Trump in a poll by 8 points and the margin of error is 5 points, we can be confident he is really ahead because this lead is outside the margin of error. Not perfectly confident, but more than 95% confident.

Here is where reporters and pundits mess it up.  Say they are reporting on a poll with a 5-point margin of error and Biden is leading Trump by 4 points. Because this lead is within the margin of error, they will often call it a “statistical dead heat” or say something that implies that the race is tied.

Neither is true. The only way for a poll to show a true dead heat is for the exact same number of people to choose each candidate. In this example the race isn't tied at all; we just have less than 95% confidence that Biden is leading. We might be 90% sure that Biden is leading Trump. So, why would anyone call that a statistical dead heat? It would be far better to report the level of confidence we have that Biden is winning, or the p-value of the result. I have never seen a reporter do that, but some of the election prediction websites do.
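As a sketch of what that reporting could look like: using a normal approximation, and assuming a simple random sample, the confidence that the leader is truly ahead can be computed directly from the two candidates' shares and the sample size. The function and poll numbers below are hypothetical, and the exact figure depends on those assumptions.

```python
from math import sqrt
from statistics import NormalDist

def confidence_leader_ahead(p1: float, p2: float, n: int) -> float:
    """Approximate confidence that candidate 1 truly leads candidate 2,
    assuming both shares come from the same simple random sample of size n."""
    d = p1 - p2
    se = sqrt((p1 + p2 - d ** 2) / n)  # standard error of the difference in shares
    return NormalDist().cdf(d / se)

# Hypothetical poll of 400 respondents (margin of error of roughly +/- 5 points): 48% vs. 44%
print(f"{confidence_leader_ahead(0.48, 0.44, 400):.0%}")  # ~80%: short of 95%, but hardly a tie
```

Reporting that confidence level, or the equivalent p-value, tells readers far more than the phrase "statistical dead heat."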

Pollsters themselves will misinterpret the concept. They will deem their poll “accurate” as long as the election result is within the margin of error. In close elections this isn’t helpful, as what really matters is making a correct prediction of what will happen.

Most of the 2016 final polls were accurate if you define being accurate as coming within the margin of error. But, since almost all of them predicted the wrong winner, I don’t think we will see future textbooks holding 2016 out there as a zenith of polling accuracy.

Another mistake reporters (and researchers) make is not recognizing that the margin of error refers only to sampling error, which is just one of many errors that can occur in a poll. The poor performance of the 2016 presidential polls really had nothing to do with sampling error at all.

I’ve always questioned why there is so much emphasis on sampling error for a couple of reasons. First, the calculation of sampling error assumes you are working with a random sample which in today’s polling world is almost never the case. Second, there are many other types of errors in survey research that are likely more relevant to a poll’s accuracy than sampling error. The focus on sampling error is driven largely because it is the easiest error to mathematically calculate. Margin of error is useful to consider, but needs to be put in context of all the other types of errors that can happen in a poll.

How COVID-19 may change Market Research

Business life is changing as COVID-19 spreads in the US and the world. In the market research and insights field there will be both short-term and long-term effects. It is important that clients and suppliers begin preparing for them.

This has been a challenging post to write. First, in the context of what many people are going through in their personal and business lives as a result of this disruption, writing about what might happen to one small sector of the business world can come across as uncaring and tone-deaf, which is not the intention. Second, this is a quickly changing situation and this post has been rewritten a number of times in the past week. I have a feeling it may not age well.

Nonetheless, market research will be highly impacted by this situation. Below are some things we think will likely happen to the market research industry.

  • An upcoming recession will hit the MR industry hard. Market research is not an investment that typically pays off quickly. Companies that are forced to pare back will cut their research spending and likely their staffs.
  • Cuts will affect clients more than suppliers. In previous recessions, clients have cut MR staff and outsourced work to suppliers. This is an opportunity for suppliers that know their clients’ businesses well and can step up to help.
  • Unlike in many other industries, it is the large suppliers that are most at risk of losing work. Publicly-held research suppliers will be under even more intense pressure from their investors than usual. There will most certainly be cost cutting at these firms, and if the concerns over the virus persist, it will lead to layoffs.
  • The smallest suppliers could face an existential risk. Many independent contractors and small firms are dependent on one or two clients for the bulk of their revenue. If those clients are in highly affected sectors, these small suppliers will be at risk of going out of business.
  • Smallish to mid-sized suppliers may emerge stronger. Clients are going to be under cost pressures due to a receding economy and smaller research suppliers tend to be less expensive. Smaller research firms did well post 9/11 and during the recession of 2008-09 because clients moved work from higher priced larger firms to them. Smaller research firms would be wise to build tight relationships so that when the storm over the virus abates, they will have won their clients' trust for future projects.
  • New small firms will emerge as larger firms cut staff and create refugees who will launch new companies.

Those are all items that might pertain to any sort of sudden business downturn. There are also some things that we think will happen that are specific to the COVID-19 situation:

  • Market research conferences will never be the same. Conferences are going to have difficulty drawing speakers and attendees. Down the line, conferences will be smaller and more targeted and there will be more virtual conferences and training sessions scheduled. At a minimum, companies will send fewer people to research conferences.
  • This will greatly affect MR trade associations as these conferences are important revenue sources for them. They will rethink their missions and revenue models, and will become less dependent on their signature events. The associations will have more frequent, smaller, more targeted online events. The days of the large, comprehensive research conference may be over.
  • Business travel will not return to its previous level. There will be fewer in-person meetings between clients and suppliers and those that are held will have fewer participants. Video conferencing will become an even more important way to reach clients.
  • Clients and suppliers will allow much more “work from home.” It may become the norm that employees are only expected to be in the office for key meetings. The situation with COVID-19 will give companies who don’t have a lot of experience allowing employees to work from home the opportunity to see the value in it. When the virus is under control, they will embrace telecommuting. We will see this crisis kick-start an already existing movement towards allowing more employees to work from home. The amount of office space needed will shrink.
  • Research companies will review and revise their sick-leave policies and there will be pressure on them to make them more generous.
  • Companies that did the right thing during the crisis will be rewarded with employee loyalty. Employees will become more attached and appreciative of suppliers that showed flexibility, did what they could to maintain payroll, and expressed genuine concerns for their employees.

Probably the biggest change we will see in market research projects is to qualitative research.

  • While there will always be great value in traditional, in-person focus groups, the situation around COVID-19 is going to cause online qualitative to become the standard approach. We are at a time where the technologies available for online qualitative are well-developed, yet clients and suppliers have clung to traditional methods. To date, the technology has been ahead of the demand. Companies will be forced by travel restrictions to embrace online methods and this will be at the expense of traditional groups. This is an excellent time to be in the online qualitative technology business. It is not such a great time to be in the focus group facility management business.
  • Independent moderators who work exclusively with traditional groups are going to be in trouble, and not just in the short term. Many of these individuals will retire or leave research for other work. Others will adapt to online methods out of necessity. Of course, there will continue to be independent moderators, but we are predicting the demand for in-person groups will be permanently affected, and this portion of the industry will shrink significantly.
  • There is a risk that by not commissioning as much in-person qualitative, marketers may become further removed from direct human interaction with their customer base. This is a very real concern. We wouldn't be in market research if we didn't have an affinity for data and algorithms, but qualitative research is what keeps all of our efforts grounded. I'd caution clients to think carefully before removing all in-person interaction from their research plans.

What will happen to quantitative research? In the short-run, most studies will continue. Respondents are home, have free time, and thus far have shown they are willing to take part in studies. Some projects, typically in highly affected industries like travel and entertainment, are being postponed or canceled. All current data sets need to be viewed with a careful eye as the tumult around the virus can affect results. For instance, we conduct a lot of research with young respondents, and their parents are likely nearby when they are taking our surveys, which can influence our findings on some subjects.

Particular care needs to be taken in ongoing tracking studies. It makes sense for many trackers to add questions to gauge how the situation has affected the brand in question.

But, in the longer term, we don't expect much change in quantitative research methods to result directly from this situation. If anything, there will be a greater need to understand consumers.

Tough times for sure. It has been heartening to see how our industry has reacted. Research panel and technology providers have reached out to help keep projects afloat. We’ve had subcontractors tell us we can delay payments if we need to. Calls with clients have become more “human” as we hear their kids and pets in the background and see the stresses they are facing. Respondents have continued to fill out our surveys.

There is a lot of uncertainty right now. At its core, market research is a way to reduce uncertainty for decision makers by making the future more predictable, so we are needed now more than ever. Research will adapt as it always does, and I believe in the long-run it may become even more valued as a result of this crisis.

The myth of the random sample

Sampling is at the heart of market research. We ask a few people questions and then assume everyone else would have answered the same way.

Sampling works in all types of contexts. Your doctor doesn’t need to test all of your blood to determine your cholesterol level – a few ounces will do. Chefs taste a spoonful of their creations and then assume the rest of the pot will taste the same. And, we can predict an election by interviewing a fairly small number of people.

The mathematical procedures that are applied to samples, and that enable us to project to a broader population, all assume that we have a random sample. Or, as I tell research analysts: everything they taught you in statistics assumes you have a random sample. T-tests, hypothesis tests, regressions, and so on all have a random sample as a requirement.

Here is the problem: We almost never have a random sample in market research studies. I say “almost” because I suppose it is possible to do, but over 30 years and 3,500 projects I don’t think I have been involved in even one project that can honestly claim a random sample. A random sample is sort of a Holy Grail of market research.

A random sample might be possible if you have a captive audience. You can randomly sample some of the passengers on a flight or a few students in a classroom or prisoners in a detention facility. As long as you are not trying to project beyond that flight or that classroom or that jail, the math behind random sampling will apply.

Here is the bigger problem: Most researchers don’t recognize this, disclose this, or think through how to deal with it. Even worse, many purport that their samples are indeed random, when they are not.

For a bit of research history, once the market research industry really got going, the telephone random digit dial (RDD) sample became standard. Telephone researchers could randomly call landline phones. When landline telephone penetration and response rates were both high, this provided excellent data. However, RDD still wasn't providing a true random, or probability, sample. Some households had more than one phone line (and few researchers corrected for this), many people lived in group situations (colleges, medical facilities) where they couldn't be reached, some did not have a landline, and even at its peak, telephone response rates were only about 70%. Not bad. But, also, not random.

Once the Internet came of age, researchers were presented with new sampling opportunities and challenges. Telephone response rates plummeted (to 5-10%) making telephone research prohibitively expensive and of poor quality. Online, there was no national directory of email addresses or cell phone numbers and there were legal prohibitions against spamming, so researchers had to find new ways to contact people for surveys.

Initially, and this is still a dominant method today, research firms created opt-in panels of respondents. Potential research participants were asked to join a panel, filled out an extensive demographic survey, and were paid small incentives to take part in projects. These panels suffer from three response issues: 1) not everyone is online or online at the same frequency, 2) not everyone who is online wants to be in a panel, and 3) not everyone in the panel will take part in a study. The result is a convenience sample. Good researchers figured out sophisticated ways to handle the sampling challenges that result from panel-based samples, and they work well for most studies. But, in no way are they a random sample.
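As one illustration of what those adjustments can look like, here is a generic post-stratification weighting sketch with made-up numbers (it is not any particular firm's method, and real panels typically weight on several variables at once, not just age). If the panel over-represents younger respondents, each respondent can be weighted so the sample's age mix matches population targets.

```python
import pandas as pd

# Hypothetical panel sample in which younger people are overrepresented
sample = pd.DataFrame({
    "age_group": ["18-34"] * 500 + ["35-54"] * 300 + ["55+"] * 200,
    "likes_brand": [1] * 350 + [0] * 150   # 70% of 18-34s like the brand
                 + [1] * 150 + [0] * 150   # 50% of 35-54s
                 + [1] * 60 + [0] * 140,   # 30% of 55+
})

# Target population shares, e.g., from census data (illustrative figures)
population_share = {"18-34": 0.30, "35-54": 0.34, "55+": 0.36}

sample_share = sample["age_group"].value_counts(normalize=True)
sample["weight"] = sample["age_group"].map(lambda g: population_share[g] / sample_share[g])

unweighted = sample["likes_brand"].mean()
weighted = (sample["likes_brand"] * sample["weight"]).sum() / sample["weight"].sum()
print(f"Unweighted: {unweighted:.1%}, weighted: {weighted:.1%}")  # 56.0% vs. 48.8%
```

Weighting like this corrects for who is over- or under-represented on measured characteristics, but it cannot fix the deeper issue that people who opt in to a panel may differ from those who do not.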

River sampling is a term often used to describe respondents who are “intercepted” on the Internet and asked to fill out a survey. Potential respondents are invited via online ads and offers placed on a range of websites. If interested, they are typically pre-screened and sent along to the online questionnaire.

Because so much is known about what people are doing online these days, sampling firms have some excellent science behind how they obtain respondents efficiently with river sampling. It can work well, but response rates are low and the nature of the online world is changing fast, so it is hard to get a consistent river sample over time. Nobody being honest would ever use the term “random sampling” when describing river samples.

Panel-based samples and river samples represent how the lion’s share of primary market research is being conducted today. They are fast and inexpensive and when conducted intelligently can approximate the findings of a random sample. They are far from perfect, but I like that the companies providing them don’t promote them as being random samples. They involve some biases and we deal with these biases as best we can methodologically. But, too often we forget that they violate a key assumption that the statistical tests we run require: that the sample is random. For most studies, they are truly “close enough,” but the problem is we usually fail to state the obvious – that we are using statistical tests that are technically not appropriate for the data sets we have gathered.

Which brings us to a newer, shiny object in the research sampling world: ABS samples. ABS (address-based samples) are purer from a methodological standpoint. While ABS samples have been around for quite some time, they are just now being used extensively in market research.

ABS samples are based on US Postal Service lists. Because USPS has a list of all US households, this list is an excellent sampling frame. (The Census Bureau also has an excellent list, but it is not available for researchers to use.) The USPS list is the starting point for ABS samples.

Research firms will take the USPS list and recruit respondents from it, either to be in a panel or to take part in an individual study. This recruitment can be done by mail, phone, or even online. They often append publicly-known information onto the list.

As you might expect, an ABS approach suffers from some of the same issues as other approaches. Cooperation rates are low and incentives (sometimes large) are necessary. Most surveys are conducted online, and not everyone in the USPS list is online or has the same level of online access. There are some groups (undocumented immigrants, homeless) that may not be in the USPS list at all. Some (RVers, college students, frequent travelers) are hard to reach. There is evidence that ABS approaches do not cover rural areas as well as urban areas. Some households use post office boxes and not residential addresses for their mail. Some use more than one address. So, although ABS lists cover about 97% of US households, the 3% that they do not cover are not randomly distributed.

The good news is, if done correctly, the biases that result from an ABS sample are more “correctable” than those from other types of samples because they are measurable.

A recent Pew study indicates that survey bias and the number of bogus respondents are a bit smaller for ABS samples than for opt-in panel samples.

But ABS samples are not random samples either. I have seen articles that suggest that of all those approached to take part in a study based on an ABS sample, less than 10% end up in the survey data set.

The problem is not necessarily with ABS samples, as most researchers would concur that they are the best option we have and come the closest to a random sample. The problem is that many firms that are providing ABS samples are selling them as "random samples" and that is disingenuous at best. Just because the sampling frame used to recruit a survey panel can claim to be "random" does not imply that the respondents who end up in a research database constitute a random sample.

Does this matter? In many ways, it likely does not. There are biases and errors in all market research surveys. These biases and errors vary not just by how the study was sampled, but also by the topic of the question, its tone, the length of the survey, etc. Many times, survey errors are not the same throughout an individual survey. Biases in surveys tend to be "known unknowns" – we know they are there, but aren't sure what they are.

There are many potential sources of errors in survey research. I am always reminded of a quote from Humphrey Taylor, the past Chairman of the Harris Poll, who said: "On almost every occasion when we release a new survey, someone in the media will ask, 'What is the margin of error for this survey?' There is only one honest and accurate answer to this question — which I sometimes use to the great confusion of my audience — and that is, 'The possible margin of error is infinite.'" A few years ago, I wrote a post on biases and errors in research, and I was able to quickly name 15 of them before I even had to do an Internet search to learn more about them.

The reality is, the improvement in bias that is achieved by an ABS sample over a panel-based sample is small and likely inconsequential when considered next to the other sources of error that can creep into a research project. Because of this, and the fact that ABS sampling is really expensive, we tend to recommend ABS panels in only two cases: 1) if the study will result in academic publication, as academics are more accepting of data that come from an ABS approach, and 2) if we are working in a small geography, where panel-based samples are not feasible.

Again, ABS samples are likely the best samples we have at this moment. But firms that provide them are often inappropriately portraying them as yielding random samples. For most projects, the small improvement in bias they provide is not worth the considerably larger budget and longer study time frame, which is why ABS samples are currently used in only a small proportion of research studies. I consider ABS to be "state of the art" with the emphasis on "art" as sampling is often less of a science than people think.

Will adding a citizenship question to the Census harm the Market Research Industry?

The US Supreme Court appears likely to allow the Department of Commerce to reinstate a citizenship question on the 2020 Census. This is largely viewed as a political controversy at the moment. The inclusion of a citizenship question has proven to dampen response rates among non-citizens, who tend to be people of color. The result will be gains in representation for Republicans at the expense of Democrats (political district lines are redrawn every 10 years as a result of the Census). Federal funding will likely decrease for states with large immigrant populations.

It should be noted that the Census Bureau itself has come out against this change, arguing that it will result in an undercount of about 6.5 million people. Yet, the administration has pressed forward and has not committed the funds needed by the Census Bureau to fully research the implications. The concern isn't just about non-response from non-citizens. In tests done by the Census Bureau, non-citizens are also more likely than citizens to answer this question inaccurately, meaning the resulting data will be inaccurate.

Clearly this is a hot-button political issue. However, there is not much talk of how this change may affect research. Census data are used to calibrate most research studies in the US, including academic research, social surveys, and consumer market research. Changes to the Census may have profound effects on data quality.

The Census serves as a hidden backbone for most research studies whether researchers or clients realize it or not. Census information helps us make our data representative. In a business climate that is becoming more and more data-driven the implications of an inaccurate Census are potentially dire.

We should be primarily concerned that the Census is accurate regardless of the political implications. Adding questions that temper response will not help accuracy. Errors in the Census have a tendency to become magnified in research. For example, in new product research it is common to project study data from about a thousand respondents to a universe of millions of potential consumers. Even a small error in the Census numbers can lead businesses to make erroneous investments. These errors create inefficiencies that reverberate throughout the economy. Political concerns aside, US businesses undoubtedly suffer from a flawed Census. Marketing becomes less efficient.
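A quick, hypothetical illustration of how that magnification works (all numbers are made up): suppose a concept test finds 12% purchase intent and the result is projected onto a census-derived universe of 20 million households.

```python
# Hypothetical projection with made-up numbers
purchase_intent = 0.12                # share of respondents saying they would buy
accurate_universe = 20_000_000        # actual number of target households
undercounted_universe = 19_200_000    # same universe with a 4% census undercount

print(f"Forecast with an accurate count: {purchase_intent * accurate_universe:,.0f} buyers")
print(f"Forecast with the undercount:    {purchase_intent * undercounted_universe:,.0f} buyers")
# The 4% census error carries straight through to a gap of 96,000 forecast buyers,
# before any survey error is even considered.
```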

All is not lost though. We can make a strong case that there are better, less costly ways to conduct the Census. Methodologists have long suggested that a sampling approach would be more accurate than the current attempt at enumeration. This may never happen for the decennial Census because the Census methodology is encoded in the US Constitution and it might take an amendment to change it.

So, what will happen if this change is made? I suspect that market research firms will switch to using data that come from the Census’ survey programs, such as the American Community Survey (ACS). Researchers will rely less on the actual decennial census. In fact, many research firms already use the ACS rather than the decennial census (and the ACS currently contains the citizenship question).

The Census Bureau will find ways to correct for the resulting error, and to be honest, this may not be too difficult from a methodological standpoint. Business will adjust because there will be economic benefits to learning how to deal with a flawed Census, but in the end, this change will take some time for the research industry to address. Figuring things like this out is what good researchers do. While it is unfortunate that this change looks likely to be made, its implications are likely more consequential politically than they will be for the research field.

Long Live the Focus Group!

Market research has changed over the past two decades. Telephone research has faded away, mail studies are rarely considered, and younger researchers have likely never conducted a central location test in a mall. However, there is an old-school type of research that has largely survived this upheaval:  the traditional, in-person focus group.

There has been extensive technological progress in qualitative research. We can now conduct groups entirely online, in real-time, with participants around the globe. We can conduct bulletin board style online groups that take place over days. Respondents can respond via text or live video, can upload assignments we give them, and can take part in their own homes or workplaces. We can intercept them when they enter a store and gather insights “in the moment.” We even use technology to help make sense of the results, as text analytics has come a long way and is starting to prove its use in market research.

These new, online qualitative approaches are very useful. They save on travel costs, can be done quickly, and are often less expensive than traditional focus groups. But we have found that they are not a substitute for traditional focus groups, at least not in the way that online surveys have substituted for telephone surveys. Instead, online qualitative techniques are new tools that can do new things, but traditional focus groups are still the preferred method for many projects.

There is just no real substitute for the traditional focus group that allows clients to see actual customers interact around their product or issue. In some ways, as our world has become more digital traditional focus groups provide a rare opportunity to see and hear from customers. They are often the closest clients get to actually seeing their customers in a live setting.

I’ve attended hundreds of focus groups. I used to think that the key to a successful focus group was the skill of the moderator followed by a cleverly designed question guide. Clients spend a lot of time on the question guide. But they spend very little time on something that is critical to every group’s success: the proper screening of participants.

Seating the right participants is every bit as important as constructing a good question guide. Yet, screening is given passing attention by researchers and clients. Typically, once we decide to conduct groups a screener is turned around within a day because we need to get moving on the recruitment. In contrast, a discussion guide is usually developed over a full week or two.

Developing an outstanding screener starts by having a clear sense of objectives. What decisions are being made as a result of the project? Who is making them? What is already known? How will the decision path differ based on what we find? I am always surprised that in probably half of our qualitative projects our clients don’t have answers to these questions.

Next, it is important to remind clients that focus groups are qualitative research and we shouldn't be attempting to gather a "representative" sample. Focus groups happen with a limited number of participants in a handful of cities and we shouldn't be trying to project findings to a larger audience. If that is needed, a follow-up quantitative phase is required. Instead, in groups we are trying to delve deeply into motivations, explore ideas, and develop new hypotheses we can test later.

It is a common mistake to try to involve enough participants to make findings "valid." This matters because we are looking for thoughtful participants, not necessarily "typical" customers. We want people who will expand our knowledge of a subject and of customers, help us explore topics deeply, and open up new lines of inquiry we haven't considered.

“Representative” participants can be quiet and reserved and not necessarily useful to this phase of research. For this reason, we always use articulation screening questions which raise the odds that we will get a talkative participant who enjoys sharing his/her opinions.

An important part of the screening process is determining how to segment the groups. It is almost never a good idea to hold all of your sessions with the same audience. We tend to segment on age, potentially gender, and often by the participants’ experience level with the product or issue. Contrasting findings from these groups is often where the key qualitative insights lie.

It is also necessary to over-recruit. Most researchers overrecruit to protect against participants who fail to show up to the sessions. We do it for another reason. We like to have a couple of extra participants in the waiting area. Before the groups start, the moderator spends some time with them. This accomplishes two things. First, the groups are off and running the moment participants enter the focus group room because a rapport with the moderator has been established. Second, spending a few minutes with participants before groups begin allows the moderator to determine in advance which participants are going to be quiet or difficult, and allows us to pay them the incentive and send them home.

Clients tend to insist on group sizes that are too large. I have viewed groups with as many as 12 respondents. Even in a two-hour session, the average participant will be talking for just 10 minutes in that case, and that assumes there are no silences and the moderator never speaks! In reality, with 12 participants you will get maybe five minutes out of each one. How is that useful?

Group dynamics are different in smaller groups. We like to target having about six participants. This group size is small enough that all must participate and engage, but large enough to get a diversity of views.  We also prefer to have groups run for 90 minutes or less.

We like to schedule some downtime in between groups. The moderator needs this to recharge (and eat!), but it also gives time for a short debrief and to adjust the discussion guide on the fly. I have observed groups where the moderator is literally doing back-to-back sessions for six hours and it isn't productive. Similarly, it is ideal to have a rest day between cities to regroup and develop new questions. (Although this is rarely done in practice.)

Clients also need to learn to leave the moderator alone for at least 30 minutes before the first group begins. Moderating is stressful, even for moderators who have led thousands of groups. They need time to review the guide and converse with the participants. Too many times, clients are peppering the moderator with last second changes to the guide and in general are stressing the moderator right before the first session. These discussions need to be held before focus group day.

We’d also caution against conducting too many groups. I remember working on a proposal many years ago when our qualitative director was suggesting we conduct 24 focus groups. She was genuinely angry at me when I asked her “what are we going to learn in that 24th group that we didn’t learn in the first 23?”.

In all candor, in my experience you learn about 80% of what you will learn in the first evening of groups. It is useful to conduct another evening or two to confirm what you have heard. But it is uncommon for a new insight to arise after the first few groups. It is a rare project that needs more than about two cities' worth of groups.

It is also critical to have the right people from the clients attending the sessions. With the right people present discussions behind the mirror become insightful and can be the most important part of the project. Too often, clients send just one or two people from the research team and the internal decision makers stay home. I have attended groups where the client hasn’t shown up at all and it is just the research supplier who is there. If the session isn’t important enough to send decision makers to attend, it probably isn’t important enough to be doing in the first place.

I have mixed feelings about live streaming sessions. This can be really expensive and watching the groups at home is not the same as being behind the mirror with your colleagues. Live streaming is definitely better than not watching them at all. But I would say about half the time our clients pay for live streaming nobody actually logs in to watch them.

Focus groups are often a lead-in to a quantitative study. We typically enter into the groups with an outline of the quantitative questionnaire at the ready. We listen purposefully at the sessions to determine how we need to refine our questionnaire. This is more effective than waiting for the qualitative to be over before starting the quantitative design. We can usually have the quant questionnaire ready for review before the report for the groups is available because we take this approach.

Finally, it is critical to debrief at the end of each evening. This is often skipped. Everyone is tired, has been sitting in the dark for hours, and has to get back to a hotel to get up early for a flight. But a quick discussion to agree on the key takeaways while they are fresh in mind is very helpful. We try to get clients to agree to these debriefings before the groups are held.

Traditional groups provide more amazing moments and unexpected insights than any other research method. I think this may be why, despite all the new options for qualitative, clients are conducting just as many focus groups as ever.

Has market research become Big Brother?

Technological progress has disrupted market research. Data are available faster and cheaper than ever before. Many traditional research functions have been automated out of existence or have changed significantly because of technology. Projects take half the time to complete that they did just a decade ago. Decision making has moved from an art to a science. Yet, as with most technological disruptions, there are just as many potential pitfalls as efficiencies to be wary of as technology changes market research.

“Passive” data collection is one of these potential pitfalls. It is used by marketers in good ways: the use of passive data helps understand consumers better, target meaningful products and services, and create value for both the consumer and the marketer. However, much of what is happening with passive data collection is done without the full knowledge of the consumer and this process has the potential of being manipulative. The likelihood of backlash towards the research industry is high.

The use of passive data in marketing and research is new and many researchers may not know what is happening so let us explain. A common way to obtain survey research respondents is to tap into large, opt-in online panels that have been developed by a handful of companies. These panels are often augmented with social (river) channels whereby respondents are intercepted while taking part in various online activities. A recruitment email or text is delivered, respondents take a survey, and data are analyzed. Respondents provide information actively and with full consent.

There have been recent mergers which have resulted in fewer but larger and more robust online research panels available. This has made it feasible for some panel companies to gain the scale necessary to augment this active approach with passive data.

It is possible to append information from all sorts of sources to an online panel database. For instance, voter registration files are commonly appended. If you are in one of these research panels, clients likely know if you are registered to vote, if you actually voted, and your political party association. They will have made a prediction of how strong a liberal or conservative you likely are. They may have even run models to predict which issues you care most about. You are likely linked into a PRIZM cluster that associates you with characteristics of the neighborhood where you reside, which in turn can score your potential to be interested in all sorts of product categories. This is all in your file.

These panels also have the potential to link to other publicly-available databases such as car registration files, arrest records, real estate transactions, etc. If you are in these panels, whether you have recently bought a house, how much you paid for it, and whether you have been convicted of a crime may all be in your "secret file."

But, it doesn’t stop there. These panels are now cross-referenced to other consumer databases. There are databases that gather the breadcrumbs you leave behind in your digital life: sites you are visiting, ads you have been served, and even social media posts you have made. There is a tapestry of information available that is far more detailed than most consumers realize. From the research panel company’s perspective, it is just a matter of linking that information to their panel.

This opens up exciting research possibilities. We can now conduct a study among people who are verified to have been served by a specific client’s digital advertising. We can refine our respondent base further by those who are known to have clicked on the ad. As you can imagine, this can take ad effectiveness research to an entirely different level. It is especially interesting to clients because it can help optimize media spending which is by far the largest budget item for most marketing departments.

But, therein lies the ethical problem. Respondents, regardless of what privacy policies they may have agreed to, are unlikely to know that their passive web behavior is being linked into their survey responses. This alone should ring alarm bells for an industry suffering from low response rates and poor data quality. Respondents are bound to push back when they realize there is a secret file panel companies are holding on them.

Panel companies are straying from research into marketing. They are starting to encourage clients to use the survey results to better target individual respondents in direct marketing. This process can close a loop with a media plan. So, say on a survey you report that you prefer a certain brand of a product. That can now get back to you and you’ll start seeing ads for that product, likely without your knowledge that this is happening because you took part in a survey.

To go even further, this can affect the advertising seen by people who never took the survey. If you prefer a certain brand and I profile a lot like you, I may end up seeing specific ads as a result of your participation in a survey, even if I don't know you or have any connection to you.

In some ways, this reeks of the Cambridge Analytica scandal (which we explain in a blog post here). We’ll be surprised if this practice doesn’t eventually create a controversy in the survey research industry. This sort of sales targeting resulting from survey participation will result in lower response rates and a further erosion of confidence in the market research field. However, it is also clear that these approaches are inevitable and will be used more and more as panel companies and clients gain experience with them.

It is the blurring of the line between marketing and market research that has many old-time researchers nervous. There is a longstanding ethical tenet in the industry that participation in a research project should in no way result in the respondent being sold or marketed to. The term for this is SUGGING (Selling Under the Guise of research) and all research industry trade groups have a prohibition against SUGGING embedded in their codes of ethics. It appears that some research firms are ignoring this. But this concept has always been central to the market research field: we have traditionally assured respondents that they can be honest on our surveys because we will in no way market to them directly because of their answers.

In the novel 1984, George Orwell describes a world where the government places its entire civilization under video surveillance. For most of the time since its publication, this has appeared to be a frightening, far-fetched cautionary tale. Recent history suggests this world may be upon us. The NSA scandal (precipitated by Edward Snowden) showed how much of our passive information is being shared with the government without our knowledge. Rather than wait for the government to surveil the population, we've turned the cameras on ourselves. Marketers can now do things most people don't realize are possible, and research respondents are unknowingly enabling this. The contrails you leave as you simply navigate your life online can be used to follow you. The line between research and marketing is fading, and this will eventually be to the detriment of our field.

Market research isn’t about storytelling, it is about predicting the future

We recently had a situation that made me question the credibility of market research. We had fielded a study for a long-term client and were excited to view the initial version of the tabs. As we looked at results by age groupings we found them to be surprising. But this was also exciting because we were able to weave a compelling narrative around why the age results seemed counter-intuitive.

Then our programmer called to say a mistake had been made in the tabs and the banner points by age had been mistakenly reversed.

So, we went back to the drawing board and constructed another, equally compelling story as to why the data were behaving as they were.

This made me question the value of research. Good researchers can review seemingly disparate data points from a study and generate a persuasive story as to why they are as they are. Our entire business is based on this skill – in the end clients pay us to use data to provide insight into their marketing issues. Everything else we do is a means to this end.

Our experience with the flipped age banner points illustrates that stories can be created around any data. In fact, I’d bet that if you gave us a randomly-generated data set we could convince you as to its relevance to your marketing issues. I actually thought about doing this – taking the data we obtain by running random data through a questionnaire when testing it before fielding, handing it to an analyst, and seeing what happens. I’m convinced we could show you a random data set’s relevance to your business.

This issue is at the core of polling’s PR problem. We’ve all heard people say that you can make statistics say anything, therefore polls can’t be trusted. There are lies, damn lies, and statistics. I’ve argued against this for a long time because the pollsters and researchers I have known have universally been well-intentioned and objective and never try to draw a pre-determined conclusion from the data.

Of course, this does not mean that all of the stories we tell with data aren't correct or enlightening. But they all come from a perspective. Clients value external suppliers because of this perspective – we are third-party observers who aren't wrapped up in the internal issues clients face, and we are often in a good position to view data with an objective mind. We've worked with hundreds of organizations and can bring those experiences to bear on your study. Our perspective is valuable.

But it is this perspective that creates an implicit bias in all we do. You will assess a data set from a different set of life experiences and background than I will. That is just human nature. Like all biases in research, our implicit bias may or may not be relevant to a project. In most cases, I'd say it likely isn't.

So, how can researchers reconcile this issue and sleep at night knowing their careers haven’t been a sham?

First and foremost, we need to stop saying that research is all about storytelling. It isn't. The value of market research isn't in the storytelling; it is in the predictions of the future it makes. Clients aren't paying us to tell them stories. They are paying us to predict the future and recommend actions that will enhance their business. Compelling storytelling is a means to this end but is not our end goal. Data-based storytelling provides credibility to our predictions and gives confidence that they have a high probability of being correct.

In some sense, it isn’t the storytelling that matters, it is the quality of the prediction. I remember having a college professor lecturing on this. He would say that the quality of a model is judged solely by its predictive value. Its assumptions, arguments, and underpinnings really didn’t matter.

So, how do we deal with this issue … how do we ensure that the stories we tell with data are accurate and fuel confident predictions? Below are some ideas.

  1. Make predictions that can be validated at a later date. Provide a level of confidence or uncertainty around the prediction. Explain what could happen to prevent your prediction from coming true.
  2. Empathize with other perspectives when analyzing data. One of the best “tricks” I’ve ever seen is to re-write a research report as if you were writing it for your client’s top competitor. What conclusions would you draw for them? If it is an issue-based study, consider what you would conclude from the data if your client was on the opposite side of the issue.
  3. Peg all conclusions to specific data points in the study. Straying from the data is where your implicit bias may tend to take over. Being able to tie conclusions directly to data is dependent on solid questionnaire design.
  4. Have a second analyst review your work and play devil’s advocate. Show him/her the data without your analysis and see what stories and predictions he/she can develop independent of you. Have this same person review your story and conclusions and ask him/her to try to knock holes in them. The result is a strengthened argument.
  5. Slow down. It just isn’t possible to provide stories, conclusions, and predictions from research data that consider differing perspectives when you have just a couple of days to do it. This requires more negotiation upfront as to project timelines. The ever-decreasing timeframes for projects are making it difficult to have the time needed to objectively look at data.
  6. Realize that sometimes a story just isn’t there. Your perspective and knowledge of a client’s business should result in a story leaping out at you and telling itself. If this doesn’t happen, it could be because the study wasn’t designed well or perhaps there simply isn’t a story to be told. The world can be a more random place than we like to admit, and not everything you see in a data set is explainable. Don’t force it – developing a narrative that is reaching for explanations is inaccurate and a disservice to your client.
