Archive for the 'Methodology' Category

Is segmentation just discrimination with an acceptable name?

A short time ago we posted a basic explanation of the Cambridge Analytica/Facebook scandal (which you can read here). In it, we stated that market segmentation and stereotyping are essentially the same thing. This presents an ethical quandary for marketers as almost every marketing organization makes heavy use of market segmentation.

To review, marketers place customers into segments so that they can better understand and serve them. Segmentation is at the essence of marketing. Segments can be created along any measurable dimension, but since almost all segments have a demographic component we will focus on that for this post.

It can be argued that segmentation and stereotyping are the same thing. Stereotyping is attaching perceived group characteristics to an individual. For instance, if you are older I might assume your political views lean conservative, since political views tend to be more conservative among older Americans than among younger Americans. If you are female I might assume you are more likely to be the primary shopper for your household, since females in total do more of the family shopping than males. If you are African-American, I might assume you have a higher likelihood than others to listen to rap music, since that genre indexes high among African-Americans.

These are all stereotypes. These examples can be shown to be true of the larger group, but that doesn’t necessarily imply they apply to every individual in the group. There are plenty of liberal older Americans, females who don’t shop at all, and African-Americans who can’t stand rap music.

Segmenting consumers (which is applying stereotypes) isn’t inherently a bad thing. It leads to customized products and better customer experiences. The potential problem isn’t with stereotyping itself; it is when stereotyping crosses into discrimination that we have to be careful. As marketers we tread a fine line. Stereotyping oversimplifies the complexity of consumers by forming an easy-to-understand story. This is useful in some contexts and discriminatory in others.

Some examples are helpful. It can be shown that African-Americans have a lower life expectancy than Whites. A life insurance company could use this information to charge African-Americans higher premiums than Whites. (Indeed, many insurance companies used to do this until various court cases prevented them from doing so.) This is a segmentation practice that many would say crosses a line to become discriminatory.

In a similar vein, car insurance companies routinely charge higher risk groups (for example younger drivers and males) higher rates than others. That practice has held up as not being discriminatory from a legal standpoint, largely because the discrimination is not against a traditionally disaffected group.

At Crux, we work with college marketers to help them make better admissions offer decisions. Many colleges will document the characteristics of their admitted students who thrive and graduate in good standing. The goal is to profile these students and then look back at how they profiled as applicants. The resulting model can be used to make future admissions decisions. Prospective student segments are established that have high probabilities of success at the institution because they look like students known to be successful, and this knowledge is used to make informed admissions offer decisions.

However, this is a case where a segmentation can cross a line and become discriminatory. Suppose that the students who succeed at the institution tend to be rich, white, female, and from high-performing high schools. By benchmarking future admissions offers against them, an algorithmic bias is created. Fewer minorities, males, and students from urban districts will be extended admissions offers. What turns out to be a good model from a business standpoint ends up perpetuating a bias and places certain demographics of students at a further disadvantage.
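To make the mechanism concrete, here is a toy sketch of how benchmarking applicants against past successful students can encode bias. The traits, proportions, and scoring rule are all fabricated for illustration; real admissions models are far more complex, but the dynamic is the same: an applicant who differs demographically from the historical profile scores lower even with identical academic credentials.

```python
# Hypothetical profile of historically successful students: the share of
# past successes exhibiting each trait (fabricated numbers).
historical_profile = {"high_income": 0.80, "white": 0.85, "female": 0.70,
                      "strong_high_school": 0.90}

def similarity_score(applicant):
    """Score an applicant by how closely their traits match the historical
    profile of successful students (higher = 'more likely to succeed')."""
    return sum(p if applicant[trait] else 1 - p
               for trait, p in historical_profile.items()) / len(historical_profile)

# Two applicants with identical academic strength, different demographics.
applicant_a = {"high_income": True, "white": True, "female": True,
               "strong_high_school": True}
applicant_b = {"high_income": False, "white": False, "female": False,
               "strong_high_school": True}

print(similarity_score(applicant_a))  # 0.8125
print(similarity_score(applicant_b))  # 0.3875 -- same academics, lower score
```

The model is doing exactly what it was asked to do; the bias enters through the benchmark, not through any malicious code.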

There is a burgeoning field in research known as “predictive analytics.” It allows data jockeys to use past data and artificial intelligence to make predictions on how consumers will react. It is currently mostly being used in media buying. Our view is that it helps media efficiency, but only if the future world can be counted on to behave like the past. Over-reliance on predictive analytics will result in marketers missing truly breakthrough trends. We don’t have to look further than the 2016 election to see how it can fail; many pollsters based their modeling on how voters had behaved in the past and, in the process, missed a fundamental shift in voter behavior and made some very poor predictions.

That is perhaps an extreme case, but shows that segmentations can have unintended consequences. This can happen in consumer product marketing as well. Targeted advertising can become formulaic. Brands can decline distribution in certain outlets. Ultimately, the business can suffer and miss out on new trends.

Academics (most notably Kahneman and Tversky) have established that people naturally apply heuristics to decision making. These are “rules of thumb” that are often useful because they allow us to make decisions quickly. However, these academics have also demonstrated how the use of heuristics often results in sub-optimal and biased decision making.

This thinking applies to segmentation. Segmentation allows us to make marketing decisions quickly because we assume that individuals take on the characteristics of a larger group. But, it ignores the individual variability within the group, and often that is where the true marketing insight lies.

We see this all the time in the generational work we do. Yes, Millennials as a group tend to be a bit sheltered, yet confident and team-oriented. But this does not mean all of them fit the stereotype. In fact, odds are high that if you profile an individual from the Millennial generation, he/she will only exhibit a few of the characteristics commonly attributed to the generation. Taking the stereotype too literally can lead to poor decisions.

This is not to say that marketers shouldn’t segment their customers. This is a widespread practice that clearly leads to business results. But, they should do so considering the errors and biases applying segments can create, and think hard about whether this can unintentionally discriminate and, ultimately, harm the business in the long term.

Has market research become Big Brother?

Technological progress has disrupted market research. Data are available faster and cheaper than ever before. Many traditional research functions have been automated out of existence or have changed significantly because of technology. Projects take half the time to complete that they did just a decade ago. Decision making has moved from an art to a science. Yet, as with most technological disruptions, there are just as many potential pitfalls as efficiencies to be wary of as technology changes market research.

“Passive” data collection is one of these potential pitfalls. It is used by marketers in good ways: the use of passive data helps understand consumers better, target meaningful products and services, and create value for both the consumer and the marketer. However, much of what is happening with passive data collection is done without the full knowledge of the consumer and this process has the potential of being manipulative. The likelihood of backlash towards the research industry is high.

The use of passive data in marketing and research is new, and many researchers may not know what is happening, so let us explain. A common way to obtain survey research respondents is to tap into large, opt-in online panels that have been developed by a handful of companies. These panels are often augmented with social (river) channels whereby respondents are intercepted while taking part in various online activities. A recruitment email or text is delivered, respondents take a survey, and data are analyzed. Respondents provide information actively and with full consent.

Recent mergers have resulted in fewer, but larger and more robust, online research panels. This consolidation has given some panel companies the scale necessary to augment this active approach with passive data.

It is possible to append information from all sorts of sources to an online panel database. For instance, voter registration files are commonly appended. If you are in one of these research panels, clients likely know if you are registered to vote, if you actually voted, and your political party association. They will have made a prediction of how strong a liberal or conservative you likely are. They may have even run models to predict which issues you care most about. You are likely linked into a PRIZM cluster that associates you with characteristics of the neighborhood where you reside, which in turn can score your potential to be interested in all sorts of product categories. This is all in your file.

These panels also have the potential to link to other publicly-available databases such as car registration files, arrest records, real estate transactions, etc. If you are in these panels, whether you have recently bought a house, how much you paid for it, if you have been convicted of a crime, may all be in your “secret file.”

But, it doesn’t stop there. These panels are now cross-referenced to other consumer databases. There are databases that gather the breadcrumbs you leave behind in your digital life: sites you are visiting, ads you have been served, and even social media posts you have made. There is a tapestry of information available that is far more detailed than most consumers realize. From the research panel company’s perspective, it is just a matter of linking that information to their panel.
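The linking step itself is mechanically simple. Here is a minimal sketch, with fabricated records and a hypothetical matching key (a hashed email address), of how fields from an external file can be appended to a panel record once the two share an identifier:

```python
# Toy illustration of appending an external database to a panel record.
# The records and the hashed-email matching key are assumptions for
# illustration; real linkage vendors use a variety of identifiers.
import hashlib

def match_key(email):
    """Normalize and hash an email address to use as a join key."""
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

panel = [{"email": "jane@example.com", "age": 34}]  # fabricated respondent

# Fabricated external file (e.g., a voter registration append), keyed the same way.
voter_file = {match_key("jane@example.com"): {"registered": True,
                                              "party": "Independent"}}

for respondent in panel:
    appended = voter_file.get(match_key(respondent["email"]))
    if appended:
        respondent.update(appended)  # appended fields now ride along with survey data

print(panel[0]["registered"], panel[0]["party"])  # True Independent
```

Once one common key exists, any number of additional files can be chained on in exactly the same way, which is why these profiles accumulate so quickly.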

This opens up exciting research possibilities. We can now conduct a study among people who are verified to have been served by a specific client’s digital advertising. We can refine our respondent base further by those who are known to have clicked on the ad. As you can imagine, this can take ad effectiveness research to an entirely different level. It is especially interesting to clients because it can help optimize media spending which is by far the largest budget item for most marketing departments.

But, therein lies the ethical problem. Respondents, regardless of what privacy policies they may have agreed to, are unlikely to know that their passive web behavior is being linked into their survey responses. This alone should ring alarm bells for an industry suffering from low response rates and poor data quality. Respondents are bound to push back when they realize there is a secret file panel companies are holding on them.

Panel companies are straying from research into marketing. They are starting to encourage clients to use the survey results to better target individual respondents in direct marketing. This process can close a loop with a media plan. So, say on a survey you report that you prefer a certain brand of a product. That can now get back to you and you’ll start seeing ads for that product, likely without your knowledge that this is happening because you took part in a survey.

To go even further, this can affect advertising people not involved in the survey may see. If you prefer a certain brand and I profile a lot like you, as a result of your participation in a survey I may end up seeing specific ads. Even if I don’t know you or have any connection to you.

In some ways, this reeks of the Cambridge Analytica scandal (which we explain in a blog post here). We’ll be surprised if this practice doesn’t eventually create a controversy in the survey research industry. This sort of sales targeting resulting from survey participation will result in lower response rates and a further erosion of confidence in the market research field. However, it is also clear that these approaches are inevitable and will be used more and more as panel companies and clients gain experience with them.

It is the blurring of the line between marketing and market research that has many old-time researchers nervous. There is a longstanding ethical tenet in the industry that participation in a research project should in no way result in the respondent being sold or marketed to. The term for this is SUGGING (Selling Under the Guise of research), and all research industry trade groups have a prohibition against SUGGING embedded in their codes of ethics. It appears that some research firms are ignoring this. But this concept has always been central to the market research field: we have traditionally assured respondents that they can be honest on our surveys because we will in no way market to them directly because of their answers.

In the novel 1984, George Orwell describes a world where the government places its entire population under video surveillance. For most of the time since its publication, this has appeared to be a frightening, far-fetched cautionary tale. Recent history suggests this world may be upon us. The NSA scandal (precipitated by Edward Snowden) showed how much of our passive information is being shared with the government without our knowledge. Rather than wait for the government to surveil the population, we’ve turned the cameras on ourselves. Marketers can do things most people don’t realize are possible, and research respondents are unknowingly enabling this. The contrails you leave as you simply navigate your life online can be used to follow you. The line between research and marketing is fading, and this will eventually be to the detriment of our field.

Market research isn’t about storytelling, it is about predicting the future

We recently had a situation that made me question the credibility of market research. We had fielded a study for a long-term client and were excited to view the initial version of the tabs. As we looked at results by age groupings we found them to be surprising. But this was also exciting because we were able to weave a compelling narrative around why the age results seemed counter-intuitive.

Then our programmer called to say a mistake had been made in the tabs and the banner points by age had been mistakenly reversed.

So, we went back to the drawing board and constructed another, equally compelling story as to why the data were behaving as they were.

This made me question the value of research. Good researchers can review seemingly disparate data points from a study and generate a persuasive story as to why they are as they are. Our entire business is based on this skill – in the end clients pay us to use data to provide insight into their marketing issues. Everything else we do is a means to this end.

Our experience with the flipped age banner points illustrates that stories can be created around any data. In fact, I’d bet that if you gave us a randomly-generated data set we could convince you as to its relevance to your marketing issues. I actually thought about doing this – taking the data we obtain by running random data through a questionnaire when testing it before fielding, handing it to an analyst, and seeing what happens. I’m convinced we could show you a random data set’s relevance to your business.

This issue is at the core of polling’s PR problem. We’ve all heard people say that you can make statistics say anything, therefore polls can’t be trusted. There are lies, damn lies, and statistics. I’ve argued against this for a long time because the pollsters and researchers I have known have universally been well-intentioned and objective and never try to draw a pre-determined conclusion from the data.

Of course, this does not mean that the stories we tell with data aren’t correct or enlightening. But they all come from a perspective. Clients value external suppliers because of this perspective – we are third-party observers who aren’t wrapped up in the internal issues clients face, and we are often in a good position to view data with an objective mind. We’ve worked with hundreds of organizations and can bring those experiences to bear on your study. Our perspective is valuable.

But, it is this perspective that creates an implicit bias in all we do. You will assess a data set from a different set of life experiences and background than I will. That is just human nature. Like all biases in research, our implicit bias may or may not be relevant to a project. In most cases, I’d say it likely isn’t.

So, how can researchers reconcile this issue and sleep at night knowing their careers haven’t been a sham?

First and foremost, we need to stop saying that research is all about storytelling. It isn’t. The value of market research isn’t in the storytelling; it is in the predictions of the future it makes. Clients aren’t paying us to tell them stories. They are paying us to predict the future and recommend actions that will enhance their business. Compelling storytelling is a means to this end, but it is not our end goal. Data-based storytelling lends credibility to our predictions and gives confidence that they have a high probability of being correct.

In some sense, it isn’t the storytelling that matters, it is the quality of the prediction. I remember having a college professor lecturing on this. He would say that the quality of a model is judged solely by its predictive value. Its assumptions, arguments, and underpinnings really didn’t matter.

So, how do we deal with this issue … how do we ensure that the stories we tell with data are accurate and fuel confident predictions? Below are some ideas.

  1. Make predictions that can be validated at a later date. Provide a level of confidence or uncertainty around the prediction. Explain what could happen to prevent your prediction from coming true.
  2. Empathize with other perspectives when analyzing data. One of the best “tricks” I’ve ever seen is to re-write a research report as if you were writing it for your client’s top competitor. What conclusions would you draw for them? If it is an issue-based study, consider what you would conclude from the data if your client was on the opposite side of the issue.
  3. Peg all conclusions to specific data points in the study. Straying from the data is where your implicit bias may tend to take over. Being able to tie conclusions directly to data is dependent on solid questionnaire design.
  4. Have a second analyst review your work and play devil’s advocate. Show him/her the data without your analysis and see what stories and predictions he/she can develop independent of you. Have this same person review your story and conclusions and ask him/her to try to knock holes in them. The result is a strengthened argument.
  5. Slow down. It just isn’t possible to provide stories, conclusions, and predictions from research data that consider differing perspectives when you have just a couple of days to do it. This requires more negotiation upfront as to project timelines. The ever-decreasing timeframes for projects are making it difficult to have the time needed to objectively look at data.
  6. Realize that sometimes a story just isn’t there. Your perspective and knowledge of a client’s business should result in a story leaping out at you and telling itself. If this doesn’t happen, it could be because the study wasn’t designed well or perhaps there simply isn’t a story to be told. The world can be a more random place than we like to admit, and not everything you see in a data set is explainable. Don’t force it – developing a narrative that is reaching for explanations is inaccurate and a disservice to your client.

The Cambridge Analytica scandal points to marketing’s future

There has been a lot of press, almost universally bad, regarding Cambridge Analytica recently. Most of this discussion has centered on political issues (how their work may have benefitted the Trump campaign) and on data privacy issues (how this scandal has shined a light on the underpinnings of Facebook’s business model). One thing that hasn’t been discussed is the technical brilliance of this approach to combining segmentation, big data, and targeted communications to market effectively. In the midst of an incredibly negative PR story lurks the story of a controversial future of market research and marketing.

To provide a cursory and perhaps oversimplified recap of what happened, this all began with a psychographic survey which provided input into a segmentation. This is a common type of market research project. Pretty much every brand you can think of has done it. The design usually has a basis in psychology and the end goal is typically to create subgroups of consumers that provide a better customer understanding and ultimately help a client spend marketing resources more efficiently by targeting these subgroups.

Almost every marketer targets demographically – by easy-to-identify characteristics such as age, gender, race/ethnicity, and geography. Many also target psychographically – by personality characteristics and deeper psychological constructs. The general approach taken by Cambridge Analytica has been perfected over decades and is hardly new. I’d say I’ve been involved in about 100 projects that involved segmenting on a psychographic basis.

To give a concrete example, this type of approach is used by public health campaigns seeking to minimize drug and alcohol use. Studies will be done on a demographic basis that indicate things like drug use skewing towards males more than females, towards particular age groups, and perhaps even towards certain regions of the country. But it can also be shown that those most at risk of addiction have certain personality types – they are risk takers, sensation seekers, extroverts, etc. Combined with demographic information, this can allow a public health marketer to target their marketing spend as well as craft messages that will resonate with those most at risk.

Segmentation is essentially stereotyping with another name. It is associating perceived characteristics of a group with an individual. At its best, this approach can provide the consumer with relevant marketing and products customized to his/her needs. At its worst, it can ignore variation within a group and devalue the consumer as an individual. Segmentation can turn to prejudice and profiling fast and marketers can put too much faith in it.

Segmentation is imperfect. Just because you are a male, aged 15-17, and love to skateboard without a helmet and think jumping out of an airplane would be cool does not necessarily mean you are at risk to initiate drug use. But, our study might show that for every 100 people like you, 50 of them are at risk, and that is enough to merit spending prevention money towards reaching you. You might not be at risk for drug use, but we think you have a 50% chance of being so and this is much higher than the general risk in the population. This raises the efficiency of marketing spending.
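The efficiency argument in the paragraph above is just arithmetic. Taking the 50-in-100 segment rate from the text and assuming a hypothetical 10% base rate in the general population, the lift from targeting works out like this:

```python
# Back-of-envelope targeting math. The 50% segment rate comes from the
# example in the text; the 10% population base rate is an assumption.
base_rate = 0.10      # share of the general population at risk (assumed)
segment_rate = 0.50   # share at risk within the targeted segment

lift = segment_rate / base_rate
print(lift)  # 5.0 -- a targeted impression is 5x as likely to reach someone at risk

budget_impressions = 10_000
reached_untargeted = budget_impressions * base_rate   # ~1,000 at-risk people reached
reached_targeted = budget_impressions * segment_rate  # ~5,000 at-risk people reached
print(round(reached_untargeted), round(reached_targeted))
```

The same ad budget reaches roughly five times as many at-risk individuals, which is the entire business case for segmentation, imperfect as the individual-level predictions are.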

What Cambridge Analytica did was analogous to this. The Facebook poll users completed provided data needed to establish segments. These segments were then used to predict your likelihood to care about an issue. Certain segments might be more associated with hot button issues in the election campaign, say gun rights, immigration, loss of American jobs, or health care. So, once you filled out the survey, combined with demographic data, it became possible to “score” you on these issues. You might not be a “gun nut” but your data can provide the researcher with the probability that you are, and if it is high enough you might get an inflammatory gun rights ad targeted to you.

Where this got controversial was, first and foremost, that regardless of what Facebook’s privacy policy may say, most users had no clue that answering an innocuous quiz might enable them to be targeted in this way. Cambridge Analytica had more than the psychographic survey at their disposal – they also had demographics, user likes and preferred content, and social connections. They had much of this information on users’ Facebook friends as well. It is the depth of the information they gathered that has led to the crisis at Facebook.

People tend to associate most strongly with people who are like them. So, if I score you high on a “gun nut scale” chances are reasonably high that your close friends will have a high probability of being like you. So, with access to your friends, a marketer can greatly expand the targeted reach of the campaign.
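A minimal sketch (entirely toy data) shows how much this friend expansion multiplies reach: start with the users whose survey answers scored high on a trait, then add their friends on the homophily assumption that friends tend to resemble each other.

```python
# Toy social graph (fabricated names) mapping each user to their friends.
friends = {
    "ann": ["bob", "cam"],
    "bob": ["ann", "dee"],
    "cam": ["ann"],
    "dee": ["bob", "eve"],
    "eve": ["dee"],
}

seed = {"ann"}  # users whose own survey answers scored high on the trait

# One hop of expansion: everyone in the seed, plus their immediate friends.
expanded = set(seed)
for user in seed:
    expanded.update(friends.get(user, []))

print(sorted(expanded))  # ['ann', 'bob', 'cam'] -- one respondent became three targets
```

One survey respondent yielded three targetable users here; with real friend counts in the hundreds, a modest survey sample balloons into an audience of millions.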

It is hard to peel away from the controversies to see how this story really points to the future of marketing, and how research will point the way. Let me explain.

Most segmentations suffer from a fatal flaw: they segment with little ability to follow up by targeting. With a well-crafted survey we can almost always create segments that help a marketer better understand his/her customers. But often (I would even say most of the time) it is next to impossible to target these segments. Back to the drug campaign example: since I know what shows various demographic groups watch, I can tell you to spend your ad dollars on males aged 16-17. But how the heck do you then target further and find a way to reach the “risk taking” segment you really want? If you can’t target, segmentation is largely an academic exercise.

Traditionally you couldn’t target psychographic segments all that well. But, with what Google and Facebook now know about their users, you can. If we can profile enough of the Facebook teenage user base and have access to who their friends are, we can get incredibly efficient in our targeting.  Ad spend can get to those who have a much higher propensity for drug use and we can avoid wasting money on those who have low propensity.

It is a brilliant approach. But, like most things on the Internet, it can be a force for bad as well as good. If what Cambridge Analytica had done was for the benefit of an anti-drug campaign, I don’t think it would be nearly the story it has become. Once it went into a polarized political climate, it became news gold.

Even when an approach like this is applied to what most would call legitimate marketing, say for a consumer packaged good, it can get a bit creepy and feel manipulative. It is conceivable that via something one of my Facebook friends did, I can get profiled as a drinker of a specific brand of beer. Since Google also knows where my phone is, I can then be sent an ad or a coupon at the exact moment I walk by the beer case in my local grocery store. Or, my friends can be sent the same message. And I didn’t do anything to knowingly opt into being targeted like this.

There are ethical discussions that need to be had regarding whether this is good or bad, if it is a service to the consumer, or if it is too manipulative. But, this sort of targeting and meshing of research and marketing is not futuristic – all of the underpinning technology is there at the ready and it is only a matter of time until marketers really learn how to tap into it. It is a different world for sure and one that is coming fast.

Going Mobile

There has been a critical trend happening in market research data collection that is getting little attention. If you are gathering data in online surveys and polls, chances are that most of your respondents are now answering your questionnaires on mobile devices.

This trend snuck up on us. Just three years ago we were advising clients that about 25% of respondents were answering on mobile devices. Across the last 10 projects we have completed, that percentage is now between 75% and 80%. (Our firm conducts a lot of research with younger respondents, which likely skews this higher for us than for other firms, but it remains the case that mobile response has become the norm.)

Survey response tools have evolved considerably. Respondents initially used either the mail or provided responses to an interviewer on the other end of a clipboard. Then, people primarily answered surveys from a tethered land-line phone. The internet revolution made it possible to move data collection to a (stationary) computer. Now, respondents are choosing to answer on a device that is always with them and when and where they choose.

There are always “mode” effects in surveys – whereby the mode itself can influence results. However, the mode effects involved in mobile data collection have not been well studied. We will sometimes compare mobile versus non-mobile respondents on a specific project, but in our data this is not a fair comparison because there is self-selection: our respondents can choose to respond either on a mobile device or on a desktop/laptop. If we see differences across modes, it could simply be due to the nature of the choice respondents make and have little to do with the mode itself.

To study this properly, an experimental design would be needed – one where respondents are randomly assigned to a mobile or desktop mode. After searching and asking around at the major panel companies, I wasn’t able to find any such studies.
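The design itself is straightforward to sketch. The simulation below uses fabricated ratings purely to illustrate the mechanics: because the researcher (not the respondent) decides the mode, any systematic gap between the two arms can be attributed to the mode itself once a proper significance test is applied.

```python
# Sketch of the randomized mode-effect experiment described above.
# All data here are simulated; a real study would use actual survey responses.
import random

random.seed(42)
respondents = [f"r{i}" for i in range(200)]

# Random assignment removes the self-selection problem: mode is decided
# by the coin flip, not by the respondent's device preference.
assignment = {r: random.choice(["mobile", "desktop"]) for r in respondents}

# Simulated 5-point ratings, drawn identically for both arms.
ratings = {r: random.randint(1, 5) for r in respondents}

def arm_mean(mode):
    """Average rating among respondents assigned to the given mode."""
    scores = [ratings[r] for r in respondents if assignment[r] == mode]
    return sum(scores) / len(scores)

# In a real study, a two-sample test on these arm means would estimate
# the mode effect; here the arms differ only by sampling noise.
print(round(arm_mean("mobile"), 2), round(arm_mean("desktop"), 2))
```

The expensive part isn’t the analysis but the fieldwork: forcing a mode on respondents who prefer the other one is exactly what panel companies are reluctant to do, which may explain why the studies don’t exist.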

That is a bit crazy – our respondents are providing data in a new and interesting fashion, and our industry has done little to study how that might influence the usefulness of the information we collect.

Here is what we do know. First, questionnaires do not look the same on mobile devices as they do on laptops. Most types of questions look similar, but grid-style questions look completely different.  Typically, on a mobile device respondents will see one item at a time and on a desktop they will see the entire list. This will create a greater response-set type bias on the desktop version. I’d say that this implies that a mode effect likely does occur and that it doesn’t vary in the same way across all types of questions you are asking.

Second, the limited real estate of a mobile device makes wordy questions and responses look terrible. Depending on the survey system you are using, a lengthy question can require both horizontal and vertical scrolling, almost guaranteeing that respondents won’t attend to it.

Our own anecdotal information suggests that mobile respondents will complete a questionnaire faster, are more likely to suspend the survey part-way, and provide less rich open-ended responses.

So, how can we guard against these mode effects? Well, in the absence of research-on-research that outlines their nature, we have a few suggestions:

  • First and foremost, we need to develop a “mobile-first” mentality when designing questionnaires. Design your questionnaire for mobile and adapt it as necessary for the desktop. This is likely opposite to what you are currently doing.
  • Mobile-first means minimizing wording and avoiding large grid-type questions. If you must use grids, use fewer scale points and keep the number of items to a minimum.
  • Visuals are tough … remember that you have a 5 or 6 inch display to work with when showing images. You are limited here.
  • Don’t expect much from open-ended questions. Open-ends on mobile have to be precisely worded and not vague. We often find that clients expect too much from open-ended responses.
  • Test the questionnaire on mobile. Most researchers who are designing and testing questionnaires are looking at a desktop/laptop screen all day long, and our natural tendency is to only test on a desktop. Start your testing on mobile and then move to the desktop.
  • Shorten your questionnaires. It seems likely that respondents will have more patience for lengthy surveys when they are taking them on stationary devices as opposed to devices that are with them at all (sometimes distracting) times.
  • Finally, educate respondents not to answer these surveys when they themselves are “mobile.” With the millions of invitations and questionnaires our industry is fulfilling, we need to be sure we aren’t distracting respondents while they are driving.

In the long run, as even more respondents choose mobile this won’t be a big issue. But, if you have a tracking study in place you should wonder if the movement to mobile is affecting your data in ways you aren’t anticipating.

Will Big Data Kill Traditional Market Research?

Most of today’s research methods rely on a simple premise: asking customers questions can yield insights that drive better decisions. This is traditionally called primary research because it involves gathering new data. It is often supplemented with secondary research, which involves looking at information that already exists, such as sales data, publicly available data, etc. Primary and secondary research yield what I would call active data – individuals are providing data with their knowledge and consent.

We are moving to a passive data world. This involves analyzing data left behind as we live increasingly digital lives. As we live and breathe online, we leave a trail of digital data crumbs everywhere – where we visit, what we post about and link to, the apps we use, and so on. We also leave trails showing when and where we are when we do these things, what sequence we do them in, and even what those close to us do.

Our digital shadows are long. And these shadows provide an incredibly accurate version of ourselves. You may not remember what you had to eat a few days ago, but the Internet knows exactly what books you read, how fast you read them, and when you bought them. The Internet knows where you were when you looked up health information, your favorite places to travel, whether you lean liberal or conservative, and much more. Your digital shadow is alarmingly accurate.

Privacy issues aside, this creates exciting possibilities for market research.

The amount of information available is staggering. It is estimated that the volume of digital information is doubling about every 18 months. This means that in the next year and a half we will create as much data as we have since the Internet began. Clearly it is easy to drown in the noise of this data, and many certainly do. But in some ways analyzing this data isn’t unlike what we have been doing for years: any data set will drown you if you don’t have clear hypotheses to pursue. Tapping into the power of Big Data is all about formulating the right questions before firing up the laptop.
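The doubling claim has a neat arithmetic consequence worth spelling out: if total volume doubles each period, then the data created in any one period equals everything created before it. A minimal sketch (the unit and period count are arbitrary, chosen only for illustration):

```python
def total_after(periods: int, initial: float = 1.0) -> float:
    """Total data volume after a number of 18-month doubling periods."""
    return initial * 2 ** periods

# Everything created through period 10, in arbitrary units
history = total_after(10)
# One more 18-month doubling period
created_next_period = total_after(11) - history

# The new period alone matches all prior history combined
print(created_next_period == history)  # True
```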

So, how will Big Data change traditional, “active” research? Why would we need to ask people questions when we can track their actual behaviors more accurately?

Big Data will not obviate the need for traditional survey research. But, it will reposition it. Survey research will change and be reserved for marketing problems it is particularly well suited for.  Understanding underlying motivations of behavior will always require that we talk directly to consumers, if only to probe why their reported behavior differs from their actual behavior.

There are situations when Big Data techniques will triumph. We are noticing compelling examples of how Big Data analysis can save the world. For instance, medical researchers are looking into diseases that are asymptomatic in their early stages. Typically, an early doctor’s appointment for these diseases will consist of a patient struggling to remember symptoms and warning signs and when they might have had them. An analysis of Google searches can identify people who can be inferred, from their search behavior, to have been diagnosed with the disease. Then, their earlier search history can be analyzed to see whether and when they were curious about symptoms. In the hands of a skilled analyst, this can lead to new insights regarding the early warning signs of diseases that are often diagnosed too late.

There has been chatter that public health officials can track the early spread of the flu better each year by analyzing search trends than by using their traditional ways, which track doctor visits for the flu and prescriptions dispensed. The reason is that people Google for “flu symptoms” in advance of going to the doctor, and many who have symptoms don’t go to the doctor at all. A search trend analysis can help public health officials react faster to outbreaks.

This is all pretty cool. Marketers are all about delivering the right message to the right people at the right time, and understanding how prior online behavior predicts future decisions will be valued. Big Data is accurate in a way that surveys cannot be because memory is imperfect.

Let’s be clear. I don’t think that people lie on surveys, at least not purposefully. But there are memory errors that harm the ability of a survey to uncover the truth. For instance, I could ask on a survey what books you have read in the past month. But, sales data from the Kindle Store would probably be more accurate.

However, what proponents of “Big Data will take over the world” don’t realize is that the errors respondents make on surveys can be more valuable to marketers than the truth, because recollections are often more predictive of future behavior than actual past behavior is. What you think you had for dinner two nights ago probably predicts what you will eat tonight better than what you actually ate. Perceptions can be more important than reality, and marketing is all about dealing with perceptions.

The key for skilled researchers is going to be to learn when Big Data techniques are superior and when traditional techniques will yield better insights. Big Data is a very big hammer, but isn’t suitable for every size nail.

It is an exciting time for our field. Data science and data analysis skills are going to become even more valuable in the labor market than they are today. While technical database and statistical skills will be important, in a Big Data era it will be even more important to have skills in developing the right questions to pursue in the first place and a solid understanding of the issues our clients face.

Let’s Make Research and Polling Great Again!


The day after the US Presidential election, we quickly wrote and posted about the market research industry’s failure to accurately predict the election.  Since this has been our widest-read post (by a factor of about 10!) we thought a follow-up was in order.

Some of what we predicted has come to pass. Pollsters are being defensive, claiming their polls really weren’t that far off, and are not reaching very deep to try to understand the core of why their predictions were poor. The industry has had a couple of confabs, where the major players have denied a problem exists.

We are at a watershed moment for our industry. Response rates continue to plummet, clients are losing confidence in the data we provide, and we are swimming in so much data our insights are often not able to find space to breathe. And the public has lost confidence in what we do.

Sometimes it is everyday conversations that can illuminate a problem. Recently, I was staying at an AirBnB in Florida. The host (Dan) was an ardent Trump supporter, and at one point he asked me what I did for a living. When I told him I was a market researcher, the conversation quickly turned to why the polls failed to predict the winner of the election. By talking with Dan I quickly realized the implications of Election 2016 polling for our industry. He felt that we can now safely ignore all polls – on issues, approval ratings, voter preferences, etc.

I found myself getting defensive. After all, the polls weren’t off that much.  In fact, they were actually off by more in 2012 than in 2016 – the problem being that this time the polling errors resulted in an incorrect prediction. Surely we can still trust polls to give a good sense of what our citizenry thinks about the issues of the day, right?

Not according to Dan. He didn’t feel our political leaders should pay attention to the polls at all because they can’t be trusted.

I’ve even seen a new term for this bandied about:  poll denialism. It is a refusal to believe any poll results because of their past failures. Just the fact that this has been named should be scary enough for researchers.

This is unnerving not just to the market research industry, but to our democracy in general.  It is rarely stated overtly, but poll results are a key way political leaders keep in touch with the needs of the public, and they shape public policy a lot more than many think. Ignoring them is ignoring public opinion.

Market research remains closely associated with political polling. While I don’t think clients have become as mistrustful about their market research as the public has become about polling, clients likely have their doubts. Much of what we do as market researchers is much more complicated than election polling. If we can’t successfully predict who will be President, why would a client believe our market forecasts?

We are at a defining moment for our industry – a time when clients and suppliers will realize this is an industry that has gone adrift and needs a righting of the course. So what can we do to make research great again?  We have a few ideas.

  1. First and foremost, if you are a client, make greater demands for data quality. Nothing will stimulate the research industry more to fix itself than market forces – if clients stop paying for low quality data and information, suppliers will react.
  2. Slow down! There is a famous saying about all projects.  They have three elements that clients want:  a) fast, b) good, and c) cheap, and on any project you can choose two of these.  In my nearly three decades in this industry I have seen this dynamic change considerably. These days, “fast” is almost always trumping the other two factors.  “Good” has been pushed aside.  “Cheap” has always been important, but to be honest budget considerations don’t seem to be the main issue (MR spending continues to grow slowly). Clients are insisting that studies are conducted at a breakneck pace and data quality is suffering badly.
  3. Insist that suppliers defend their methodologies. I’ve worked for corporate clients, but also many academic researchers. I have found that a key difference between them becomes apparent during results presentations. Corporate clients are impatient and want us to go as quickly as possible over the methodology section and get right into the results.  Academics are the opposite. They dwell on the methodology and I have noticed if you can get an academic comfortable with your methods it is rare that they will doubt your findings. Corporate researchers need to understand the importance of a sound methodology and care more about it.
  4. Be honest about the limitations of your methodology. We often like to say that everything you were ever taught about statistics assumed a random sample and we haven’t seen a study in at least 20 years that can credibly claim to have one.  That doesn’t mean a study without a random sample isn’t valuable, it just means that we have to think through the biases and errors it could contain and how that can be relevant to the results we present. I think every research report should have a page after the methodology summary that lists off the study’s limitations and potential implications to the conclusions we draw.
  5. Stop treating respondents so poorly. I believe this is a direct consequence of the movement from telephone to online data collection. Back in the heyday of telephone research, if you fielded a survey that was too long or too challenging to answer, it wasn’t long until you heard from your interviewers just how bad your questionnaire was. In an online world, this feedback never gets back to the questionnaire author – and we subsequently beat up our respondents pretty badly. I have been involved in at least 2,000 studies covering about 1 million respondents. If each study averages 15 minutes, that implies people have spent about 28 and a half years filling out my surveys. It is easy to lose respect for that – but let’s not forget the tremendous amount of time people spend on our surveys. In the end, this is a large threat to the research industry: if people won’t respond, we have nothing to sell.
  6. Stop using technology for technology’s sake. Technology has greatly changed our business. But, it doesn’t supplant the basics of what we do or allow us to ignore the laws of statistics.  We still need to reach a representative sample of people, ask them intelligent questions, and interpret what it means for our clients.  Tech has made this much easier and much harder at the same time.  We often seem to do things because we can and not because we should.
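The cumulative respondent-time figure in point 5 above is easy to verify with back-of-envelope arithmetic:

```python
# Back-of-envelope check of the respondent-time estimate:
# 2,000 studies, ~1 million respondents, ~15 minutes per survey.
respondents = 1_000_000
minutes_each = 15

total_minutes = respondents * minutes_each
years = total_minutes / (60 * 24 * 365)  # minutes -> years

print(round(years, 1))  # ≈ 28.5 years of cumulative respondent time
```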

The ultimate way to combat “poll denialism” in a “post-truth” world is to do better work, make better predictions, and deliver insightful interpretations. That is what we all strive to do, and it is more important than ever.


An Epic Fail: How Can Pollsters Get It So Wrong?


Perhaps the only bigger loser than Hillary Clinton in yesterday’s election was the polling industry itself. Those of us who conduct surveys for a living should be asking if we can’t even get something as simple as a Presidential election right, why should our clients have confidence in any data we provide?

First, a recap of how poorly the polls and pundits performed:

  • FiveThirtyEight’s model had Clinton’s likelihood of winning at 72%.
  • Betfair (a prediction market) had Clinton trading at an 83% chance of winning.
  • A quick scan of Real Clear Politics on Monday night showed 25 final national polls. 22 of these 25 polls had Clinton as the winner, and the most reputable ones almost all had her winning the popular vote by 3 to 5 points. (It should be noted that Clinton seems likely to win the popular vote.)

There will be claims that FiveThirtyEight “didn’t say her chances were 100%” or that Betfair had Trump with a “17% chance of winning.” Their predictions were never to be construed to be certain.  No prediction is ever 100% certain, but this is a case where almost all forecasters got it wrong.  That is pretty close to the definition of a bias – something systematic that affected all predictions must have happened.

Pollsters will claim that the outcome was within the margin of error. But a “margin of error” defense is statistically suspect, as margins of error only apply to random or probability samples, and none of these polls can claim to have one. FiveThirtyEight also had Clinton with 302 electoral votes, well beyond any reasonable error rate.

Regardless, the end result will end up barely within the margin of error that most of these polls erroneously use anyway. That is not a free pass for the pollsters at all. All it means is that rather than their estimates being accurate 95% of the time, they were predicted to be accurate a bit less often: between 80% and 90% of the time for most of these polls, by my calculations.
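The arithmetic behind a poll’s published margin of error is the standard formula for a proportion – which, as noted above, is only valid under a random-sampling assumption that these polls can’t really claim. A minimal sketch (the sample size of 1,000 and the 50% support level are illustrative assumptions, typical of national polls):

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a 95% confidence interval for a proportion.
    Valid only for a true random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A hypothetical national poll: ~1,000 respondents, candidate near 50%
moe = margin_of_error(0.50, 1000)
print(f"{moe:.1%}")  # roughly 3.1 points either way
```

A 3-point polling miss therefore sits right at the edge of a typical poll’s stated interval – which is exactly why “it was within the margin of error” is such a thin defense when nearly every poll missed in the same direction.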

Lightning can strike for sure. But this is a case of it hitting the same tree numerous times.

So, what happened? I am sure this will be the subject of many post mortems by the media and conferences from the research industry itself, but let me provide an initial perspective.

First, it seems unlikely that the failure had anything to do with the questions themselves. In reality, most pollsters use very similar questions to gather voter preferences, and many of these questions have been in use for a long time. Asking whom you will vote for is pretty simple. The question itself seems an unlikely culprit.

I think the mistakes the pollsters made come down to some fairly basic things.

  1. Non-response bias. This has to be a major reason why the polls were wrong. In short, non-response bias means that the sample of people who took the time to answer the poll did not adequately represent the people who actually voted. Clearly this must have occurred. There are many reasons it could happen. A poor response rate is likely a key one, but poor selection of sampling frames, researchers getting too aggressive with weighting and balancing, and an inability to reach some key types of voters all play into it.
  2. Social desirability bias. This tends to be more present in telephone and in-person polls that involve an interviewer, but it happens in online polls as well. This is when the respondent tells you what you want to hear or what he or she thinks is socially acceptable. A good example: if you conduct a telephone poll and an online poll at the same time, more people will say they believe in God in the telephone poll. People tend to answer how they think they are supposed to, especially when responding to an interviewer. In this case, let’s set non-response bias aside and suppose pollsters reached every single voter who actually turned out. If “Trump” was a socially unacceptable answer in the poll, he would do better in the actual election than in the poll. There is evidence this could have happened, as polls with live interviewers showed a wider Clinton-to-Trump gap than those that were self-administered.
  3. Third parties. It looks like Gary Johnson’s support is going to end up being about half of what the pollsters predicted. If this erosion benefited Trump, it could very well have made a difference. Those who switched their vote from Johnson in the final weeks may have been more likely to switch to Trump than to Clinton.
  4. Herding. This season had more polls than ever before, and they often had widely divergent results. But if you look closely you will see that as the election neared, polling results started to converge. The reason could be that if a pollster had a poll that looked like an outlier, they probably took a closer look at it, toyed with how the sample was weighted, or decided to bury the poll altogether. It is possible that there were some accurate polls out there that declared a Trump victory, but the pollsters didn’t release them.
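Two of the mechanisms above – aggressive weighting and herding – come down to how post-stratification weights reshape a topline number. A toy sketch of the idea, where every count and support share below is invented purely for illustration:

```python
def weighted_share(groups):
    """Post-stratification sketch. Each group is a tuple of
    (respondents_in_sample, target_population_share, candidate_support).
    Respondents are up/down-weighted so each group matches its
    assumed share of the electorate."""
    total_n = sum(n for n, _, _ in groups)
    estimate = 0.0
    for n, pop_share, support in groups:
        sample_share = n / total_n
        weight = pop_share / sample_share  # correct over/under-representation
        estimate += sample_share * weight * support
    return estimate

# Hypothetical: college grads are 40% of the sample but 30% of voters
sample = [(400, 0.30, 0.55),   # college grads, overrepresented
          (600, 0.70, 0.45)]   # non-grads, underrepresented

print(f"weighted:   {weighted_share(sample):.1%}")    # 48.0%
print(f"unweighted: {(0.4*0.55 + 0.6*0.45):.1%}")     # 49.0%
```

Even this crude two-group example moves the topline a full point, which is why a pollster quietly “toying with how the sample was weighted” can pull an outlier back toward the pack.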

I’d also submit that the reasons for the polling failure are likely not completely specific to the US and this election. We can’t forget that pollsters also missed the recent Brexit vote, the Mexican Presidency, and David Cameron’s original election in the UK.

So, what should the pollsters do? Well, they owe it to the industry to convene, share data, and attempt to figure it out. That will certainly be done via the trade organizations pollsters belong to, but I have been to a few of these events and they devolve pretty quickly into posturing, defensiveness, and salesmanship. Academics will take a look, but they move so slowly that the implications they draw will likely be outdated by the time they are published.  This doesn’t seem to be an industry that is poised to fix itself.

At minimum, I’d like to see the polling organizations re-contact all respondents from their final polls. That would shed a lot of light on any issues relating to social desirability or other subtle biases.

This is not the first time pollsters have gotten it wrong. President Hillary Clinton will be remembered in history along with President Thomas Dewey and President Alf Landon. But this time seems different. There is so much information out there that separating the signal from the noise is just plain difficult – and there are lessons in that for Big Data analyses and research departments everywhere.

We are left with an election result that half the country is ecstatic about and half is worried about.  However, everyone in the research industry should be deeply concerned. I am hopeful that this will cause more market research clients to ask questions about data quality, potential errors and biases, and that they will value quality more. Those conversations will go a long way to putting a great industry back on the right path.

