Posts Tagged 'statistics'

What is p-hacking, and why do most researchers do it?

What sets good researchers apart is their ability to find a compelling story in a data set. It is what we do – we review various data points, combine that with our knowledge of a client’s business, and craft a story that leads to market insight.

Unfortunately, researchers can be too good at this. We have a running joke in our firm that we could probably hand a random data set to an analyst, and they could come up with a story that was every bit as convincing as the story they would develop from actual data.

Market researchers need to be wary of something well-known among academic researchers: a phenomenon known as “p-hacking.” It is a tendency to run and re-run analyses until we discover a statistically significant result.

A “p-value” is one of the most important statistics in research. It can be tricky to define precisely – roughly, it is the probability of seeing a result at least as large as the one you observed if there were truly no difference between your test and control. Put loosely, it is the risk of declaring a difference when the result is just chance. We say a result is statistically significant when its p-value is less than 5%, meaning there is less than a 5% chance we would have seen a result this extreme by chance alone.

Researchers widely use p-values to determine if a result is worth mentioning. In academia, most papers will not be published in a peer-reviewed journal if their p-value is not below 5%. Most quant analysts will not highlight a finding in market research if the p-value isn’t under 5%.

P-hacking is what happens when the initial analysis doesn’t hit this threshold. Researchers will do things such as:

  • Change the variable. Our result doesn’t hit the threshold, so we search for a new measure where it does.
  • Redefine our variables. Using the full range of the response didn’t work, so we look at the top box, the top 2 boxes, the mean, etc., until the result we want pans out.
  • Change the population. It didn’t work with all respondents, but is there something among a subgroup, such as males, young respondents, or customers?
  • Run a table that statistically tests every subgroup against every other subgroup. (Virtually guaranteeing that about one comparison in 20 will come up significant by chance alone.)
  • Relax the threshold. The findings didn’t work at 5%, so we go ahead and report them anyway and say they are “directional.”

These tactics are all inappropriate and common. If you are a market researcher and reading this, I’d be surprised if you haven’t done all of these at some point in your career. I have done them all.

P-hacking happens for understandable reasons. Other information outside the study points towards a result we should be getting. Our clients pressure us to do it. And, with today’s sample sizes being so large, p-hacking is easy to do. Give me a random data set with 2,000 respondents, and I will guarantee that I can find statistically significant results and create a story around them that will wow your marketing team.
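To make that claim concrete, here is a minimal sketch in Python (all data is randomly generated; the subgroups and measures are made up) that runs every subgroup-versus-subgroup comparison on pure noise and counts how many come out “significant” at the 5% level:

```python
# A minimal sketch of why p-hacking is so easy with large samples: generate pure
# noise for 2,000 "respondents," slice it into subgroups and measures, and count
# how many comparisons come out "significant." All data here is random.
import random
from statistics import NormalDist

random.seed(42)
N = 2000
N_MEASURES = 20      # hypothetical survey measures, all pure noise
N_SUBGROUPS = 4      # hypothetical subgroups (e.g., age bands), assigned at random

# Simulated respondents: a random subgroup and random 1-10 ratings on each measure.
respondents = [
    {"group": random.randrange(N_SUBGROUPS),
     "ratings": [random.randint(1, 10) for _ in range(N_MEASURES)]}
    for _ in range(N)
]

def two_sample_p(x, y):
    """Two-sided z-test on means (fine for samples this large)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    z = (mx - my) / ((vx / len(x) + vy / len(y)) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Compare every pair of subgroups on every measure -- classic p-hacking.
hits = total = 0
for m in range(N_MEASURES):
    for a in range(N_SUBGROUPS):
        for b in range(a + 1, N_SUBGROUPS):
            xa = [r["ratings"][m] for r in respondents if r["group"] == a]
            xb = [r["ratings"][m] for r in respondents if r["group"] == b]
            total += 1
            hits += two_sample_p(xa, xb) < 0.05

print(f"{hits} of {total} comparisons are 'significant' at 5% -- on pure noise")
```

With 120 comparisons run on random numbers, a handful will typically clear the 5% bar – exactly the kind of “findings” a p-hacker ends up building a story around.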

I learned about p-hacking the hard way. Early in my career, I gathered an extensive data set for a college professor who was well-known and well-published within his field. He asked me to run some statistical analyses for him. When the ones he specified didn’t pan out, I started running the data on subgroups, changing how some variables were defined, etc., until I could present him with significant statistical output.

Fortunately, rather than chastise me, he went into teaching mode. He told me that just fishing around in the data set until you find something that works statistically is not how data analysis should be done. With a big data set and enough hooks in the water, you will always find some insight ready to bite.

Instead, he taught me that you always start with a hypothesis. If that hypothesis doesn’t pan out, first recognize that there is some learning in that. And it is okay to use that learning to adjust your hypothesis and test again, but your analysis has to be driven by the theory instead of the theory being driven by the data.

Good analysis is not about tinkering with data through trial and error. Too many researchers do this until something works, and they fail to report the many unproductive rabbit holes they dug along the way. Yet by definition, about one test in 20 will come up statistically significant purely by chance.

This sounds obscure, but I would say that it is the most common mistake I see marketing analysts make. Clients will press us to redefine variables to make a regression work better. We’ll use “top box” measures rather than the full variable range, with no real reason except that it makes our models fit. We relax the level of statistical significance. We p-hack.

In general, market researchers “fish in the data” a lot. I sometimes wonder how many lousy marketing decisions have been made over time due to p-hacking.

I used to sit next to an incredible statistician. As good a data analyst as he was, he was one of the worst questionnaire writers I have ever met. He didn’t seem to care too much, as he felt he could wrangle almost any data into submission with his talent. He was a world-class p-hacker.

I was the opposite. I’ve never been a great statistician. So, I’ve learned to compensate by developing design talent, as I quickly noticed that a well-written questionnaire makes data analysis easy and often obviates the need for complex statistics. I learned over time that a good questionnaire is an antidote to p-hacking. 

Start with hypotheses and think about alternative hypotheses when you design the project. And develop these before you even compose a questionnaire. Never believe that the story will magically appear in your data – instead, start with a range of potential stories and then, in your design, allow for data to support or refute each of them. Be balanced in how you go about it, but be directed as well.

It is vital to push for the time upfront to accomplish this, as the collapsed time frames for today’s projects are a key cause of p-hacking.

Of course, nobody wants to conduct a project and be unable to conclude anything. If that happens, you likely went wrong at the project’s design stage – you didn’t lay out objectives and potential hypotheses well. Resist the tendency to p-hack, be mindful of this issue, and design your studies well so you won’t be tempted to do it.

Pre-Election Polling and Baseball Share a Lot in Common

The goal of a pre-election poll is to predict which candidate will win an election and by how much. Pollsters work towards this goal by 1) obtaining a representative sample of respondents, 2) determining which candidate a respondent will vote for, and 3) predicting the chances each respondent will take the time to vote.

All three of these steps involve error. It is the first one, obtaining a representative sample of respondents, which has changed the most in the past decade or so.

It is the third characteristic that separates pre-election polling from other forms of polling and survey research. Statisticians must predict how likely each person they interview will be to vote. This is called their “Likely Voter Model.”

As I state in POLL-ARIZED, this is perhaps the most subjective part of the polling process. The biggest irony in polling is that it becomes an art when we hand the data to the scientists (methodologists) to apply a Likely Voter Model.

It is challenging to understand what pollsters do in their Likely Voter Models and perhaps even more challenging to explain.  

An example from baseball might provide a sense of what pollsters are trying to do with these models.

Suppose Mike Trout (arguably the most underappreciated sports megastar in history) is stepping up to the plate. Your job is to predict Trout’s chances of getting a hit. What is your best guess?

You could take a random guess between 0 and 100%. But, since that would give you a 1% chance of being correct, there must be a better way.

A helpful approach comes from a subset of statistical theory called Bayesian statistics. This theory says we can start with a baseline of Trout’s hit probability based on past data.

For instance, we might see that so far this year, the overall major league batting average is .242. So, we might guess that Trout’s probability of getting a hit is 24%.

This is better than a random guess. But, we can do better, as Mike Trout is no ordinary hitter.

We might notice there is even better information out there. Year-to-date, Trout is batting .291. So, our guess for his chances might be 29%. Even better.

Or, we might see that Trout’s lifetime average is .301 and that he hit .333 last year. Since we believe in a concept called regression to the mean, that would lead us to think that his batting average should be better for the rest of the season than it is currently. So, we revise our estimate upward to 31%.

There is still more information we can use. The opposing pitcher is Justin Verlander. Verlander is a rare pitcher who has owned Trout in the past – Trout’s average is just .116 against Verlander. This causes us to revise our estimate downward a bit. Perhaps we take it to about 25%.

We can find even more information. The bases are loaded. Trout is a clutch hitter, and his career average with men on base is about 10 points higher than when the bases are empty. So, we move our estimate back up to about 28%.

But it is August. Trout has a history of batting well early and late in the season, but he tends to cool off during the dog days of summer. So, we decide to stop there and settle on a probability of 25%.

This sort of analysis could go on forever. Every bit of information we gather about Trout can conceivably help make a better prediction for his chances. Is it raining? What is the score? What did he have for breakfast? Is he in his home ballpark? Did he shave this morning? How has Verlander pitched so far in this game? What is his pitch count?
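For readers who like to see the mechanics, here is a minimal sketch of the Bayesian updating idea using a Beta-Binomial model. The at-bat counts are hypothetical, and real projection systems are far more elaborate; the point is simply how a prior (the league average) gets pulled toward the batter’s own record as evidence accumulates.

```python
# A minimal Beta-Binomial sketch of the updating idea above. All counts are
# hypothetical illustrations, not real statistics for any player.
LEAGUE_AVG = 0.242        # prior belief about a generic at-bat
PRIOR_STRENGTH = 300      # how many "pseudo at-bats" the prior counts for

# Prior: Beta(alpha, beta) with mean equal to the league average.
alpha = LEAGUE_AVG * PRIOR_STRENGTH
beta = (1 - LEAGUE_AVG) * PRIOR_STRENGTH

# Evidence: the batter's own (hypothetical) season so far.
season_hits, season_at_bats = 102, 350        # about a .291 average

# Posterior after seeing the season: add hits and outs to the prior counts.
alpha_post = alpha + season_hits
beta_post = beta + (season_at_bats - season_hits)
print(f"Posterior hit probability: {alpha_post / (alpha_post + beta_post):.3f}")

# Further evidence (a tough matchup, men on base, the time of year, ...) would be
# folded in the same way, each piece nudging the estimate up or down a little.
```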

There are pre-election polling analogies in this baseball example, particularly if you follow the probabilistic election models created by organizations like FiveThirtyEight and The Economist.

Just as we might use Trout’s lifetime average as our “prior” probability, these models will start with macro variables for their election predictions. They will look at the past implications of things like incumbency, approval ratings, past turnout, and economic indicators like inflation, unemployment, etc. In theory, these can adjust our assumptions of who will win the election before we even include polling data.

Of course, using Trout’s lifetime average or these macro variables in polling will only be helpful to the extent that the future behaves like the past. And therein lies the rub – overreliance on past experience makes these models inaccurate during dynamic times.

Part of why pollsters missed badly in 2020 is that unique things were going on – a global pandemic, changed methods of voting, increased turnout, etc. In baseball, perhaps this is a year with a juiced baseball, or Trout is dealing with an injury.

The point is that while unprecedented things are unpredictable, they happen with predictable regularity. There is always something unique about an election cycle or a Mike Trout at bat.

The most common question I am getting from readers of POLL-ARIZED is, “will the pollsters get it right in 2024?” My answer is that since pollsters are applying past assumptions in their model, they will get it right to the extent that the world in 2024 looks like the world did in 2020, and I would not put my own money on it.

I make a point in POLL-ARIZED that pollsters’ models have become too complex. While in theory, the predictive value of a model never gets worse when you add in more variables, in practice, this has made these models uninterpretable. Pollsters include so many variables in their likely voter models that many of their adjustments cancel each other out. They are left with a model with no discernable underlying theory.

If you look closely, we started with a probability of 24% for Trout. Even after looking at a lot of other information and making reasonable adjustments, we still ended up with a prediction of 25%. The election models are the same way. They include so many variables that they can cancel out each other’s effects and end up with a prediction that looks much like the raw data did before the methodologists applied their wizardry.

This effort would be better spent improving the input to the models – investing in the trust needed to increase the response rates we get to our surveys and polls. Improving the quality of our data input will do more for the predictive quality of the polls than coming up with more complicated ways to weight the data.

Of course, in the end, one candidate wins, and the other loses, and Mike Trout either gets a hit, or he doesn’t, so the actual probability moves to 0% or 100%. Trout cannot get 25% of a hit, and a candidate cannot win 79% of an election.

As I write this, I looked up the last time Trout faced Verlander. It turns out Verlander struck him out!

POLL-ARIZED available on May 10

I’m excited to announce that my book, POLL-ARIZED, will be available on May 10.
 
After the last two presidential elections, I was fearful my clients would ask a question I didn’t know how to answer: “If pollsters can’t predict something as simple as an election, why should I believe my market research surveys are accurate?”
 
POLL-ARIZED results from a year-long rabbit hole that question led me down! In the process, I learned a lot about why polls matter, how today’s pollsters are struggling, and what the insights industry should do to improve data quality.
 
I am looking for a few more people to read an advance copy of the book and write an Amazon review on May 10. If you are interested, please send me a message at poll-arized@cruxresearch.com.

Let’s Appreciate Statisticians Who Make Data Understandable

Statistical analyses are amazing, underrated tools. All scientific fields depend on discoveries in statistics to make inferences and draw conclusions. Without statistics, advances in engineering, medicine, and science that have greatly improved the quality of life would not have been possible. Statistics is the Rodney Dangerfield of academic subjects – it never gets the respect it deserves.

Statistics is central to market research and polling. We use statistics to describe our findings and understand the relationships between variables in our data sets. Statistics are the most important tools we have as researchers.

However, we often misuse these tools. I firmly believe that pollsters and market researchers overdo it with statistics. Basic statistical analyses are easy to understand, but complicated ones are not. Researchers like to get into complex statistics because it lends an air of expertise to what we do.

Unfortunately, most sophisticated techniques are impossible to convey to “normal” people who may not have a statistical background, and this tends to describe the decision-makers we support.

I learned long ago that when working with a dataset, any result that will be meaningful will likely be uncovered by using simple descriptive statistics and cross-tabulations. Multivariate techniques can tease out more subtle relationships in the data. Still, the clients (primarily marketers) we work with are not looking for subtleties – they want some conclusions that leap off the page from the data.

If a result is so subtle that it needs complicated statistics to find, it is likely not a large enough result to be acted upon by a client.

Because of this, we tend to use multivariate techniques to confirm what we see with more straightforward methods. Not always – as there are certainly times when the client objectives call for sophisticated techniques. But, as researchers, our default should be to use the most straightforward designs possible.

I always admire researchers who make complicated things understandable. That should be the goal of statistical analyses. George Terhanian of Electric Insights has developed a way to use sophisticated statistical techniques to answer some of the most fundamental questions a marketer will ask.

In his article “Hit? Stand? Double? Master ‘likely effects’ to make the right call,” George describes his revolutionary process. It is sophisticated behind the scenes, but I like the simplicity of the questions it can address.

He has created a simulation technique that makes sense of complicated data sets. You may measure hundreds of things on a survey and have an excellent profile of the attitudes and behaviors of your customer base. But, where should you focus your investments? This technique demonstrates the likely effects of changes.

As marketers, we cannot directly increase sales. But we can establish and influence attitudes and behaviors that result in sales. Our problem is often to identify which of these attitudes and behaviors to address.

For instance, if I can convince my customer base that my product is environmentally responsible, how many of them can I count on to buy more of my product? The type of simulator described in this article can answer this question, and as a marketer, I can then weigh if the investment necessary is worth the probable payoff.
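George’s actual approach is his own, but as a rough, generic illustration of this kind of “what if” question, here is a minimal sketch (hypothetical data, plain logistic regression) that estimates how many more customers would be predicted to buy if everyone came to see the product as environmentally responsible:

```python
# A rough, generic "likely effects" sketch (not the method from the article):
# fit a purchase model on hypothetical survey data, then simulate what happens
# if every customer came to see the product as environmentally responsible.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000  # hypothetical respondents

# Hypothetical attitudes (1 = agrees, 0 = does not agree).
env_responsible = rng.binomial(1, 0.40, n)
good_value = rng.binomial(1, 0.55, n)

# Hypothetical purchase behavior that depends on both attitudes.
logit = -1.0 + 0.8 * env_responsible + 0.6 * good_value
bought = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([env_responsible, good_value])
model = LogisticRegression().fit(X, bought)

# Scenario: every respondent now agrees the product is environmentally responsible.
X_scenario = X.copy()
X_scenario[:, 0] = 1

base_rate = model.predict_proba(X)[:, 1].mean()
scenario_rate = model.predict_proba(X_scenario)[:, 1].mean()
print(f"Predicted buyers now:         {base_rate:.1%}")
print(f"Predicted buyers in scenario: {scenario_rate:.1%}")
print(f"Likely effect: about {(scenario_rate - base_rate) * n:.0f} more buyers per {n} customers")
```

The appeal of this framing is that the output is in units a marketer can use – additional buyers per customer base – rather than regression coefficients.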

George created a simulator on some data from a recent Crux Poll. Our poll showed that 17% of Americans trust pollsters. George’s analysis shows that trust in pollsters is directly related to their performance in predicting elections.

Modeling the Crux Poll data showed that if all Americans “strongly agreed” that presidential election polls do a good job of predicting who will win, trust in pollsters/polling organizations would increase by 44 million adults. If Americans feel “extremely confident” that pollsters will accurately predict the 2024 election, trust in pollsters will increase by an additional 40 million adults.

If we are worried that pollsters are untrusted, this suggests that improving the quality of our predictions should address the issue.

Putting research findings in these sorts of terms is what gets our clients’ attention. 

Marketers need this type of quantification because it can plug right into financial plans. Researchers often hear that the reports we provide are not “actionable” enough. There is not much more actionable than showing how many customers would be expected to change their behavior if we successfully invest in a marketing campaign to change an attitude.

Successful marketing is all about putting the probabilities in your favor. Nothing is certain, but as a marketer, your job is to decide where best to place your resources (money and time). This type of modeling is a step in the right direction for market researchers.

The myth of the random sample

Sampling is at the heart of market research. We ask a few people questions and then assume everyone else would have answered the same way.

Sampling works in all types of contexts. Your doctor doesn’t need to test all of your blood to determine your cholesterol level – a few ounces will do. Chefs taste a spoonful of their creations and then assume the rest of the pot will taste the same. And, we can predict an election by interviewing a fairly small number of people.

The mathematical procedures applied to samples that enable us to project to a broader population all assume that we have a random sample. Or, as I tell research analysts: everything they taught you in statistics assumes you have a random sample. T-tests, hypothesis tests, regressions, etc., all have a random sample as a requirement.

Here is the problem: We almost never have a random sample in market research studies. I say “almost” because I suppose it is possible to do, but over 30 years and 3,500 projects I don’t think I have been involved in even one project that can honestly claim a random sample. A random sample is sort of a Holy Grail of market research.

A random sample might be possible if you have a captive audience. You can randomly sample some of the passengers on a flight, a few students in a classroom, or prisoners in a detention facility. As long as you are not trying to project beyond that flight, that classroom, or that jail, the math behind random sampling will apply.

Here is the bigger problem: Most researchers don’t recognize this, disclose this, or think through how to deal with it. Even worse, many purport that their samples are indeed random, when they are not.

For a bit of research history: once the market research industry really got going, the telephone random digit dial (RDD) sample became the standard. Telephone researchers could randomly call land line phones. When land line telephone penetration and response rates were both high, this provided excellent data. However, RDD still wasn’t providing a true random, or probability, sample. Some households had more than one phone line (and few researchers corrected for this), many people lived in group situations (colleges, medical facilities) where they couldn’t be reached, some did not have a land line, and even at its peak, telephone response rates were only about 70%. Not bad. But also not random.
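As an aside, the multiple-phone-line problem has a textbook fix that was often skipped: weight each household by the inverse of its chance of being dialed. A minimal sketch with made-up respondents:

```python
# Minimal sketch of the standard correction for multi-line households in RDD
# sampling: a household with two land lines is twice as likely to be dialed,
# so it gets half the weight. All counts here are made up for illustration.
respondents = [
    {"id": 1, "phone_lines": 1, "answer": "yes"},
    {"id": 2, "phone_lines": 2, "answer": "no"},
    {"id": 3, "phone_lines": 1, "answer": "yes"},
    {"id": 4, "phone_lines": 1, "answer": "no"},
]

for r in respondents:
    r["weight"] = 1.0 / r["phone_lines"]   # inverse of selection probability

weighted_yes = sum(r["weight"] for r in respondents if r["answer"] == "yes")
total_weight = sum(r["weight"] for r in respondents)
print(f"Unweighted % yes: {sum(r['answer'] == 'yes' for r in respondents) / len(respondents):.0%}")
print(f"Weighted % yes:   {weighted_yes / total_weight:.0%}")
```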

Once the Internet came of age, researchers were presented with new sampling opportunities and challenges. Telephone response rates plummeted (to 5-10%), making telephone research prohibitively expensive and of poor quality. Online, there was no national directory of email addresses or cell phone numbers, and there were legal prohibitions against spamming, so researchers had to find new ways to contact people for surveys.

Initially, and this is still a dominant method today, research firms created opt-in panels of respondents. Potential research participants were asked to join a panel, filled out an extensive demographic survey, and were paid small incentives to take part in projects. These panels suffer from three response issues: 1) not everyone is online or online at the same frequency, 2) not everyone who is online wants to be in a panel, and 3) not everyone in the panel will take part in a study. The result is a convenience sample. Good researchers figured out sophisticated ways to handle the sampling challenges that result from panel-based samples, and they work well for most studies. But, in no way are they a random sample.

River sampling is a term often used to describe respondents who are “intercepted” on the Internet and asked to fill out a survey. Potential respondents are invited via online ads and offers placed on a range of websites. If interested, they are typically pre-screened and sent along to the online questionnaire.

Because so much is known about what people are doing online these days, sampling firms have some excellent science behind how they obtain respondents efficiently with river sampling. It can work well, but response rates are low and the nature of the online world is changing fast, so it is hard to get a consistent river sample over time. Nobody being honest would ever use the term “random sampling” when describing river samples.

Panel-based samples and river samples represent how the lion’s share of primary market research is being conducted today. They are fast and inexpensive and when conducted intelligently can approximate the findings of a random sample. They are far from perfect, but I like that the companies providing them don’t promote them as being random samples. They involve some biases and we deal with these biases as best we can methodologically. But, too often we forget that they violate a key assumption that the statistical tests we run require: that the sample is random. For most studies, they are truly “close enough,” but the problem is we usually fail to state the obvious – that we are using statistical tests that are technically not appropriate for the data sets we have gathered.

Which brings us to a newer, shiny object in the research sampling world: ABS samples. ABS (address-based samples) are purer from a methodological standpoint. While ABS samples have been around for quite some time, they are just now being used extensively in market research.

ABS samples are based on US Postal Service lists. Because USPS has a list of all US households, this list is an excellent sampling frame. (The Census Bureau also has an excellent list, but it is not available for researchers to use.) The USPS list is the starting point for ABS samples.

Research firms will take the USPS list and recruit respondents from it, either to be in a panel or to take part in an individual study. This recruitment can be done by mail, phone, or even online. They often append publicly-known information onto the list.

As you might expect, an ABS approach suffers from some of the same issues as other approaches. Cooperation rates are low and incentives (sometimes large) are necessary. Most surveys are conducted online, and not everyone in the USPS list is online or has the same level of online access. There are some groups (undocumented immigrants, homeless) that may not be in the USPS list at all. Some (RVers, college students, frequent travelers) are hard to reach. There is evidence that ABS approaches do not cover rural areas as well as urban areas. Some households use post office boxes and not residential addresses for their mail. Some use more than one address. So, although ABS lists cover about 97% of US households, the 3% that they do not cover are not randomly distributed.

The good news is, if done correctly, the biases that result from an ABS sample are more “correctable” than those from other types of samples because they are measurable.
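The standard way to exploit that measurability is post-stratification or raking: adjust respondent weights until the sample matches known population margins. A minimal sketch with made-up sample counts and targets:

```python
# A minimal sketch of why "measurable" biases are correctable: iterative
# proportional fitting (raking) adjusts weights so the sample matches known
# population margins. The sample counts and targets below are made up.
import numpy as np

# Sample counts by region (rows: urban, rural) and age (cols: under 50, 50+).
sample = np.array([[500.0, 300.0],
                   [100.0, 100.0]])

# Known population margins (shares) for the same categories.
region_target = np.array([0.70, 0.30])   # urban, rural
age_target = np.array([0.55, 0.45])      # under 50, 50+

weights = np.ones_like(sample)
for _ in range(50):  # a few iterations are plenty for a 2x2 table
    # Adjust rows to hit the region margins.
    row_share = (weights * sample).sum(axis=1) / (weights * sample).sum()
    weights *= (region_target / row_share)[:, None]
    # Adjust columns to hit the age margins.
    col_share = (weights * sample).sum(axis=0) / (weights * sample).sum()
    weights *= age_target / col_share

print("Cell weights:\n", np.round(weights, 2))
```

Weighting like this can only fix biases we can measure against a trusted benchmark, which is exactly why the measurability of ABS coverage gaps matters.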

A recent Pew study indicates that survey bias and the number of bogus respondents is a bit smaller for ABS samples than opt-in panel samples.

But ABS samples are not random samples either. I have seen articles that suggest that of all those approached to take part in a study based on an ABS sample, less than 10% end up in the survey data set.

The problem is not necessarily with ABS samples, as most researchers would concur that they are the best option we have and come the closest to a random sample. The problem is that many firms providing ABS samples are selling them as “random samples,” and that is disingenuous at best. Just because the sampling frame used to recruit a survey panel can claim to be “random” does not imply that the respondents you end up with in a research database constitute a random sample.

Does this matter? In many ways, it likely does not. There are biases and errors in all market research surveys. These biases and errors vary not just by how the study was sampled, but also by the topic of the question, its tone, the length of the survey, etc. Many times, survey errors are not the same throughout an individual survey. Biases in surveys tend to be “known unknowns” – we know they are there, but aren’t sure what they are.

There are many potential sources of error in survey research. I am always reminded of a quote from Humphrey Taylor, the past Chairman of the Harris Poll, who said: “On almost every occasion when we release a new survey, someone in the media will ask, ‘What is the margin of error for this survey?’ There is only one honest and accurate answer to this question – which I sometimes use to the great confusion of my audience – and that is, ‘The possible margin of error is infinite.’” A few years ago, I wrote a post on biases and errors in research, and I was able to quickly name 15 of them before I even had to do an Internet search to learn more about them.

The reality is, the improvement in bias achieved by an ABS sample over a panel-based sample is small and likely inconsequential when considered next to the other sources of error that can creep into a research project. Because of this, and the fact that ABS sampling is really expensive, we tend to recommend ABS panels in only two cases: 1) if the study will result in academic publication, as academics are more accepting of data that comes from an ABS approach, and 2) if we are working in a small geography, where panel-based samples are not feasible.

Again, ABS samples are likely the best samples we have at this moment. But firms that provide them often inappropriately portray them as yielding random samples. For most projects, the small improvement in bias they provide is not worth the considerably larger budget and longer study time frame, which is why ABS samples are currently used in only a small proportion of research studies. I consider ABS to be “state of the art,” with the emphasis on “art,” as sampling is often less of a science than people think.

Should we get rid of statistical significance?

There has been recent debate among academics and statisticians surrounding the concept of statistical significance. Some high-profile medical studies have just narrowly missed meeting the traditional statistical significance cutoff of 0.05. This has resulted in potentially life changing drugs not being approved by regulators or pursued for further development by pharma companies. These cases have led to a much-needed review and re-education as to what statistical significance means and how it should be applied.

In a 2014 blog post (Is This Study Significant?) we discussed common misunderstandings market researchers have regarding statistical significance. The recent debate suggests this misunderstanding isn’t limited to market researchers – it appears that academics and regulators have the same difficulty.

Statistical significance is a simple concept. However, it seems that the human brain just isn’t wired well to understand probability and that lies at the root of the problem.

A measure is typically classified as statistically significant if its p-value is 0.05 or less, meaning there is less than a 5% probability that a difference this large would arise from chance or random fluctuation alone. Under that convention, two measures are deemed statistically different when chance alone would produce a gap this big less than 1 time in 20.
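As a concrete example of how the rule gets applied, here is a minimal sketch of the kind of two-proportion test that sits behind most crosstab stat-testing (the counts are made up):

```python
# Minimal sketch of the significance test behind a typical crosstab comparison:
# did 46% vs. 41% "top box" differ by more than chance would explain?
# The respondent counts are made up for illustration.
from math import sqrt
from statistics import NormalDist

n1, x1 = 600, 276   # group 1: 600 respondents, 276 top-box (46%)
n2, x2 = 600, 246   # group 2: 600 respondents, 246 top-box (41%)

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                        # pooled proportion under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # standard error under H0
z = (p1 - p2) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided

print(f"Difference: {p1 - p2:.1%}, z = {z:.2f}, p = {p_value:.3f}")
print("Statistically significant at 5%" if p_value < 0.05 else "Not significant at 5%")
```

With these made-up counts the p-value lands a bit above 0.05, so the difference would conventionally be labeled “not significant” – exactly the kind of near-miss this debate is about.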

There are real problems with this approach. Foremost, it is unclear how this 5% probability cutoff was chosen. Somewhere along the line it became a standard among academics. This standard could have just as easily been 4% or 6% or some other number. This cutoff was chosen subjectively.

What are the chances that this 5% cutoff is optimal for all studies, regardless of the situation?

Regulators should look beyond statistical significance when they are reviewing a new medication. Let’s say a study was only significant at 6%, not quite meeting the 5% standard. That shouldn’t automatically disqualify a promising medication from consideration. Instead, regulators should look at the situation more holistically. What will the drug do? What are its side effects? How much pain does it alleviate? What is the risk of making mistakes in approval: in approving a drug that doesn’t work or in failing to approve a drug that does work? We could argue that the level of significance required in the study should depend on the answers to these questions and shouldn’t be the same in all cases.

The same is true in market research. Suppose you are researching a new product and the study is only significant at 10% and not the 5% that is standard. Whether you should greenlight the product for development depends on considerations beyond statistical significance. What is the market potential of the product? What is the cost of its development? What is the risk of failing to greenlight a winning idea or greenlighting a bad idea? Currently, too many product managers rely too much on a research project to give them answers when the study is just one of many inputs into these decisions.

There is another reason to rethink the concept of statistical significance in market research projects. Statistical significance assumes a random or a probability sample. We can’t stress this enough – there hasn’t been a market research study conducted in at least 20 years that can credibly claim to have used a true probability sample of respondents. Some (most notably ABS samples) make a valiant attempt to do so but they still violate the very basis for statistical significance.

Given that, why do research suppliers (Crux Research included) continue to do statistical testing on projects? Well, one reason is that clients have come to expect it. A more important reason is that statistical significance still holds some meaning. On almost every study we need to draw a line and say that two data points are “different enough” to point out to clients and to draw conclusions from. Statistical significance is a useful tool for this. It just should no longer be viewed as a tool where we can say precise things like “these two data points have a 95% chance of actually being different.”

We’d rather use a probability approach and report to clients the chance that two data points would be different if we had been lucky enough to use a random sample. That is a much more useful way to look at data, but it probably won’t be used much until colleges start teaching it and a new generation of researchers emerges.

The current debate over the usefulness of statistical significance is a healthy one to have. Hopefully, it will cause researchers of all types to think deeper about how precise a study needs to be and we’ll move away from the current one-size-fits-all thinking that has been pervasive for decades.

How Did Pollsters Do in the Midterm Elections?

Our most read blog post was posted the morning after the 2016 Presidential election. It is a post we are proud of because it was composed in the haze of a shocking election result. While many were celebrating their side’s victory or in shock over their side’s losses, we mused about what the election result meant for the market research industry.

We predicted pollsters would become defensive and try to convince everyone that the polls really weren’t all that bad. In fact, the 2016 polls really weren’t. Predictions of the popular vote tended to be within a percent and a half or so of the actual result, which was better than in the previous Presidential election in 2012. However, the concern we had about the 2016 polls wasn’t related to how close they were to the result. The issue we had was one of bias: 22 of the 25 final polls we found made an inaccurate prediction, and almost every poll was off in the same direction. That is the very definition of bias in market research.

Suppose that you had 25 people flip a coin 100 times. On average, you’d expect 50% of the flips to be “heads.” But if, say, 48% of them were “heads,” you shouldn’t be all that worried, as that can happen. But if 22 of the 25 people all had less than 50% heads, you should worry that there was something wrong with the coins or the way they were flipped. That is, in essence, what happened with the polls in the 2016 election.
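The arithmetic behind that intuition is easy to check. A minimal sketch:

```python
# How surprising is it for 22 of 25 fair-coin flippers to come in under 50% heads?
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# One flipper: chance of fewer than 50 heads in 100 fair flips.
p_below = binom_cdf(49, 100, 0.5)

# Twenty-five flippers: chance that at least 22 of them come in below 50 heads.
p_22_or_more = 1 - binom_cdf(21, 25, p_below)

print(f"P(one flipper < 50 heads)        = {p_below:.3f}")
print(f"P(22+ of 25 flippers < 50 heads) = {p_22_or_more:.2e}")
```

A single fair-coin flipper lands below 50 heads a bit less than half the time, but 22 out of 25 doing so is vanishingly unlikely – which is why a one-sided pattern of polling errors points to bias rather than bad luck.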

Anyway, this post is being composed in the aftermath of the 2018 midterm elections. How did the pollsters do this time?

Let’s start with FiveThirtyEight.com. We like this site because they place probabilities around their predictions. Of course, this gives them plausible deniability when their prediction is incorrect, as probabilities are never 0% or 100%. (In 2016 they gave Donald Trump a 17% chance of winning and then defended their prediction.) But this organization looks at statistics in the right way.

Below is their final forecast and the actual result. Some results are still pending, but at this moment, this is how it shapes up.

  • Prediction: Republicans having 52 seats in the Senate. Result: It looks like Republicans will have 53 seats.
  • Prediction: Democrats holding 234 and Republicans holding 231 House seats. Result: It looks like Democrats will have 235 or 236 seats.
  • Prediction: Republicans holding 26 and Democrats holding 24 Governorships. Result: Republicans now hold 26 and Democrats hold 24 Governorships.

It looks like FiveThirtyEight.com nailed this one. We also reviewed a prediction market and state-level polls, and it seems that this time around the polls did a much better job in terms of making accurate predictions. (We must say that on election night, FiveThirtyEight’s predictions were all over the place when they were reporting in real time. But, as results settled, their pre-election forecast looked very good.)

So, why did polls seem to do so much better in 2018 than 2016? One reason is the errors cancel out when you look at large numbers of races. Sure, the polls predicted Democrats would have 234 seats, and that is roughly what they achieved. But, in how many of the 435 races did the polls make the right prediction? That is the relevant question, as it could be the case that the polls made a lot of bad predictions that compensated for each other in the total.

That is a challenging analysis to do because some races had a lot of polling, others did not, and some polls are more credible than others. A cursory look at the polls suggests that 2018 was a comeback victory for the pollsters. We did sense a bit of an over-prediction favoring the Republican Senatorial candidates, but on the House side there does not seem to be a clear bias.

So, what did the pollsters do differently? Not much, really. Online sampling continues to evolve and improve, and the 2016 result has caused polling firms to concentrate more carefully on their sampling. One issue that may have contributed to the 2016 problem is that pollsters have come to rely almost exclusively on the top 2 or 3 panel companies. Since 2016, there has been consolidation among sample suppliers, and as a result we are seeing less variance in polls, as pollsters are largely all using the same sample sources. The same few companies provide virtually all the sample used by pollsters.

Another key difference was that turnout in the midterms was historically high. Polls are more accurate in high turnout races, as polls almost always survey many people who do not end up showing up on election day, particularly young people. However, there are large and growing demographic differences (age, gender, race/ethnicity) in supporters of each party, and that greatly complicates polling accuracy. Some demographic subgroups are far more likely than others to take part in a poll.

Pollsters are starting to get online polling right. A lot of the legacy firms in this space are still entrenched in the telephone polling world, have been protective of their aging methodologies, and have been slow to change. After nearly 20 years of online polling, the upstarts have finally forced the bigger polling firms to question their approaches and to move to a world where telephone polling just doesn’t make a lot of sense. Also, many of the old-guard telephone polling experts, who largely led the resistance to online polling, are now retired or have passed on.

Gerrymandering helps the pollster as well. It remains the case that relatively few districts are competitive – Pew suggests only 1 in 7 districts was. You don’t have to be a pollster to accurately predict how about 85% of the races will turn out. Only about 65 of the 435 House races were truly at stake. If you just flipped a coin in those races, your total prediction of House seats would still have been fairly close.
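A quick simulation makes the point; the safe-seat split below is hypothetical, and only the roughly 65 tossups drive the spread:

```python
# A quick sketch of the point above: if ~370 House races are safe and only ~65
# are true tossups, flipping coins in the tossups still lands the total seat
# count close to the mark. The safe-seat split below is hypothetical.
import random

random.seed(1)
SAFE_DEM, SAFE_REP = 205, 165      # hypothetical safe seats for each party
TOSSUPS = 65                       # roughly the number of competitive races

totals = []
for _ in range(10_000):            # simulate many "coin-flip forecaster" elections
    dem_tossup_wins = sum(random.random() < 0.5 for _ in range(TOSSUPS))
    totals.append(SAFE_DEM + dem_tossup_wins)

mean = sum(totals) / len(totals)
spread = (sum((t - mean) ** 2 for t in totals) / len(totals)) ** 0.5
print(f"Coin-flip forecast: about {mean:.0f} Democratic seats, +/- {spread:.0f}")
```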

Of course, pollsters may have just gotten lucky. We view that as unlikely, though, as there were too many races. And unlike in 2016, in 2018 we have not seen evidence of bias (in a statistical sense) in the direction of polling errors.

So, this is a good comeback success for the polling industry and should give us greater confidence for 2020. It is important that the research industry broadcasts this success. When pollsters have a bad day, like they did in 2016, it affects market research as well. Our clients lose confidence in our ability to provide accurate information. When the pollsters get it right, it helps the research industry as well.

Will Big Data Kill Traditional Market Research?

Most of today’s research methods rely on a simple premise: asking customers questions can yield insights that drive better decisions. This is traditionally called primary research because it involves gathering new data. It is often supplemented with secondary research, which involves looking at information that already exists, such as sales data, publicly available data, etc. Primary and secondary research yield what I would call active data – individuals are providing data with their knowledge and consent.

We are moving to a passive data world. This involves analyzing data left behind as we live increasingly digital lives. When we breathe online we leave a trail of digital data crumbs everywhere – where we visit, what we post about, link to, the apps we use, etc. We also leave trails as to when and where we are when we do these things, what sequence we do them in, and even what those close to us do.

Our digital shadows are long. And these shadows provide an incredibly accurate version of ourselves. You may not remember what you had to eat a few days ago, but the Internet knows exactly what books you read, how fast you read them, and when you bought them. The Internet knows where you were when you looked up health information, your favorite places to travel, whether you lean liberal or conservative, and much more. Your digital shadow is alarmingly accurate.

Privacy issues aside, this creates exciting possibilities for market research.

The amount of information available is staggering.  It is estimated that the volume of digital information available is doubling about every 18 months. This means in the next year and a half we will create as much data as we have since the Internet was created. Clearly it is easy to drown in the noise of this data, and many certainly do. But, in some ways analyzing this data isn’t unlike what we have been doing for years. It is easy to drown in a data set if you don’t have clear hypotheses that you are pursuing.  Tapping into the power of Big Data is all about formulating the right questions before firing up the laptop.

So, how will Big Data change traditional, “active” research? Why would we need to ask people questions when we can track their actual behaviors more accurately?

Big Data will not obviate the need for traditional survey research. But, it will reposition it. Survey research will change and be reserved for marketing problems it is particularly well suited for.  Understanding underlying motivations of behavior will always require that we talk directly to consumers, if only to probe why their reported behavior differs from their actual behavior.

There are situations when Big Data techniques will triumph. We are noticing compelling examples of how Big Data analysis can save the world. For instance, medical researchers are looking into diseases that are asymptomatic in their early stages. Typically, an early doctor’s appointment for these diseases will consist of a patient struggling to remember symptoms and warning signs and when they might have had them. An analysis of Google searches can look at people who, based on their search behavior, can be inferred to have been diagnosed with the disease. Then, their previous search behavior can be analyzed to see whether and when they were curious about symptoms. In the hands of a skilled analyst, this can lead to new insights regarding the early warning signs of diseases that are often diagnosed too late.

There has been chatter that public health officials can track the early spread of the flu better each year by analyzing search trends than by using their traditional ways, which track doctor visits for the flu and prescriptions dispensed. The reason is that people Google for “flu symptoms” in advance of going to the doctor, and many who have symptoms don’t go to the doctor at all. A search trend analysis can help public health officials react faster to outbreaks.

This is all pretty cool. Marketers are all about delivering the right message to the right people at the right time, and understanding how prior online behavior predicts future decisions will be valued. Big Data is accurate in a way that surveys cannot be because memory is imperfect.

Let’s be clear. I don’t think that people lie on surveys, at least not purposefully. But there are memory errors that harm the ability of a survey to uncover the truth. For instance, I could ask on a survey what books you have read in the past month. But, sales data from the Kindle Store would probably be more accurate.

However, what proponents of the “Big Data will take over the world” view don’t realize is that the errors respondents make on surveys can be more valuable to marketers than the truth, because their recollections are often more predictive of their future behavior than their actual past behavior is. What you think you had for dinner two nights ago probably predicts what you will eat tonight better than what you actually ate. Perceptions can be more important than reality, and marketing is all about dealing with perceptions.

The key for skilled researchers is going to be to learn when Big Data techniques are superior and when traditional techniques will yield better insights. Big Data is a very big hammer, but isn’t suitable for every size nail.

It is an exciting time for our field. Data science and data analysis skills are going to become even more valuable in the labor market than they are today. While technical database and statistical skills will be important, in a Big Data era it will be even more important to have skills in developing the right questions to pursue in the first place and a solid understanding of the issues our clients face.

