Archive for the 'Methodology' Category

What is p-hacking, and why do most researchers do it?

What sets good researchers apart is their ability to find a compelling story in a data set. It is what we do – we review various data points, combine that with our knowledge of a client’s business, and craft a story that leads to market insight.

Unfortunately, researchers can be too good at this. We have a running joke in our firm that we could probably hand a random data set to an analyst, and they could come up with a story that was every bit as convincing as the story they would develop from actual data.

Market researchers need to be wary of a phenomenon well known among academic researchers: “p-hacking,” the tendency to run and re-run analyses until we discover a statistically significant result.

A “p-value” is one of the most important statistics in research, and it is tricky to define precisely. It is the probability of seeing an effect (research result) at least as large as the one you observed if there were truly no difference between your test and control. In other words, it measures how easily chance alone could explain your result. We say a result is statistically significant when the p-value is less than 5%, meaning a difference this large would show up by chance alone less than 5% of the time.

Researchers widely use p-values to determine if a result is worth mentioning. In academia, most papers will not be published in a peer-reviewed journal if their p-value is not below 5%. Most quant analysts will not highlight a finding in market research if the p-value isn’t under 5%.
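
To make this concrete, here is a minimal sketch, in Python, of the kind of calculation behind those p-values: a two-sided z-test comparing a test cell against a control cell. The cell sizes and percentages are made-up numbers for illustration only.

    import math

    def two_proportion_p_value(x1, n1, x2, n2):
        """Two-sided p-value for the difference between two proportions (z-test)."""
        p1, p2 = x1 / n1, x2 / n2
        p_pool = (x1 + x2) / (n1 + n2)                       # pooled proportion under the null
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
        z = (p1 - p2) / se
        return math.erfc(abs(z) / math.sqrt(2))              # two-sided tail probability

    # Hypothetical example: 30% of a 1,000-person test cell liked a concept vs. 26% of control.
    p = two_proportion_p_value(300, 1000, 260, 1000)
    print(f"p-value = {p:.3f}")                              # about 0.046 -- just under the 5% bar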

P-hacking is what happens when the initial analysis doesn’t hit this threshold. Researchers will do things such as:

  • Change the variable. Our result doesn’t hit the threshold, so we search for a new measure where it does.
  • Redefine our variables. Using the full range of the response didn’t work, so we look at the top box, the top 2 boxes, the mean, etc., until the result we want pans out.
  • Change the population. It didn’t work with all respondents, but is there something among a subgroup, such as males, young respondents, or customers?
  • Run a table that statistically tests every subgroup against every other. (With a 5% threshold, roughly one in 20 of those comparisons will come up “significant” by chance alone.)
  • Relax the threshold. The findings didn’t clear 5%, so we report them anyway and call them “directional.”

These tactics are all inappropriate and common. If you are a market researcher and reading this, I’d be surprised if you haven’t done all of these at some point in your career. I have done them all.

P-hacking happens for understandable reasons. Other information outside the study points towards a result we should be getting. Our clients pressure us to do it. And, with today’s sample sizes being so large, p-hacking is easy to do. Give me a random data set with 2,000 respondents, and I will guarantee that I can find statistically significant results and create a story around them that will wow your marketing team.
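
That claim is easy to demonstrate. Below is a minimal sketch, my own illustration with hypothetical variable names, that generates pure noise for 2,000 respondents and then tests twenty random subgroup splits. Roughly one in twenty splits will come up “significant” even though there is no real signal anywhere.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_respondents, n_subgroups = 2_000, 20

    # A purely random 5-point "satisfaction" rating -- no real signal anywhere.
    satisfaction = rng.integers(1, 6, size=n_respondents)

    false_positives = 0
    for g in range(n_subgroups):
        in_group = rng.random(n_respondents) < 0.5            # random subgroup membership
        t, p = stats.ttest_ind(satisfaction[in_group], satisfaction[~in_group])
        if p < 0.05:
            false_positives += 1
            print(f"Subgroup {g}: 'significant' difference, p = {p:.3f}")

    print(f"{false_positives} of {n_subgroups} random splits cleared p < .05 by chance alone")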

I learned about p-hacking the hard way. Early in my career, I gathered an extensive data set for a college professor who was well-known and well-published within his field. He asked me to run some statistical analyses for him. When the ones he specified didn’t pan out, I started running the data on subgroups, changing how some variables were defined, etc., until I could present him with significant statistical output.

Fortunately, rather than chastise me, he went into teaching mode. He told me that just fishing around in the data set until you find something that works statistically is not how data analysis should be done. With a big data set and enough hooks in the water, you will always find some insight ready to bite.

Instead, he taught me that you always start with a hypothesis. If that hypothesis doesn’t pan out, first recognize that there is some learning in that. And it is okay to use that learning to adjust your hypothesis and test again, but your analysis has to be driven by the theory instead of the theory being driven by the data.

Good analysis is not about tinkering with data through trial and error. Too many researchers do this until something works and fail to report on the many unproductive rabbit holes they dug along the way. But with a 5% threshold, chance alone will hand you a “statistically significant” result about one time in 20.

This sounds obscure, but I would say that it is the most common mistake I see marketing analysts make. Clients will press us to redefine variables to make a regression work better. We’ll use “top box” measures rather than the full variable range, with no real reason except that it makes our models fit. We relax the level of statistical significance. We p-hack.

In general, market researchers “fish in the data” a lot. I sometimes wonder how many lousy marketing decisions have been made over time due to p-hacking.

I used to sit next to an incredible statistician. As good a data analyst as he was, he was one of the worst questionnaire writers I have ever met. He didn’t seem to care too much, as he felt he could wrangle almost any data into submission with his talent. He was a world-class p-hacker.

I was the opposite. I’ve never been a great statistician. So, I’ve learned to compensate by developing design talent, as I quickly noticed that a well-written questionnaire makes data analysis easy and often obviates the need for complex statistics. I learned over time that a good questionnaire is an antidote to p-hacking. 

Start with hypotheses and think about alternative hypotheses when you design the project. And develop these before you even compose a questionnaire. Never believe that the story will magically appear in your data – instead, start with a range of potential stories and then, in your design, allow for data to support or refute each of them. Be balanced in how you go about it, but be directed as well.

It is vital to push for the time upfront to accomplish this, as the collapsed time frames for today’s projects are a key cause of p-hacking.

Of course, nobody wants to conduct a project and be unable to conclude anything. If that happens, you likely went wrong at the project’s design stage – you didn’t lay out objectives and potential hypotheses well. Resist the tendency to p-hack, be mindful of this issue, and design your studies well so you won’t be tempted to do it.

Why the Media Cried (Red) Wolf

Journalists are puzzled as to why a predicted “red wave” (a Republican resurgence) did not materialize in the 2022 midterm elections. The signals that the red wave would fail to form were clear. The failure of journalists to foresee the success of Democratic candidates was caused by their inability to discern the good polls from the bad.

Established, media- and college-branded polls performed historically well in this cycle. They provided all the data necessary to foresee that a red wave would not emerge.

So why was there such a widespread view that the Republicans would have a big night?

The answer is that journalists have become indiscriminate in their polling coverage. Conservative-leaning pollsters released a flood of poor-quality polls in the last two weeks before the election. These polls pointed to a brewing red tsunami, and the media covered them with little, if any, due diligence.

I have had conversations with long-time pollsters who, through rolled eyes, tell me they think some of these firms are simply making up their numbers. In this cycle, cross-tabulations from one Trafalgar poll indicated that almost two-thirds of Gen Z voters would vote for a MAGA candidate in Georgia, when one-third would have represented a historic swing. Yet respected journalists widely reported the results of that very same poll.

Trafalgar’s 2022 polls were demonstrably inaccurate. The firm released 19 statewide polls in the week preceding the election. Just 11 of them picked the correct winner, and only seven landed within their stated margin of error. Trafalgar’s mean polling error is likely to end up more than double that of the “name-brand” pollsters.

It is understandable that right-leaning media are interested in these polls, as they provide a hopeful, confirmatory message their audience wants to hear. Since reputable polls have erred in a liberal direction in the past few cycles, there is a sense that we cannot trust them anymore.

Journalists ignored that polls have always fluctuated between missing in a liberal or conservative direction. Because polls have been off in a liberal direction in the past two presidential elections, journalists have assumed a liberal bias is here to stay. In 2022, this proved to be incorrect.

It isn’t just the media that provide oxygen to these polls. Poll aggregators (particularly RealClearPolitics) had a horrible cycle because they were indiscriminate about which polls they included in their averages. Predictive modelers (such as FiveThirtyEight) had a solid night that could have been tremendous had they gotten past the mentality that every poll has something of value to contribute to their models.

Reporting on polls with suspect methods is simply bad journalism. Trusted journalists would never release a story without considerable fact-checking of their sources. Yet, they continue to cover polls that are not transparent, have poor track records, have no defensible methodology, and are shunned by the polling establishment.  

This is journalistic malpractice, and the result can be dire. When the election results do not match expectations set by the polls, an environment is fostered where election denialism thrives. January 6th happened partly because the partisan polls the protesters focused on had Donald Trump winning the election, and good journalists fueled this mentality by reporting on these polls. They provided these polls with a legitimacy they did not deserve.

Statistical laws imply that we cannot know in advance which polls will be correct in any given election. But we know which ones meet industry standards for methodology and disclosure and that, in the long term, have been proven to get it right far more often than they get it wrong.

It is no secret that pollsters face technological headwinds, but their occasional misses are not for lack of trying. After each election, pollsters convene, share findings, and discuss how to improve polls for the next election. In this sense, polling is one of the most honest professions.

Do you know who is missing from these conversations and not contributing to this honesty? The conservative-leaning pollsters.

My advice to journalists is this: stick to credible polls and stop giving every poll a voice. Rely more on the pollsters themselves for editorial decisions on what goes in the polls and the interpretations of their results. Stop creating the news by being too involved in the content of polls and return to doing what you do best: report on poll findings and provide context.

Above all, fact-check the polls like you would any other source.

Polling’s Winners and Losers from the Midterms

The pollsters did well last night.

Right now (the morning after the election), it is hard to know if 2022 will go down as a watershed moment when pollsters once again found their footing or if it will merely be a stay of execution. The 2018 midterms were also quite good for pollsters, yet the 2020 election was not.

To be clear, there are still many votes to count, so it is unfair to judge the polls too quickly. In POLL-ARIZED, I criticize media members who do. Nonetheless, below is a list of what I see as some winners and losers and some that seem like they are in the middle.

The Winners

  • Pre-election polling in general. For the most part, the polls did a good job of pointing out the close races, and exit polls suggest that they did an excellent job of highlighting the issues that concern voters most. I suspect the polling error rate will be far below the historical average of five+ points for midterm elections.
  • The “good” pollsters. The better-known polling brands, especially those with media partnerships, and some college polling centers had good results.
  • John King’s brain. Say what you want about CNN, but watching someone who knows the name of every county in America, the candidates in every election district, and the results of past elections perform without a net and stick the landing is impressive.
  • The CNN magic wall. I know other networks have them, but I can’t be the only data geek who marvels at the database systems and APIs behind CNN’s screen. It must have cost millions and involved dozens of people.
  • The Iowa Poll’s response rate. Their methodology statement says they contacted 1,118 Iowa residents for a final sample size of 801, with a response rate of 72%. This reminds me of the good old days. I would like to see pollsters spend more time benchmarking what Selzer & Co. are doing right with this poll.

The Losers

  • The partisan pollsters, particularly Trafalgar. These pollsters were way off this cycle. They have been way off in most cycles. I hope that non-partisan media outlets will stop covering them. They provide a story that outlets and viewers seeking a confirmation bias enjoy, but objective media should leave them behind for good.
  • The media who failed to see that there were so many less-reputable conservative polls released over the past two weeks. Most media were hoodwinked by this and ran a narrative that a red storm was brewing.
  • Response rates. I delved into the methodology of many final polls this cycle; most had net response rates of less than 2%. That is about half what response rates were just two years ago. The fact that the pollsters did so well with this low response is a testament to the brilliance of methodologists, but the data they have to work with is getting worse each cycle. They will not be able to keep pulling rabbits out of their hat.
  • The prediction markets. I have long hoped that betting markets could emerge as a plausible alternative to polls for predicting elections, so that polls can focus on issues rather than the horse race. These markets did not have a good night.
  • FiveThirtyEight’s pollster ratings. It is too early to make a definitive statement, but some of their highly rated pollsters had poor results, while many with middling grades did well. These ratings are helpful when they are accurate and have a defensible method behind them. When these gradings are inaccurate, they ruin reputations and businesses, so FiveThirtyEight must embrace that producing objective and accurate ratings is a serious responsibility.

The “So-So”

  • The Iowa Poll. Even with the high response, this poll seemed to overstate the Republican vote this time. They did get all the winners correct. This poll has a strong history of success, so it might be fair to chalk the slight miss up to normal sampling fluctuation. It isn’t statistically possible to get it right every single time. I must admit I have a bias of rooting for this poll.
  • The modelers, such as FiveThirtyEight and the Economist. On the one hand, the concept of a probabilistic forecast is spot on. On the other, it is not particularly informative in coin-toss races. In this cycle, the forecasts they made for Senate and House seats weren’t much different from what a coin toss would have produced in the contested races. Their median predictions for House and Senate seats overstated where the Republicans will end up, possibly because they also fell prey to the release of so many conservative-leaning polls in the campaign’s final stages.
  • Polling error direction. In the past few cycles, the polling error has been in the direction of overcounting Democrats. In 2022, this error seemed to move in the other direction. Historically, these errors have been uncorrelated from election to election, so I must admit that I’ve probably jumped the gun by suggesting in POLL-ARIZED the pro-Democrat error direction was structural and here to stay.
  • The media’s coverage of the polls on election day. In 2016 and 2020, the press reveled in bashing the pollsters. This time, they hardly talked about them at all. That seemed a bit unfair – if pollsters are going to be criticized when they do poorly, they should be celebrated when they do well.

All-in-all, a good night for the pollsters. But, I don’t want to rush to a conclusion that the polls are now fixed because, in reality, the pollsters didn’t change much in their methods from 2020. I hope the industry will study what went right, as we tend to re-examine our methods when they fail, not when they succeed.

Your grid questions probably aren’t working

Convincing people to participate in surveys and polls has become so challenging that more attention is going toward preventing them from suspending once they choose to respond.

Most survey suspends occur in one of two places. The first is at the initial screen the respondent sees. Respondents click through an invitation, and many quickly decide that the survey isn’t for them and abandon the effort.

The second most common place is the first grid question respondents encounter. They see an imposing grid question and decide it isn’t worth their time to continue. It doesn’t matter where this question is placed – this happens whether the first grid question is early in the questionnaire, in the middle, or toward the end.

Respondents hate answering grid questions. Yet clients continue to ask them, and survey researchers include them without much thought. The quality of data they yield tends to be low.

A measurement error issue with grid questions is known as “response set bias.” When we present a list of, say, ten items, we want the respondent to make an independent judgment of each, unrelated to what they think of the others. With a long list of items, that is not what happens. Instead, when people respond to later items, they remember what they said earlier. If I rated feature A as “somewhat important,” then when I assess feature B it is natural to think about how it compares in importance to feature A. This introduces unwanted correlations into the data set.

Instead, we want a respondent to assess feature A, clear their mind entirely, and then assess feature B. That is a challenging task, and placing features on a long, intimidating list makes it nearly impossible. Some researchers think we eliminate this error by randomizing the list order, but all that does is spread the error out. Randomizing the options is important so the error doesn’t concentrate on just a few items, but randomization does not solve the problem.

Errors you have probably heard of lurk in long grid questions. Things like fatigue biases (respondents attend less to the items late in the list), question order biases, priming effects, recency biases, etc. In short, grid questions are just asking for many measurement errors, and we end up crossing our fingers and hoping some of these cancel each other out.

This is admittedly a mundane topic, but it is the one questionnaire design issue I have the most difficulty convincing clients to do something about. Grid questions capture a lot of data in a short amount of questionnaire time, so they are enticing for clients.

I prefer a world where we seldom ask them. If we need to, we recommend maybe one or two per questionnaire and never more than 4 to 6 items in them. I rarely succeed in convincing clients of this.

“Textbook” explanations of the problems with grid questions do not include the issue that bothers me most: in grid questions, the question respondents hear and answer is often not the literal question we composed.

Consider a grid question like this, with a 5-point importance scale as the response options:

Q: How important were the following when you decided to buy the widget?

  1. The widget brand cares about sustainability
  2. The price of the widget
  3. The color of the widget is attractive to you
  4. The widget will last a long time

Think about the first item (“The widget brand cares about sustainability”). The client wants to understand how important sustainability is in the buying decision. How important of a buying criterion is sustainability?

But that is likely not what the respondent “hears.” The respondent will probably read the item as asking whether they care about sustainability at all (and who doesn’t?). So sustainability tends to be overstated as a decision driver when we analyze the data set. Respondents don’t leap to thinking about sustainability as a buying consideration; instead, they respond about sustainability in general.

Clients and suppliers must realize that respondents do not parse our words as we would like them to, and they do not always attend to our questions. We need to anticipate this.

How do we fix this? We should be more straightforward in how we ask questions. In this example, I would prefer to derive the importance of sustainability in the buying decision. I’d include a question asking how much respondents care about sustainability (phrased carefully so responses spread across the full range of answer choices). Then, in a second question, I would gather a dependent variable: how likely they are to buy the widget in the future.

A regression or correlation analysis would provide coefficients across variables that indicate their relative importance. Yes, it would be based on correlations and not necessarily causation. In reality, research studies rarely set up the experiments necessary to give evidence of causation, and we should not get too hung up on that.
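
As a rough sketch of that derived approach (my own illustration, with hypothetical file and column names, not a prescription), one could regress the likelihood-to-buy question on the attribute ratings and read the relative size of the coefficients as relative importance:

    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("widget_survey.csv")                     # hypothetical survey export

    # Attribute ratings, each asked on the same 1-5 agreement scale (hypothetical names).
    predictors = ["cares_about_sustainability", "price_rating",
                  "color_rating", "durability_rating"]
    X = sm.add_constant(df[predictors])
    y = df["likelihood_to_buy"]                               # e.g., a 0-10 purchase-intent scale

    model = sm.OLS(y, X, missing="drop").fit()
    print(model.params.drop("const").sort_values(ascending=False))
    # Because the predictors share a common scale, larger coefficients suggest greater
    # relative importance -- bearing in mind this reflects correlation, not proven causation.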

I would conclude that sustainability is an essential feature if it popped in the regression as having a high coefficient and if I saw something else in other questions or open-ends that indicated sustainability mattered from another angle. Always look for another data point or another data source that supports your conclusion.

Grid questions are the most over-rated and overused types of survey questions. Clients like them, but they tend to provide poor-quality data. Use them sparingly and look for alternatives.

Pre-Election Polling and Baseball Share a Lot in Common

The goal of a pre-election poll is to predict which candidate will win an election and by how much. Pollsters work towards this goal by 1) obtaining a representative sample of respondents, 2) determining which candidate a respondent will vote for, and 3) predicting the chances each respondent will take the time to vote.

All three of these steps involve error. It is the first one, obtaining a representative sample of respondents, which has changed the most in the past decade or so.

It is the third step that separates pre-election polling from other forms of polling and survey research. Statisticians must predict how likely each person they interview is to vote. This is called their “Likely Voter Model.”

As I state in POLL-ARIZED, this is perhaps the most subjective part of the polling process. The biggest irony in polling is that it becomes an art when we hand the data to the scientists (methodologists) to apply a Likely Voter Model.

It is challenging to understand what pollsters do in their Likely Voter Models and perhaps even more challenging to explain.  

An example from baseball might provide a sense of what pollsters are trying to do with these models.

Suppose Mike Trout (arguably the most underappreciated sports megastar in history) is stepping up to the plate. Your job is to predict Trout’s chances of getting a hit. What is your best guess?

You could take a random guess between 0 and 100%. But, since that would give you a 1% chance of being correct, there must be a better way.

A helpful approach comes from a subset of statistical theory called Bayesian statistics. This theory says we can start with a baseline of Trout’s hit probability based on past data.

For instance, we might see that so far this year, the overall major league batting average is .242. So, we might guess that Trout’s probability of getting a hit is 24%.

This is better than a random guess. But, we can do better, as Mike Trout is no ordinary hitter.

We might notice there is even better information out there. Year-to-date, Trout is batting .291. So, our guess for his chances might be 29%. Even better.

Or, we might see that Trout’s lifetime average is .301 and that he hit .333 last year. Since we believe in a concept called regression to the mean, that would lead us to think that his batting average should be better for the rest of the season than it is currently. So, we revise our estimate upward to 31%.

There is still more information we can use. The opposing pitcher is Justin Verlander. Verlander is a rare pitcher who has owned Trout in the past – Trout’s average is just .116 against Verlander. This causes us to revise our estimate downward a bit. Perhaps we take it to about 25%.

We can find even more information. The bases are loaded. Trout is a clutch hitter, and his career average with men on base is about 10 points higher than when the bases are empty. So, we move our estimate back up to about 28%.

But it is August. Trout has a history of batting well early and late in the season, but he tends to cool off during the dog days of summer. So, we decide to end this and settle on a probability of 25%.
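
For readers curious what the formal version of this kind of updating looks like, here is a minimal sketch of a Beta-Binomial update in Python. It is my own illustration, not how any pollster or team actually models it, and every number is hypothetical: the prior is centered on the league average and “worth” a chosen number of pseudo at-bats, and the observed data pull the estimate away from that prior.

    def beta_binomial_update(prior_mean, prior_strength, hits, at_bats):
        """Posterior mean hit probability from a Beta prior plus observed at-bats."""
        alpha = prior_mean * prior_strength                   # prior "pseudo-hits"
        beta = (1 - prior_mean) * prior_strength              # prior "pseudo-outs"
        return (alpha + hits) / (alpha + beta + at_bats)

    league_avg = 0.242        # baseline belief before we know the batter is Trout
    prior_strength = 300      # how many "pseudo at-bats" we let the prior count for

    # Hypothetical season line: 102 hits in 350 at-bats (a .291 average).
    estimate = beta_binomial_update(league_avg, prior_strength, 102, 350)
    print(f"Posterior estimate: {estimate:.3f}")              # about .269
    # Pulled up from the league average toward the season line, but shrunk toward the
    # prior -- the formal cousin of "regression to the mean."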

This sort of analysis could go on forever. Every bit of information we gather about Trout can conceivably help make a better prediction for his chances. Is it raining? What is the score? What did he have for breakfast? Is he in his home ballpark? Did he shave this morning? How has Verlander pitched so far in this game? What is his pitch count?

There are pre-election polling analogies in this baseball example, particularly if you follow the probabilistic election models created by organizations like FiveThirtyEight and The Economist.

Just as we might use Trout’s lifetime average as our “prior” probability, these models will start with macro variables for their election predictions. They will look at the past implications of things like incumbency, approval ratings, past turnout, and economic indicators like inflation, unemployment, etc. In theory, these can adjust our assumptions of who will win the election before we even include polling data.

Of course, using Trout’s lifetime average or these macro variables in polling will only be helpful to the extent that the future behaves like the past. And therein lies the rub – overreliance on past experience makes these models inaccurate during dynamic times.

Part of why pollsters missed badly in 2020 is that unique things were going on – a global pandemic, changed methods of voting, increased turnout, etc. In baseball, perhaps this is a year with a juiced baseball, or Trout is dealing with an injury.

The point is that while unprecedented things are unpredictable, they happen with predictable regularity. There is always something unique about an election cycle or a Mike Trout at bat.

The most common question I am getting from readers of POLL-ARIZED is, “Will the pollsters get it right in 2024?” My answer is that, since pollsters apply past assumptions in their models, they will get it right to the extent that the world of 2024 looks like the world of 2020, and I would not put my own money on that.

I make a point in POLL-ARIZED that pollsters’ models have become too complex. In theory, adding variables never worsens a model’s fit to past data; in practice, it invites overfitting and has made these models uninterpretable. Pollsters include so many variables in their likely voter models that many of their adjustments cancel each other out. They are left with a model with no discernible underlying theory.

If you look closely, we started with a probability of 24% for Trout. Even after looking at a lot of other information and making reasonable adjustments, we still ended up with a prediction of 25%. The election models are the same way. They include so many variables that they can cancel out each other’s effects and end up with a prediction that looks much like the raw data did before the methodologists applied their wizardry.

This effort would be better spent on improving the input to the models: investing in the trust needed to raise the response rates we get to our surveys and polls. Improving the quality of our data input will do more for the predictive quality of the polls than coming up with more complicated ways to weight the data.

Of course, in the end, one candidate wins, and the other loses, and Mike Trout either gets a hit, or he doesn’t, so the actual probability moves to 0% or 100%. Trout cannot get 25% of a hit, and a candidate cannot win 79% of an election.

As I write this, I looked up the last time Trout faced Verlander. It turns out Verlander struck him out!

Things That Surprised Me When Writing a Book

I recently published a book outlining the challenges election pollsters face and the implications of those challenges for survey researchers.

This book was improbable. I am not an author nor a pollster, yet I wrote a book on polling. It is a result of a curiosity that got away from me.

Because I am a new author, I thought it might be interesting to list unexpected things that happened along the way. I had a lot of surprises:

  • How quickly I wrote the first draft. Many authors toil for years on a manuscript. The bulk of POLL-ARIZED was composed in about three weeks, working a couple of hours daily. The book covers topics central to my career, and it was a matter of getting my thoughts typed and organized. I completed the entire first draft before telling my wife I had started it.
  • How long it took to turn that first draft into a final draft. After I had all my thoughts organized, I felt a need to review everything I could find on the topic. I read about 20 books on polling and dozens of academic papers, listened to many hours of podcasts, interviewed polling experts, and spent weeks researching online. I convinced a few fellow researchers to read the draft and incorporated their feedback. The result was a refinement of my initial draft and arguments and the inclusion of other material. This took almost a year!
  • How long it took to get the book from a final draft until it was published. I thought I was done at this point. Instead, it took another five months to get it in shape to publish – to select a title, get it edited, commission cover art, set it up on Amazon and other outlets, etc. I used Scribe Media, which was expensive, but this process would have taken me a year or more if I had done it without them.
  • That going for a long walk is the most productive writing tactic ever. Every good idea in the book came to me when I trekked in nature. Little of value came to me when sitting in front of a computer. I would go for long hikes, work out arguments in my head, and brew a strong cup of coffee. For some reason, ideas flowed from my caffeinated state of mind.
  • That writing a book is not a way to make money. I suspected this going in, but it became clear early on that this would be a money-losing project. POLL-ARIZED has exceeded my sales expectations, but it cost more to publish than it will ever make back in royalties. I suspect publishing this book will pay back in our research work, as it establishes credibility for us and may lead to some projects.
  • Marketing a book is as challenging as writing one. I guide large organizations on their marketing strategy, yet I found I didn’t have the first clue about how to promote this book. I would estimate that the top 10% of non-fiction books make up 90% of the sales, and the other 90% of books are fighting for the remaining 10%.
  • Because the commission on a book is a few dollars per copy, it proved challenging to find marketing tactics that pay back. For instance, I thought about doing sponsored ads on LinkedIn. It turns out that the per-click charge for those ads was more than the book’s list price. The best money I spent to promote the book was sponsored Amazon searches. But even those failed to break even.
  • Deciding to keep the book at a low price proved wise. So many people told me I was nuts to hold the eBook at 99 cents for so long or keep the paperback affordable. I did this because it was more important to me to get as many people to read it as possible than to generate revenue. Plus, a few college professors have been interested in adopting the book for their survey research courses. I have been studying the impact of book prices on college students for about 20 years, and I thought it was right not to contribute to the problem.
  • BookBub is incredible if you are lucky enough to be selected. BookBub is a community of voracious readers. I highly recommend joining if you read a lot. Once a week, they email their community about new releases they have vetted and like. They curate a handful of titles out of thousands of submissions. I was fortunate that my book got selected. Some authors angle for a BookBub deal for years and never get chosen. The sales volume for POLL-ARIZED went up by a factor of 10 in one day after the promotion ran.
  • Most conferences and some podcasts are “pay to play.” Not all of them, but many conferences and podcasts will not support you unless you agree to a sponsorship deal. When you see a research supplier speaking at an event or hear them on a podcast, they may have paid the hosts something for the privilege. This bothers me. I understand why they do this, as they need financial support. Yet, I find it disingenuous that they do not disclose this – it is on the edge of being unethical. It harms their product. If a guest has to pay to give a conference presentation or talk on a podcast, it pressures them to promote their business rather than have an honest discussion of the issues. I will never view these events or podcasts the same. (If you see me at an event or hear me on a podcast, be assured that I did not pay anything to do so.)
  • That the industry associations didn’t want to give the book attention. If you have read POLL-ARIZED, you will know that it is critical (I believe appropriately and constructively) of the polling and survey research fields. The three most important associations rejected my proposals to present and discuss the book at their events. This floored me, as I cannot think of any topics more essential to this industry’s future than those I raise in the book. Even insights professionals who have read the book and disagree with my arguments have told me that I am bringing up points that merit discussion. This cold shoulder from the associations made me feel better about writing that “this is an industry that doesn’t seem poised to fix itself.”
  • That clients have loved the book. The most heartwarming part of the process is that it has reconnected me with former colleagues and clients from a long research career. Everyone I have spoken to who is on the client-side of the survey research field has appreciated the book. Many clients have bought it for their entire staff. I have had client-side research directors I have never worked with tell me they loved the book.
  • That some of my fellow suppliers want to kill me. The book lays our industry bare, and not everyone is happy about that. I had a competitor ask me, “Why are you telling clients to ask us what our response rates are?” I stand behind that!
  • How much I learned along the way. There is something about getting your thoughts on paper that creates a lot of learning. There is a saying that the best way to learn a subject is to teach it. I would add that trying to write a book about something can teach you what you don’t know. That was a thrill for me. But then again, I was the type of person who would attend lectures for classes I wasn’t even taking while in college. I started writing this book to educate myself, and it has been a great success in that sense.
  • How tough it was for me to decide to publish it. There was not a single point in the process when I did not consider not publishing this book. I found I wanted to write it a lot more than publish it. I suffered from typical author fears that it wouldn’t be good enough, that my peers would find my arguments weak, or that it would bring unwanted attention to me rather than the issues the book presents. I don’t regret publishing it, but it would never have happened without encouragement from the few people who read it in advance.
  • The respect I gained for non-fiction authors. I have always been a big reader. I now realize how much work goes into this process, with no guarantee of success. I have always told people that long-form journalism is the profession I respect the most. Add “non-fiction” writers to that now!

Almost everyone who has contacted me about the book has asked me if I will write another one. If I do, it will likely be on a different topic. If I learned anything, it is that this process requires choosing an issue you care about passionately. Journalists are people who can write good books about almost anything. The rest of us mortals must choose a topic we are super interested in, or our books will be awful.

I’ve got a few dancing around in my head, so who knows, maybe you’ll see another book in the future.

For now, it is time to get back to concentrating on our research business!

The Insight that Insights Technology is Missing

The market research insights industry has long been characterized by a resistance to change. This likely results from the academic nature of what we do. We don’t like to adopt new ways of doing things until they have been proven and studied.

I would posit that the insights industry has not seen much change since the transition from telephone to online research occurred in the early 2000s. And even that transition created discord within the industry, with many traditional firms resistant to moving on from telephone studies because online data collection had not been thoroughly studied and vetted.

In the past few years, the insights industry has seen an influx of capital, mostly from private equity and venture capital firms. The conditions for this cash infusion have been ripe: a strong and growing demand for insights, a conservative industry that is slow to adapt, and new technologies arising that automate many parts of a research project have all come together simultaneously.

Investing organizations see this enormous business opportunity. Research revenues are growing, and new technologies are lowering costs and shortening project timeframes. It is a combustible business situation that needs a capital accelerant.

Old school researchers, such as myself, are becoming nervous. We worry that automation will harm our businesses and that the trend toward DIY projects will result in poor-quality studies. Technology is threatening the business models under which we operate.

The trends toward investment in automation in the insights industry are clear. Insights professionals need to embrace this and not fight it.

However, although the movement toward automation will result in faster and cheaper studies, this investment ignores the threats that declining data quality creates. In the long run, this automation will accelerate the decline in data quality rather than improve it.

It is great that we are finding ways to automate time-consuming research tasks, such as questionnaire authoring, sampling, weighting, and reporting. This frees up researchers to concentrate on drawing insights out of the data. But, we can apply all the automation in the world to the process, yet if we do not do something about data quality, it will not increase the value clients receive.

I argue in POLL-ARIZED that the elephant in the research room is the fact that very few people want to take our surveys anymore. When I began in this industry, I routinely fielded telephone projects with 70-80% response rates. Currently, telephone and online response rates are between 3-4% for most projects.

Response rates are not everything. You can make a compelling argument that they do not matter at all. There is no problem as long as the 3-4% response we get is representative. I would rather have a representative 3% answer a study than a biased 50%.

But, the fundamental problem is that this 3-4% is not representative. Only about 10% of the US population is currently willing to take surveys. What is happening is that this same 10% is being surveyed repeatedly. In the most recent project Crux fielded, respondents had taken an average of 8 surveys in the past two weeks. So, we have about 10% of the population taking surveys every other day, and our challenge is to make them represent the rest of the population.

Automate all you want, but the data that form the backbone of the insights we are producing quickly and cheaply are of historically low quality.

The new investment flooding into research technology will contribute to this problem. More studies will be done that are poorly designed, with long, tortuous questionnaires. Many more surveys will be conducted, fewer people will be willing to take them, and response rates will continue to fall.

There are plenty of methodologists working on these problems. But, for the most part, they are working on new ways to weight the data we can obtain rather than on ways to compel more response. They are improving data quality, but only slightly, and the insights field continues to ignore the most fundamental problem we have: people do not want to take our surveys.

For the long-term health of our field, that is where the investment should go.

In POLL-ARIZED, I list ten potential solutions to this problem. I am not optimistic that any of them will be able to stem the trend toward poor data quality. But, I am continually frustrated that our industry has not come together to work towards expanding respondent trust and the base of people willing to take part in our projects.

The trend towards research technology and automation is inevitable. It will be profitable. But, unless we address data quality issues, it will ultimately hasten the decline of this field.

POLL-ARIZED available on May 10

I’m excited to announce that my book, POLL-ARIZED, will be available on May 10.
 
After the last two presidential elections, I was fearful my clients would ask a question I didn’t know how to answer: “If pollsters can’t predict something as simple as an election, why should I believe my market research surveys are accurate?”
 
POLL-ARIZED results from a year-long rabbit hole that question led me down! In the process, I learned a lot about why polls matter, how today’s pollsters are struggling, and what the insights industry should do to improve data quality.
 
I am looking for a few more people to read an advance copy of the book and write an Amazon review on May 10. If you are interested, please send me a message at poll-arized@cruxresearch.com.

