Archive for the 'Statistics and probability' Category

Ask the right questions, but of the right people!

Market research and polling is a $10 billion-plus industry in the US alone. It employs some of the sharpest statisticians and methodological minds our universities produce. Yet it is an industry that still makes many high-profile mistakes.

Here is something we’ve gleaned over the years: When you ask the wrong question on a survey or forget to ask a question, you actually have a reasonable chance to recover from the mistake, provided you notice it. You can change your interpretation of the result based on your imperfect question, you can look to other questions you asked to see if they shed light on the result, and you can look for a secondary source for the information. Not an ideal situation for sure, but often asking a poor question can be a recoverable error. We’ve yet to see the perfect questionnaire or even the perfect question, so in some sense we deal with this issue on every project.

However, if you ask the question of the wrong people, there is usually nothing you can do to correct the error, short of repeating the study. Failure to spot this type of mistake can lead to poor guidance to clients. Posing questions to the wrong audience might well be the most disastrous thing market researchers can do.

And it happens. Early in my career, I remember our firm conducting a phone study in which an improper randomization technique caused us to call one half of a town and not the other. We literally missed everyone living on the “other side of the tracks,” who had different opinions on the issues we were covering. The only solution was to repeat the entire project.

More common than actual sample coverage errors is failing to interview the correct individual. This is an ongoing challenge for business-to-business studies. Clients want to talk to a “decision maker,” yet often the decision maker and the user of the product are not the same individual. So, studies end up gathering detailed information on how to improve a product from people who don’t actually use it.

We conduct quite a bit of research with youth audiences. It is common for our clients to ask us to conduct what we call “hearsay” studies. They will ask us, for example, to interview parents about what their kids think about an issue. Why do that when we can ask the child directly? We should always be asking questions of respondents who are screened to be the best individuals to answer them.

We often see that seasoned researchers and astute up-and-comers realize the importance of sampling and screening, which is why, up front, they dwell more on the sampling aspects of a project than on questionnaire topics and wording. We always know we are dealing with a client who knows what they are doing when they concentrate the initial conversations on whom we want to reach and how we want to reach them, and wait to discuss what we specifically want to ask.

Wanna bet that Hillary will be the next President?


There is a movement afoot to allow Las Vegas casinos to take bets on Presidential elections.  Betting on who will be the next President has the potential to increase the interest level in the election and perhaps voter turnout as a consequence. Of course, it will also bring more revenue to Nevada casinos. Detractors of the idea cite the typical arguments against gambling of any kind. I suppose campaign insiders could engineer a campaign emergency to sabotage their candidate and collect winning bets they have made on the other side.

One aspect that hasn’t been discussed is whether betting on elections would make for better predictions than current polling methods.

Election polling is simple at its core. Pollsters find a representative sample of likely voters and ask a basic question: if the election were held today, whom would you vote for? While the question itself is basic, polls often disagree on the answers. Differences in the polls tend to stem from how the sample was drawn, how “likely voters” are classified, and context (issue questions that may have preceded the voting question). On the whole, the major polling organizations do a good job with election polls, and, especially if you group all the polls together, they make excellent predictions. But the pollsters are not always right. If we had left it up to the major polling organizations to select our Presidents, our children would be learning about the policies of President Alf Landon and President Thomas Dewey. (Of course, if we trusted the polls and not the actual election, we also would be teaching about President Al Gore, but that is another story.)

A few election cycles back, a few firms tried a new approach. Rather than ask “whom would you vote for?” the new approach asked “regardless of whom you may favor or vote for, who do you think will win the election?” This was an attempt to get around the difficulty of predicting turnout – a way, in a sense, to crowdsource an election poll. The approach worked well, but has been tried too infrequently to make a definitive judgment. While election polls are great experiments in that we can judge their success or failure by a real-world result, they aren’t so good in that the sample size of national elections is small.

An interesting approach to the last few election cycles was taken by Intrade. Intrade was a “prediction market” — an exchange that traded shares for future events that had a “yes/no” type outcome, for instance, “will Barack Obama win the election?” The share price for this would be between $0 and $1. Once the election is over, a share of Obama would close at $1 if he won, and $0 if he lost. Since there was an active market in this trading, you could make real bets with real money on the election depending on where you stood. For instance, if an Obama share was trading at 72 cents, this could be interpreted as saying the market feels he has a 72% chance of winning. If you felt Obama had a greater than 72% chance of winning you’d buy his “stock.” When the election was over, you’d either lose 72 cents if he lost the election or make 28 cents if he won. What was interesting about the approach was watching how the stock price would move as the campaign season progressed.
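The arithmetic behind that 72-cent example can be sketched in a few lines of Python. This is a simplified model that ignores trading fees and assumes shares pay exactly $1 on a win and $0 on a loss:

```python
# Expected profit from buying a prediction-market share at a given price,
# given your own estimate of the candidate's chance of winning. Shares pay
# $1 on a win and $0 on a loss, mirroring the 72-cent Obama example.

def expected_profit(price: float, my_probability: float) -> float:
    """Win (1 - price) with probability q; lose the price with probability 1 - q."""
    return my_probability * (1 - price) - (1 - my_probability) * price

# Agree with the market (q = 0.72) and the bet is exactly fair.
print(round(expected_profit(0.72, 0.72), 4))  # 0.0
# Believe the true chance is 80% and each 72-cent share is worth 8 cents to you.
print(round(expected_profit(0.72, 0.80), 4))  # 0.08
```

This is why the share price settles at the market's consensus probability: at any other price, one side of the trade has a positive expected profit and money flows in until the gap closes.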

After the conventions or debates, Obama’s share price would change. At any moment, the share price reflected the probability of victory. A good speech would move his price (and probability of winning) up a few points. Intrade was an excellent predictor and took into account the uncertainty inherent in predictions in an understandable way. The share prices of the candidates clearly showed their probability of winning in real time.

Allowing Vegas style betting on Presidential elections would be similarly interesting. But would it be accurate?

Vegas bookmakers establish initial odds on an event, and these odds (or a point spread in the case of football) evolve depending on how the betting comes in. Many people don’t realize that the oddsmakers are not actually concerned with the probability of who might win the football game. Instead, they set and adjust odds and point spreads to attract an even amount of money bet on both sides of the game, as that is how the casino maximizes its profits. So, the spread might not reflect the probability of winning, especially for teams with large, rabid fan bases, who may irrationally wager on their team (providing a buying opportunity on the other side for the rest of us). How do they do? In a perfect world (from the casino’s point of view), 50% of the underdogs would win and 50% of the favorites would win. In 2013, 512 regular season NFL games were played. The favorites won 248 times (48.4%). This is not significantly different from 50% in a statistical sense, so it appears that the sports books do a pretty good job.
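That "not significantly different from 50%" claim can be checked with a standard one-sample proportion test, sketched here in Python using the normal approximation:

```python
import math

# A normal-approximation test of whether favorites winning 248 of 512
# games is distinguishable from the 50% target the sports books aim for.

def z_for_proportion(successes: int, n: int, p0: float = 0.5) -> float:
    """z-statistic for an observed proportion against a null value p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

z = z_for_proportion(248, 512)
print(round(z, 2))  # -0.71, well inside the +/-1.96 cutoff for 95% confidence
```

A z-statistic of about -0.71 is nowhere near the conventional 1.96 threshold, which is what "not significantly different" means here.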

It would not surprise me if allowing Vegas casinos to take election bets resulted in a better prediction than the polls. Money tends to flow rationally and in response to new information, and it likely behaves more rationally than individual respondents in a poll. The polls won’t go out of business, as they have an excellent ability to understand who voted for whom and why, and these results drive campaign decisions and cable TV news content. Of course, the best approach to predicting who the next President will be is probably to just ask Nate Silver. 🙂

Is this study significant?


In all of our dealings with clients, there is one question that we wince at:  “is this study statistically significant?”

This question is cringeworthy because it is challenging to understand what is meant by the question and the answer is never a simple “yes” or “no.”

The term “significant” has a specific meaning in statistics which is not necessarily its meaning in everyday English. When we say something is significant in our daily lives, we tend to imply that it is meaningful, important, or worthy of attention. In a statistical context, the meaning of the term is narrow:  it just means that there is a high probability our findings are not due to chance.

For instance, suppose we conduct a study and find that 55% of women and 50% of men prefer Coke over Pepsi. Researchers will tend to say there is a statistically significant difference in the Coke/Pepsi gender preference as long as there is a 95% or better probability that these two numbers are different.

I won’t bore you with the statistical calculation, but in this case we would have had to interview about 1,000 women and 1,000 men in order to highlight this difference as being significant.
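For readers who do want the calculation, here is a sketch in Python of the standard two-proportion z-test behind that claim, using the round sample sizes from the example:

```python
import math

# Two-proportion z-test: 55% of 1,000 women vs. 50% of 1,000 men.

def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """z-statistic for the difference of two sample proportions,
    using the pooled standard error under the null of no difference."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_proportion_z(0.55, 1000, 0.50, 1000)
print(round(z, 2))  # 2.24 -- just past the 1.96 cutoff, hence "significant" at 95%
```

With samples much smaller than 1,000 per group, the same 5-point gap would fall short of the 1.96 cutoff and would not be flagged as significant.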

But, just because a difference is statistically significant doesn’t necessarily imply that it is practically important. Whether a 5-point difference between men and women is worth noting is really a more qualitative issue. Is that a big enough difference to matter? All we can really say as researchers is that, yep, odds are pretty good the two numbers are different.

And that is where the challenge lies. Statistical significance and practical importance are not necessarily the same thing. Statistical significance is calculated mainly by knowing the sample size and the variance of response.  The more people you interview and the more they tend to have the same answers, the easier it is to find statistically significant differences.

The custom is to highlight only differences with a 95% or greater probability of not being due to chance. But this is nothing more than a tradition; there is no reason not to highlight differences with a greater or lesser probability. In a sense, every well-implemented study provides statistically significant results – it just depends on how much of a chance you are willing to take of making an error.

I recently had a client ask me what it would take to have a 0% chance of being wrong. The short answer is you would have to interview everybody in the population. And do it in a perfect, no-biasing way.

So, the correct answer to “is this study significant” is “it depends on how certain you want to be.” That is rarely a satisfying response, which is why we don’t like the question to begin with!

The Best Graph Ever

Marketers tend to be obsessed with graphs. A challenge for many research projects is determining how to best distill statistics gathered from hundreds of respondents into a simple picture that makes a convincing point. A good graph balances a need for simplicity with an appreciation for the underlying complexity of the data.

Recently, as part of a year-end series, the Washington Post has been unveiling its “Graphs of the Year.” The Post has been inviting its contributing “wonks” to choose one graph that best encapsulates 2013 for them. I found myself spending way too much time looking through them. Some of the graphs are truly outstanding summaries of a key issue – and their conclusions are striking. Others show the political biases of the wonks themselves, and show how data can indeed be manipulated to make a point. If I were to teach a class in market research, an entire lesson would be devoted to these graphs.

I judge graphs by a simple criterion:  If you were carrying a deck of graphs down the hallway and one fell onto the floor, would someone who picked it up be able to understand its main point, without any other context? We try our best to draw graphs that meet this threshold.

In the end, the good graphs from the Post are those that spur thought and are ideologically independent. In particular, I like Bill Gates’ graph which shows the causes of death in the world as well as how each is increasing or decreasing. This isn’t a simple graph, but it clearly shows the progress the world is making and priorities for the future.

Some graphs didn’t do it for me. Senator Wyden’s graph reminded me of a David Ogilvy quotation: “They use [research] as a drunkard uses a lamp post — for support, rather than for illumination.” Wyden’s graph came off as a platform to make a political point. It confused me, and I didn’t see how the conclusions he suggests flow from the graph at all.

Senator Patty Murray’s graph may very well make a valid point about what drives the federal deficit, but it is a shocking example of correlation and causation not being the same thing. Just because two lines are displayed next to each other does not mean one causes the other; in this case, it is not even clear the two are correlated. Her explanation of the graph is political, and as far as I can tell the graph not only fails to illuminate her point, it doesn’t seem to make any point at all.

Perhaps the most misleading graph comes from Peter Thiel. His graph indicates that as student loan debt has increased, the median income of households with a bachelor’s degree has declined. The problem? The two lines on the graph are on different scales! The median income line is shown per household. The student debt load line is across the entire population. They aren’t comparable.

To make sense, the student debt load should instead be shown on a per household basis. College enrollments have increased steadily over the time frame of the graph, so of course debt load in total will be increasing. And, as a wider base of students pursue degrees, it would be expected that median income might be impacted downward. This is not to say that student debt is not an important issue – as it clearly is. But, this graph seems to take the focus off student debt and indicate that college education does not pay off.

So, what would I have picked as my graph of the year? Actually, I can do it one better. I have a graph that I consider to be the best graph of all time.  It comes from Gallup and is shown in the graph below. (It is best to go to Gallup’s site and click on “historical trend” to view this graph, which shows up-to-date tracking through Obama.)


This graph shows the Presidential approval rating tracked since modern polling began. It begins with Harry Truman and on Gallup’s site it runs right up to Obama’s current numbers.

I like it because it is a clear and consistent measure over a long period of time. To me, it is fascinating to look at with a mind towards what was going on historically as the polls were taken. It shows how memories can change as events move to the past. For instance, G.W. Bush’s approval rating was nearly the highest ever measured early in his Presidency (just after 9/11) and moved to one of the lowest ever measured by the time he left office. Clinton is the only President whose approval rating displays a positive trend throughout his term in office. Kennedy’s approval rating was moderate by historical standards near the end of his time in office.

To me, it is fascinating to think of a historical event and then look at the chart to see what happened to approval ratings. Watergate preceded a large drop in Nixon’s ratings, and Ford’s pardon of Nixon did the same. Once WWII was over and the US became mired in Korea, Truman’s popularity took a huge hit. Eisenhower’s ratings were very stable compared to others.

All Presidents start office with their approval ratings at their highest. It seems that the first day on the job is the best day, which may be why the first 100 days are always considered key to any Presidential agenda.

Although graphs may be best judged by their ability to convey one thought, I find that I can stare at this one for hours.

Why Do Market Researchers Play the Lottery?


The current Mega Millions jackpot is $550 million and climbing. It seems that whenever the lottery jackpot grows this large, it replaces the weather as the small talk conversation topic most meetings and calls start with. “Did you play it yet?” “What will you do if you win?”

I work around a lot of astute market researchers. These are individuals who have advanced statistical training and often advanced degrees. They are the type of people who should be able to see that from an odds-of-winning standpoint, playing the lottery makes no rational sense. Yet, if anything, they seem more likely to play than other people I hang around.


This is because these folks know that when the jackpot reaches a certain level, your expected return can actually exceed your expected investment. The odds of winning the Mega Millions jackpot are 1 in 258,890,850. This means that you have a 0.0000004% chance of winning. Double that if you buy two tickets!

Here are a few things that are more likely to happen to you than winning the jackpot: getting hit by lightning, getting bitten by a shark, or getting a hole in one. Or, likely, all three at the same time.

Because there are lesser prizes other than the jackpot, your odds of having a winning ticket of some sort are actually a little better than 1 in 15. But, many of these prizes just allow you to get your $2 back, and what fun is that?

With odds of winning the jackpot being about 1 in 259 million and the tickets costing $2, if you consider the jackpot only, the lottery will have an actual expected return greater than $2 whenever the jackpot reaches $518 million or more. If you consider the money paid out with lesser prizes, your $2 investment can be expected to yield more than $2 whenever the jackpot reaches $363 million or more.
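The jackpot-only break-even figure comes straight from the odds, and can be sketched directly (taxes and lesser prizes ignored here):

```python
# Jackpot-only expected value of a $2 Mega Millions ticket,
# using the 1-in-258,890,850 odds quoted above (taxes ignored).

JACKPOT_ODDS = 258_890_850
TICKET_PRICE = 2.0

def expected_return(jackpot: float) -> float:
    """Expected payout of one ticket, counting only the jackpot."""
    return jackpot / JACKPOT_ODDS

def break_even_jackpot() -> float:
    """Jackpot size at which the expected return equals the ticket price."""
    return TICKET_PRICE * JACKPOT_ODDS

print(round(break_even_jackpot() / 1e6))  # 518 (million dollars), as in the text
```

Folding in the lesser prizes effectively lowers the ticket's net cost, which is why the break-even drops to the $363 million figure quoted above.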

So, does it make rational sense to play the Mega Millions if the jackpot is >$363 million? Not so fast! There are other factors involved. First, the government will take its share in taxes. How much depends on how you take the payments and where you live. In June 2013, a person in Florida claimed the $590 million Powerball jackpot. Her lump sum payment was $371 million, before taxes. After taxes, that will likely be a third less. Her $590 million jackpot quickly became $247 million. With this math, you keep roughly 42% of the jackpot, so perhaps it is time to discuss the concept of deceptive marketing with the state governments.

Using this 42% figure, the break-even jackpot for the $2 investment becomes $864 million. That is larger than any jackpot to date, so in practice there has never been a lottery that pays off on average.

Also, you can’t assume you’ll be the sole winner, and with large jackpots you often won’t be. So many people play these large-jackpot lotteries that the odds of having unique numbers decline. Of course, you can do something to help your odds of being the sole winner if you do win. I’d suggest picking number combinations that are as likely as any others to come up victorious, but that most people can’t see ever winning. I pick numbers like 5, 10, 15, 20, 25, and 30. Or, 20, 21, 22, 23, 24, and 25. Want to have fun with the non-statistically inclined? Tell them that the numbers 1, 2, 3, 4, 5, and 6 are just as good as any others to choose.
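The shared-jackpot effect can be sketched with a Poisson approximation. This is a simplification that assumes every other ticket is an independent random pick (real players favor birthdays and patterns), and the 200 million tickets sold below is a hypothetical figure for illustration:

```python
import math

# Chance that your winning combination is yours alone, assuming every
# other ticket is an independent random pick over the same odds.

JACKPOT_ODDS = 258_890_850

def prob_sole_winner(tickets_sold: int) -> float:
    """Probability no other ticket matches your winning combination.
    exp(-n/odds) is the Poisson approximation of (1 - 1/odds) ** n."""
    return math.exp(-tickets_sold / JACKPOT_ODDS)

# With a hypothetical 200 million other tickets in play, even a winning
# ticket has roughly a 46% chance of having the jackpot to itself.
print(round(prob_sole_winner(200_000_000), 2))  # 0.46
```

Under this model, the more tickets sold, the more a large advertised jackpot should be mentally discounted: the prize you'd actually keep shrinks even before taxes enter the picture.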

So, why do market researchers play the lottery? Of course it makes no rational sense to play the lottery, but you do get value beyond the potential of winning. For $2, you get to dream about what you would do with the money if you won. That alone is probably $2 worth of value. Or perhaps you derive some value from providing additional tax revenue to your state government. The money is largely used for education after all!
