
Researchers should be mindful of “regression toward the mean”

There is a concept in statistics known as regression toward the mean that is important for researchers to consider as we look at how the COVID-19 pandemic might change future consumer behavior. This concept is as challenging to understand as it is interesting.

Regression toward the mean implies that an extreme observation in a data set tends to be followed by one that is less extreme and closer to the “average” value of the population. A common example: if two parents who are well above average in height have a child, that child is likely to be closer to average height than the parents are.

This is an important concept to keep in mind in the design of experiments and when analyzing market research data. I did a study once where we interviewed the “best” customers of a quick-service restaurant, defined as those who had visited the restaurant 10 or more times in the past month. We gave each of them a coupon and interviewed them a month later to determine the effect of the coupon. We found that they actually went to the restaurant less often the month after receiving the coupon than the month before.

It would have been easy to conclude that the coupon caused customers to visit less frequently and that there was something wrong with it (which is what we initially thought). What really happened was a regression toward the mean. Because we surveyed only customers who had visited a large number of times in one month, it was likely that these same customers would visit a more “average” number of times the following month, whether they had a coupon or not. This was a poor research design because we couldn’t really assess the impact of the coupon, which was our goal.

Personally, I’ve always had a hard time understanding and explaining regression toward the mean because the concept seems to be counter to another concept known as “independent trials”. You have a 50% chance of flipping a fair coin and having it come up heads regardless of what has happened in previous flips. You can’t guess whether the roulette wheel will come up red or black based on what has happened in previous spins. So, why would we expect a restaurant’s best customers to visit less in the future?

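This happens when we begin with a group that was selected because it was extreme. Any one month’s visit count reflects both a customer’s usual habit and some amount of chance. Choosing people who visited 10 or more times picks customers whose habit is high and whose luck that month also ran high. The habit persists, but the luck, like a coin flip, does not carry over, so the following month tends to land closer to average. Had we surveyed customers across the full range of patronage, unusually high and unusually low months would have roughly offset each other, and we could have done a better job of isolating the effect of the coupon. A comparison group of equally frequent customers who did not receive the coupon would have helped as well.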
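The coupon study can be sketched as a small simulation. Everything in it is an assumption made for illustration, not data from the actual study: each simulated customer has a stable monthly habit (drawn from a gamma distribution with a mean of 4 visits), each month’s visits are that habit plus Poisson chance, and nobody receives a coupon at all. The function names are hypothetical.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_coupon_study(n_customers=20000, cutoff=10, seed=1):
    """Two months of visits with NO coupon effect, then select 'best' customers."""
    rng = random.Random(seed)
    month1, month2 = [], []
    for _ in range(n_customers):
        # Assumed stable habit: average visits per month for this customer.
        habit = rng.gammavariate(2.0, 2.0)
        # Each month is the same habit plus independent chance.
        month1.append(poisson(rng, habit))
        month2.append(poisson(rng, habit))

    # "Best" customers are chosen only on their month-1 count.
    best = [i for i, visits in enumerate(month1) if visits >= cutoff]
    return {
        "n_best": len(best),
        "all_month1": statistics.mean(month1),
        "all_month2": statistics.mean(month2),
        "best_month1": statistics.mean(month1[i] for i in best),
        "best_month2": statistics.mean(month2[i] for i in best),
    }


if __name__ == "__main__":
    for name, value in simulate_coupon_study().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the full customer base averages about the same in both months, while the group selected for 10 or more visits drops in the second month even though nothing was done to them.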

Here is another example of regression toward the mean. Suppose the Buffalo Bills quarterback, Josh Allen, has a monster game when they play the New England Patriots. Allen, who has been averaging about 220 yards passing per game in his career, goes off and burns the Patriots for 450 yards. After we are done celebrating and breaking tables in western NY, what would be our best prediction for the yards Allen will throw for the second time the Bills play the Patriots?

Well, you could say the best prediction is 450 yards as that is what he did the first time. But, regression toward the mean would imply that he’s more likely to throw close to his historic average of 220 yards the second time around. So, when he throws for 220 yards the second game it is important to not give undue credit to Bill Belichick for figuring out how to stop Allen.
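One simplified way to put a number on that prediction is to shrink the extreme game back toward the long-run average. The sketch below assumes a single “carry-over” value between 0 and 1 that says how much of one game’s deviation from average tends to show up in the next game. That value is hypothetical here; in practice it would have to be estimated from real game logs, and the function name is made up for illustration.

```python
def predict_next(observed, long_run_mean, carry_over):
    """Shrink an extreme observation back toward the long-run mean.

    carry_over = 0 means the last game tells us nothing new (predict the mean);
    carry_over = 1 means the last game repeats exactly (predict the same value).
    """
    if not 0.0 <= carry_over <= 1.0:
        raise ValueError("carry_over must be between 0 and 1")
    return long_run_mean + carry_over * (observed - long_run_mean)


if __name__ == "__main__":
    # 450-yard game, 220-yard career average, several assumed carry-over values.
    for carry_over in (0.0, 0.2, 0.5, 1.0):
        print(carry_over, predict_next(450, 220, carry_over))
```

Any carry-over value short of 1 gives a prediction below 450, and the smaller it is, the closer the prediction sits to 220.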

Here is another sports example. I have played (poorly) in a fantasy baseball league for almost 30 years. In 2004, Derek Jeter entered the season as a career .317 hitter. After the first 100 games or so he was hitting under .200. The person in my league who owned him was frustrated, so I traded for him. Jeter went on to hit well over .300 the rest of the season. This was predictable because there wasn’t any underlying reason (like injury) for his slump. His long-run average was much better than his current performance, and because of regression toward the mean it was likely he would have a great second half of the season, which he did.

There are interesting HR examples of regression toward the mean. Say you have an employee who does a stellar job on an assignment – over and above what she normally does. You praise her and give her a bonus. Then, you notice that on the next assignment she doesn’t perform on the same level. It would be easy to conclude that the praise and bonus caused the poorer performance when in reality her performance was just regressing back toward the mean. I know sales managers who have had this exact problem – they reward their highest performers with elaborate bonuses and trips and then notice that the following year those performers don’t do as well. They then conclude that their incentives aren’t working.

The concept is hard at work in other settings. Mutual funds that outperform the market tend to fall back in line the next year. You tend to feel better the day after you go to the doctor. Companies profiled in “Good to Great” tend to have hard times later on.

Regression toward the mean is important to consider when designing sampling plans. If you sample only an extreme portion of a population (the heaviest users, the top performers), expect that group to look less extreme the next time you measure it, with or without any intervention. Sample size is also important. When you have just a few cases of something, a single extreme response can pull your mean a long way.
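The sample size point can be illustrated with a short simulation. It assumes a made-up population with a mean of 100 and a standard deviation of 15, draws many samples of a given size, and measures how much the sample means bounce around. The function name and all of the numbers are illustrative assumptions.

```python
import random
import statistics


def spread_of_sample_means(sample_size, n_samples=1000, seed=7):
    """Standard deviation of the means of many samples of the same size."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        # Every sample comes from the same assumed population (mean 100, SD 15).
        sample = [rng.gauss(100, 15) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return statistics.stdev(means)


if __name__ == "__main__":
    for size in (5, 50, 500):
        print(f"sample size {size}: spread of means {spread_of_sample_means(size):.2f}")
```

The smaller the sample, the more widely its mean varies from one draw to the next, so an extreme result from a small sample is more likely to be chance and more likely to look ordinary on the next measurement.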

The issue to be wary of is that when we fail to consider regression toward the mean, we tend to see cause and effect where there is only chance. We think our mutual fund manager is a genius when he just got lucky, that our coupon isn’t working, or that Josh Allen is becoming the next Drew Brees. All of these could be true, but be careful in how you interpret data that come from extreme samples or small sample sizes.

How does this relate to COVID? Well, at the moment, I’d say we are still in an “inflated expectations” portion of a hype curve when we think of what permanent changes may take place resulting from the pandemic. There are a lot of examples. We hear that commercial real estate is dead because businesses will keep employees working from home. Higher education will move entirely online. In-person qualitative market research will never happen again. Business travel is gone forever. We will never again work in an office setting. Shaking hands is a thing of the past.

I’m not saying there won’t be a new normal that results from COVID, but if we believe in regression toward the mean and the hype curve, we’d predict that the future will look more like the past than it is currently being portrayed. The post-COVID world will likely look more like the past than like a more extreme version of the present. The “mean” being regressed to has likely changed, but not as much as the current, extreme situation implies.


