Monday, February 04, 2013

Babe Ruth and Regression to the Mean

Before Babe Ruth, the American League record for home runs in a season was 16. When Ruth hit 29 home runs in 1919, it would have been quite sensible to predict that he would regress to the mean and hit fewer the next season. But if he had, "regression to the mean" would not have explained how many home runs he hit. That could only be explained by analyzing each at bat that season, and understanding why Ruth was or was not able to drive a ball out of the park during that at bat. (Of course, we could have a more general explanation as well, e.g., "Ruth was ill," or "Ruth was going blind.")

In fact, the next season Ruth hit 54 home runs. Wow, now he really "ought" to have regressed to the mean, and dropped way, way down.

Instead, the following season he hit 59 home runs. What was happening was that Ruth was ushering in a new era of home run hitting, such that 16 home runs (the American League record before Ruth) would come to be viewed as a very mediocre season for a slugger. Ruth was the leading edge of a movement in the mean. Nevertheless, regression to the mean has still held true, since something will simply cease to be the mean unless it is regressed to. Regression to the mean is a tautology, not an explanation.
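
For readers who like to see the point in numbers, here is a minimal sketch of why predicting a drop was "sensible" without that prediction explaining anything about any one player. Everything in it is an assumption made up for illustration: each hypothetical player has a fixed "talent" plus season-to-season luck, and the figures (talent centered on 10, luck with a spread of 3) are not real baseball data.

```python
import random
import statistics


def simulate_seasons(n_players=1000, seed=0):
    """Simulate two seasons for hypothetical players (not real data)."""
    rng = random.Random(seed)
    # Assumed model: a fixed underlying talent per player...
    talent = [rng.gauss(10, 3) for _ in range(n_players)]
    # ...plus independent luck in each season.
    season1 = [t + rng.gauss(0, 3) for t in talent]
    season2 = [t + rng.gauss(0, 3) for t in talent]
    return season1, season2


def top_group_summary(season1, season2, top_fraction=0.1):
    """Compare season-1 leaders with their own season-2 results."""
    k = max(1, int(len(season1) * top_fraction))
    # Indices of the k best season-1 performers.
    top = sorted(range(len(season1)), key=lambda i: season1[i], reverse=True)[:k]
    mean1 = statistics.mean(season1[i] for i in top)
    mean2 = statistics.mean(season2[i] for i in top)
    # Leaders who did even better the next season.
    improved = sum(1 for i in top if season2[i] > season1[i])
    return mean1, mean2, improved, k


if __name__ == "__main__":
    s1, s2 = simulate_seasons()
    mean1, mean2, improved, k = top_group_summary(s1, s2)
    print(f"Top {k} players, season 1 average: {mean1:.1f}")
    print(f"Same players, season 2 average:   {mean2:.1f}")
    print(f"Players who improved anyway:      {improved}")
```

Under these assumptions the leaders as a group fall back the next season, yet some individuals among them still improve. The group average says nothing about why any one of them did what he did, and the model also holds the mean fixed, which is exactly what Ruth's era did not do.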

  1. Gene - I have been reading your blog for a long time but never commented....

    I thought you might enjoy this article; I hope you can access JSTOR:

    Why Do Skips Precede Reversals? The Effect of Tessitura on Melodic Structure
    Paul Von Hippel and David Huron
    Music Perception: An Interdisciplinary Journal
    Vol. 18, No. 1 (Fall, 2000), pp. 59-85

    Huron has a fascinating book on the subject, but I do have a little problem with the explanation that 400 years of melodic practice largely comes down to regression towards the mean!


