I'm Regressing to Being Mean,


Mean to people who don't understand statistics but blab on about it all the time. For instance, Steve Sailer apparently does not comprehend "regression to the mean," and treats it as a cause of future events, rather than a tautology:

"Still, Hillary is not a good candidate. Regression to the mean suggests she probably won’t have too many days worse than her Labor Day, but Hillary is clearly Trump’s best hope of being elected."

So, once again: "regression to the mean" is a tautology. Tautologies can be useful, but they do not cause events in the real world. The truth of the statement "All bachelors are unmarried" does not mean that it is unlikely that John, a bachelor, will get married next year! (It may be unlikely or not: the point is that this tautology has nothing to do with determining that likelihood.)

Real world events "regress to the mean" because, if they don't, what was once the mean will cease to be the mean.

If this is confusing for you, study this list. Between 1901 and 1918, the average number of home runs hit by the AL home run leader was close to 10. Then, in 1919, Babe Ruth hit 29 home runs!

If we believe, as Sailer apparently does, that "regression to the mean" causes things, Ruth should have regressed like a madman the next year: his 1919 total was 190% above the mean, and about 80% above the previous record of 16 home runs in a season.

Of course, the following year, Ruth hit 54 home runs (440% above the old mean!), followed by 59 the year after, and 60 a few years later (now 500% above the pre-1919 mean). In fact, except during World War II, when many of the best players were fighting instead of playing ball, the AL champion has never hit as few as 29 home runs again. And today, hitting 29 home runs in a season, which, remember, in 1919 was an extraordinary "outlier," is considered a nice, but not outstanding, home run total. (The last six years, the AL champions have all hit over 40 home runs.)
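For anyone who wants to check the percentages above, here is a minimal sketch of the arithmetic. The helper name `percent_above` is just an illustrative choice, and the baseline of 10 is the approximate pre-1919 average quoted in the post, not an exact figure.

```python
def percent_above(value, baseline):
    """Return how far `value` sits above `baseline`, as a percentage of `baseline`."""
    return (value - baseline) * 100 / baseline

OLD_MEAN = 10    # approximate pre-1919 average for the AL leader, as quoted in the post
OLD_RECORD = 16  # previous single-season record, as quoted in the post

# Ruth's totals mentioned in the post, compared with the old mean.
for label, home_runs in [("1919", 29), ("the following year", 54), ("a few years later", 60)]:
    print(f"{label}: {home_runs} HR is {percent_above(home_runs, OLD_MEAN):.0f}% above the old mean")

# The 1919 total compared with the previous record.
print(f"1919: 29 HR is {percent_above(29, OLD_RECORD):.0f}% above the old record")
```

Measured against a baseline that is only approximate, these figures are rough, but they are large enough that the rounding does not matter to the argument.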

What this all amounts to is that the old mean of around 10 is long gone: something will remain the mean only if subsequent data happens to regress to it. If not, we get a new mean.
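The same point can be made with a toy simulation. This is only a sketch under invented assumptions: both "processes" are hypothetical normal distributions, the spread of 3, the outlier cutoff of 16, and the shifted mean of 40 are made-up parameters, and nothing here is fitted to real baseball data. It compares what follows an outlier when the underlying process stays put with what follows when the process itself has changed.

```python
import random
import statistics

OLD_MEAN = 10.0  # roughly the pre-1919 average cited in the post

def mean_after_outlier(next_mean, trials=50_000, spread=3.0, cutoff=16.0, seed=1):
    """Average of the observation that follows an unusually high one.

    The first draw always comes from the old process (centred on OLD_MEAN).
    The second draw comes from a process centred on `next_mean`.
    All distributions and parameters are hypothetical.
    """
    rng = random.Random(seed)
    followers = []
    for _ in range(trials):
        first = rng.gauss(OLD_MEAN, spread)
        second = rng.gauss(next_mean, spread)
        if first >= cutoff:  # keep only the cases that began with an outlier
            followers.append(second)
    return statistics.mean(followers)

# Case 1: the process is unchanged, so the follow-up lands near the old mean.
stable = mean_after_outlier(next_mean=OLD_MEAN)

# Case 2: the process has shifted, so the follow-up lands near the new mean.
shifted = mean_after_outlier(next_mean=40.0)

print(f"process unchanged: follow-up averages {stable:.1f}")
print(f"process shifted:   follow-up averages {shifted:.1f}")
```

In the first case the follow-ups "regress" only because the process was assumed to be stable. In the second case they do not, and the outlier tells us nothing about which case we are in.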

So Clinton's bad performance over Labor Day weekend may be a fluke... in which case we will "regress to the mean," and she will do better from now until November. Or, perhaps, as some observers suspect, Clinton is quite ill. In that case, her performance on Labor Day weekend might have been the best she will do for the rest of the race, as her health continues to decline. Invoking the pagan god "regression to the mean" does nothing to help us understand what will happen over the next few weeks.

4 comments:

  1. But Sailer does not say that regression to the mean causes anything. He says that it suggests what will probably happen. "suggests" wouldn't normally be taken to imply causation.

    Replies
    1. I see the word suggests too! If I say, "The heavy rains that arrive in the country in October suggest there will be flooding then," aren't I saying they will *cause* the flooding? If I say, "Trump's tendency to go off script suggests he will say something else outrageous before too long," aren't I saying that will *cause* him to say it?

    2. Seems to me that "suggests" is usually epistemic, basically "I know Y because X", which is certainly compatible with causation. But I could say something like "the pattern of blood splatter suggests that the victim was shot at close range" and nobody would think that I'm saying the blood splatters caused the victim to be shot.

      Actually, I think "suggests" is a little looser than what I said above, because it could easily mean "I would think Y based on X, looking just at that" (except that I have this additional evidence).

      Merriam-Webster's definition looks compatible: http://www.merriam-webster.com/dictionary/suggest says "to show that (something) is likely or true : to indicate (something) usually without showing it in a direct or certain way".

  2. Why are you linking to this deplorable person, or that deplorable site?

    Sailer's use of statistics is not very sophisticated, but he's not like one of those sportswriter idiots you like to tease. He would not quibble with your description of "regression to the mean" being a tautology; he's just observed that many people fail to grasp this useful concept when prognosticating.
