What is the source of sportswriters' misuse of statistics?

I parodied the kind of thing they write here.

But what is the error in their reasoning? This issue is not merely of academic interest: medical doctors and juries are both prone to commit statistical errors that render their practical judgments fallacious.

Doctors, for instance, have overestimated the likelihood that a patient has a disease, given a positive test result, by a factor of 33. And the prosecutor's fallacy can convince juries despite being a fallacy: "In another scenario, a crime-scene DNA sample is compared against a database of 20,000 men. A match is found, that man is accused and at his trial, it is testified that the probability that two DNA profiles match by chance is only 1 in 10,000. This does not mean the probability that the suspect is innocent is 1 in 10,000. Since 20,000 men were tested, there were 20,000 opportunities to find a match by chance." In fact, in this situation, there is roughly an 86% chance that at least one innocent man in the database will match the DNA profile found at the crime scene.
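The 86% figure can be checked with a few lines of arithmetic. What follows is only a sketch. It assumes that each of the 20,000 comparisons is independent and that every man in the database is innocent. The quoted passage does not state either assumption, and the function and variable names are my own.

```python
def chance_of_coincidental_match(p_match: float, n_men: int) -> float:
    """Probability that at least one of n_men innocent men matches by chance.

    Assumes (my assumption, not stated in the quote) that the comparisons
    are independent, each with the same chance-match probability p_match.
    """
    p_nobody_matches = (1 - p_match) ** n_men
    return 1 - p_nobody_matches


p_match = 1 / 10_000  # chance-match probability testified to at trial
n_men = 20_000        # number of men in the database

p_at_least_one = chance_of_coincidental_match(p_match, n_men)
expected_matches = n_men * p_match  # average number of chance matches

print(f"Chance of at least one coincidental match: {p_at_least_one:.1%}")
print(f"Expected number of coincidental matches: {expected_matches:.1f}")
```

On these assumptions we should expect about two chance matches in a database of this size, which is why a single match, taken alone, says so little about guilt.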

The error sportswriters most typically make in their probabilistic reasoning is that of treating the distribution of some previous results as a causal force determining future results. We find propositions like, "The Blues face an uphill battle in their playoff series facing the Greens this year, as they lost 90% of their games against the Greens in the regular season."

But this gets things precisely backwards: the reason the Blues face an uphill battle against the Greens is not their record during the year against them: it is the fact that they are not as good a team as the Greens. Evidence for the fact that they are not as good is provided by the fact that they lost 9 out of 10 games against the Greens, but that statistic itself is no barrier to their winning the playoff series: perhaps their two best players were out for the nine games they lost, but present for the one they won, and are now back for the playoffs.

The apogee of this kind of silliness is achieved when some surely irrelevant fact is introduced to generate a statistic that is supposed to "create trouble" for a team: "The Greens have lost four games in a row on Monday nights against teams from the alpha conference, so history is against them tonight." Well, what happened was (perhaps) that they weren't very good for a few years, and during those years, it just happened that their Monday night games were against a pretty good team from the alpha conference. The real "statistic" here is just that "The Greens tend to lose against teams better than they are," which is a simple tautology, since the "better" team is just that team that usually tends to win.

The ability of humans to reason probabilistically is a relatively recent development, and represents a genuine advance in human knowledge. But as with all such advances, it tends to take on a magical aura, so that simply invoking "statistics" can appear to support "reasoning" that has no reasonable basis whatsoever.

Statistics, when used properly, may give us a glimpse into real causal factors at work in our world. But statistics are not themselves causal factors operating in the world: they are snapshots of the effects produced by actual causal factors.

An analogy: We find the mauled body of a hiker in the woods. All around his body are bear footprints. We may reasonably conclude: "The hiker was mauled and killed by a bear." But it would be utter nonsense to conclude "The hiker was killed by all of those bear footprints." Statistics (properly understood) are very much like those bear prints: they are the footprints of an actual causal process at work in the world, but are never the cause of anything.

An exception to the above proposition that is not really an exception: it is possible that some agent, say, a sports team, may itself fall prey to the fallacious use of statistics of which I accuse sportswriters: The Greens may come to think, "Tonight's game is hopeless: we always lose on Monday nights against teams from the alpha conference." But the real causal factor here is not the statistic itself, but the Greens' erroneous belief that it somehow controls the result of tonight's game.

1 comment:

  1. This reasoning was brilliantly parodied by a sci-fi writer, whose name I forget. We get 1024 men to play a coin toss. The 512 winners play, then 256 winners, etc. At the end we have a lucky man. We do the same with women, find the lucky woman, and breed a lucky race.
