
Showing posts with the label statistical reasoning

I'm Regressing to Being Mean,

Mean to people who don't understand statistics but blab on about it all the time. For instance, Steve Sailer apparently does not comprehend "regression to the mean," and treats it as a cause of future events rather than as a tautology: "Still, Hillary is not a good candidate. Regression to the mean suggests she probably won’t have too many days worse than her Labor Day, but Hillary is clearly Trump’s best hope of being elected." So, once again: "regression to the mean" is a tautology. Tautologies can be useful, but they do not cause events in the real world. The truth of the statement "All bachelors are unmarried" does not tell us whether it is likely that John, a bachelor, will get married next year! (It may be likely or not: the point is that the tautology has nothing to do with determining that likelihood.) Real-world events "regress to the mean" because, if they don't, what was once the mean will cease to ...
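
A minimal simulation sketch (my own, not part of the original post) of the point: if each observed score is just a fixed ability plus independent noise, the subjects with the most extreme first scores will, on average, post less extreme second scores, with nothing in the world "pulling" them back. All the parameters below are illustrative assumptions.

```python
# Regression to the mean as a statistical consequence, not a cause.
# Assumption: observed score = fixed ability + independent noise.
import random

random.seed(0)
N = 100_000
ability = [random.gauss(0, 1) for _ in range(N)]      # fixed underlying ability
score1 = [a + random.gauss(0, 1) for a in ability]     # first observation
score2 = [a + random.gauss(0, 1) for a in ability]     # second observation, fresh noise

# Take the subjects whose first score was in the top 5%.
cutoff = sorted(score1)[int(0.95 * N)]
extreme = [i for i in range(N) if score1[i] >= cutoff]

mean1 = sum(score1[i] for i in extreme) / len(extreme)
mean2 = sum(score2[i] for i in extreme) / len(extreme)
print(f"extreme group, first score:  {mean1:.2f}")
print(f"same group, second score:    {mean2:.2f}")     # noticeably closer to 0
```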

The Law of Large Numbers

John E. Freund (Introduction to Probability) has been discussing topics like the "odds" that an airline flight from Chicago to Los Angeles will arrive on time. He says that if 688 of the last 800 flights have been on time, we can say the probability of this flight being on time is .86. Then he asks, "When probabilities are thus estimated, it is only reasonable to ask whether the estimates are any good. The answer, which is 'Yes,' is supported by a remarkable law called the Law of Large Numbers..." Informally, this law can be stated as follows: "If the number of times the situation is repeated becomes larger and larger, the proportion of successes will tend to come closer and closer to the actual probability of success." Later on, he states the law formally: "If a random variable has the binomial distribution, the probability is at least 1 - 1/k² that the proportion of successes in n trials will differ from p by less than k*...
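
A quick way to see the informal statement at work (my own sketch, not Freund's): simulate flights that are each on time with probability p = .86 and watch the observed proportion settle toward p as the number of flights grows.

```python
# Law of Large Numbers, informally: the observed proportion of successes
# tends toward the true probability as the number of trials grows.
# p = 0.86 is assumed only to match the flight example above.
import random

random.seed(1)
p = 0.86
for n in (10, 100, 1_000, 10_000, 100_000):
    on_time = sum(random.random() < p for _ in range(n))
    print(f"n = {n:>7}: observed proportion on time = {on_time / n:.4f}")
```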

Cause

"The question of causation lies outside of statistics." -- Michael Starbird Contra Hume, our idea of causation stems directly from our experience of acting in the world. Look at the monkey above: he is not puzzling over causation. He knows that if he smashes the nut with the rock, he will cause it to yield its delectable innards. The problems of causation dealt with in statistics are problems of detecting causation when the link between cause and effect is obscure. And then statistics can only provide evidence of causation, and never demonstrate it. There has been a strong correlation between which conference wins the Super Bowl and how stocks do that year . But nobody whatsoever believed for a minute that Super Bowl results were causing stock price movements, despite an 80% success rate in prediction. We only feel confident we really have identified "the cause" of something when we can grasp the causal mechanism involved. And causal mechanisms are som...

Statistics about X are not causal factors determining X

Mistaking statistics, which are merely our summaries of goings-on in the world, for causal factors in the world is a confusion that pops up too regularly. For instance, here is Gregory Clark suffering from it. Mary Morgan understands this point: "such statistical or probabilistic laws can be said to govern the behaviour of our population. We individual people know better -- we know that the births, marriages, and debts are determined by a whole realm of social, economic, medical, physiological, and other laws, which determine whom we fall in love with, whether we have children, why we die, and when any of these happen to us." -- The World in the Model, pp. 336-337

In fact, I think Morgan hasn't gone far enough here: the "laws" she cites are just our names for the regularities produced by concrete causal factors, and don't themselves cause anything. (The proposition in the title of this post admits of exceptions, such as when a statistic about hou...

All Statistical Knowledge Is Built upon Historical Knowledge

Sometimes we encounter the contention that statistical studies in the social sciences are "rigorous," as opposed to the kind of "soft" knowledge we get from "merely" narrative history. This is mistaken in several ways, but probably the most fundamental is that the validity and significance of any statistical study in the social sciences are themselves based upon historical understanding. For instance, if we are studying industrial output in the Soviet Union, we have to know that data on such things was systematically doctored: and knowing that is a matter of historical understanding. Similarly, if we want to correct for this false reporting and try to get at the true figures, we must examine plant records, diaries, post-Soviet interviews, and so on: again, an historical inquiry. "Ah," you ask, "but what about where there wasn't such data distortion?" Well, we can only determine that there wasn't through... historical unders...

Oscar Robertson's Triple-Double Season

Wikipedia reports: "Oscar Robertson is the only player in NBA history to achieve this feat [of averaging a triple-double]. During the 1961–62 season, Robertson averaged 30.8 points, 12.5 rebounds, and 11.4 assists per game." My son said, "But Dad, he only had 181 triple-doubles in his career. Wouldn't this mean 82 of them came that one year?" I said, "No, those were his averages. He had games below those numbers in each category." "OK, but shouldn't that still be about 70?" "Hmm, let's see: if his distribution on each of these is a bell curve, I'd expect that maybe in 25 or 30 games he'd miss a double in assists, maybe in 20 or so in rebounds, and in a handful in points. So perhaps..." "Dad, right here..." "Wait a second! Hmm, perhaps in 40 games he actually had a triple-double." "Dad, right here on the page it gives the actual number: 41!" Ah, the power of statistical ...
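
A rough Monte Carlo version of the back-of-the-envelope reasoning in that exchange. Only the season averages come from the post; the per-game spreads, the normality of each category, their independence, and the 80-game season length are all assumptions I've made for illustration.

```python
# Estimate how many triple-doubles a player with these per-game averages
# would record over a season, treating points, rebounds, and assists as
# independent normal variables. The standard deviations are guesses.
import random

random.seed(2)
GAMES, SEASONS = 80, 10_000
means = {"points": 30.8, "rebounds": 12.5, "assists": 11.4}  # from the post
sds = {"points": 8.0, "rebounds": 4.0, "assists": 3.5}       # assumed spreads

def season_triple_doubles():
    count = 0
    for _ in range(GAMES):
        game = {stat: random.gauss(means[stat], sds[stat]) for stat in means}
        if all(value >= 10 for value in game.values()):
            count += 1
    return count

avg = sum(season_triple_doubles() for _ in range(SEASONS)) / SEASONS
print(f"expected triple-doubles in an {GAMES}-game season: {avg:.1f}")
```

With these guessed spreads the simulation lands in the same neighborhood as the estimate above, and nearly all of the "misses" come from games where the assist or rebound count falls just short of ten.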