
More on alpha levels

The α = .05 cutoff for "significant" results is a case of spurious "objectivity" trumping scientific judgment. The fact that scientists have an "objective" standard to adhere to gives the appearance of being more rigorous. But consider another objective way of deciding between the "null hypothesis" and the hypothesis being tested: flip a coin. Heads, we reject the null hypothesis; tails, we don't. Completely objective! We could videotape the coin flip, and all sane observers could agree as to whether we got heads or tails. Next, think about the following two cases:

1. We do a study and find that reckless driving correlates with early death with p = .08 (greater than α). We are told to accept the null hypothesis: there is no significant correlation.
2. We do a study and find that sunspot activity correlates with American League victories in the World Series with p = .04 (less than α). We are told to reject the null hypothesis: there is a sig...
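To make the contrast concrete, here is a minimal sketch (the cutoff, the two p-values, and the coin rule all come from the post; the code itself is just an illustration). Both procedures are perfectly "objective" in the sense that any observer gets the same verdict from the same inputs:

```python
# Two perfectly "objective" decision procedures. Objectivity here says
# nothing about whether either is good scientific judgment.
import random

ALPHA = 0.05

def decide_by_alpha(p_value: float) -> str:
    """The conventional rule: mechanical comparison against a cutoff."""
    return "reject null" if p_value < ALPHA else "accept null"

def decide_by_coin(rng: random.Random) -> str:
    """Equally objective: heads we reject the null, tails we don't."""
    return "reject null" if rng.random() < 0.5 else "accept null"

# The post's two cases:
print(decide_by_alpha(0.08))  # reckless driving vs. early death -> accept null
print(decide_by_alpha(0.04))  # sunspots vs. World Series wins   -> reject null
print(decide_by_coin(random.Random(42)))
```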

Why So Many Statistical Studies Are Worthless

The findings of statistical studies are usually considered "significant" when there is a less than 5% probability that their findings were the result of mere chance in the selection of a sample to study. Keep that in mind, and let's first just consider sociologists: the American Sociological Association claims 21,000 members in its various sub-groups. Let us guess (the exact numbers don't matter for my point) that each member undertakes two statistical studies per year, and half of those show a significant correlation. That means that by chance alone, this group will produce over a thousand studies per year which appear to show a significant correlation between different phenomena, but in which the significance was really only the result of the luck of the draw in picking a sample to examine. Next let us turn our attention to the bias that exists in academic journals towards results that are positive (no one cares much about studies that show no connectio...
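A back-of-the-envelope version of that arithmetic, using only the post's own guesses (a sketch, not a claim about what actual sociologists publish):

```python
# False-positive arithmetic from the post's own guesses.
members = 21_000          # ASA membership (from the post)
studies_per_member = 2    # guessed studies per member per year (from the post)
alpha = 0.05              # the conventional significance cutoff

total_studies = members * studies_per_member
# If every null hypothesis tested were actually true, alpha gives the
# expected share of studies that cross the threshold by luck alone.
false_positives = total_studies * alpha
print(f"{total_studies} studies/year -> ~{false_positives:.0f} spurious 'significant' results")
# 42000 studies/year -> ~2100 spurious 'significant' results
```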

I'm Regressing to Being Mean,

Mean to people who don't understand statistics but blab on about it all the time. For instance, Steve Sailer apparently does not comprehend "regression to the mean," and treats it as a cause of future events, rather than a tautology: "Still, Hillary is not a good candidate. Regression to the mean suggests she probably won’t have too many days worse than her Labor Day, but Hillary is clearly Trump’s best hope of being elected." So, once again: "regression to the mean" is a tautology. Tautologies can be useful, but they do not cause events in the real world. The truth of the statement "All bachelors are unmarried" does not mean that it is unlikely that John, a bachelor, will get married next year! (It may be unlikely or not: the point is that this tautology has nothing to do with determining that likelihood.) Real-world events "regress to the mean" because, if they don't, what was once the mean will cease to ...
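A minimal simulation makes the point (the population size and noise level here are made-up assumptions, not from the post): performance is stable skill plus independent luck, and the extremes "regress" only because their luck fails to repeat.

```python
# "Regression to the mean" as a selection artifact, not a force:
# performance = stable skill + fresh luck each day.
import random

random.seed(0)
skill = [random.gauss(0, 1) for _ in range(1000)]
day1 = [s + random.gauss(0, 1) for s in skill]   # skill plus day-1 luck
day2 = [s + random.gauss(0, 1) for s in skill]   # same skill, fresh luck

# Pick the 100 best day-1 performers, then look at them on day 2.
top = sorted(range(1000), key=lambda i: day1[i], reverse=True)[:100]
avg = lambda xs: sum(xs) / len(xs)
print(f"top group, day 1: {avg([day1[i] for i in top]):.2f}")
print(f"top group, day 2: {avg([day2[i] for i in top]):.2f}")  # closer to 0
# Nothing "pulled" them back toward the mean; their day-1 luck simply
# didn't repeat.
```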

The margin of error of the margin of error

I know I have talked about this before, but I am getting ready to teach statistics in the fall, so a good chunk of my summer will be spent thinking about the subject. And thus my thoughts turn back to how poll results are reported. We see a new poll come out of Florida that shows Smith leading Jones, 52.9% to 47.1%. If the "margin of error" in the poll is ± 3%, someone is surely going to tell us that the race is a "statistical dead heat." But the "margin of error" means no such thing: it tells us that if we have taken a truly random sample of the population, and see Smith polling at 52.9%, then in 95% of such cases, Smith's true percentage is between 49.9% and 55.9%, and likewise Jones's is between 44.1% and 50.1%. So in reality, Smith could be up by over 11%, and it is far, far more likely that Smith is up than it is that Jones is up, given this poll result. If, the next week, polling shows Smith at 53.1%, and Jones at 46.9%, we are likely...
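To put a rough number on "far, far more likely", here is a sketch assuming a two-way race, a simple random sample, and a flat prior (only the poll numbers come from the post; everything else is an assumption):

```python
# Why 52.9%-47.1% with a +/-3% margin of error is not a "dead heat".
from math import erf, sqrt

smith = 0.529
moe = 0.03                      # the stated 95% margin of error
se = moe / 1.96                 # implied standard error, ~0.0153

# Under a flat prior, the true share is roughly normal around the poll
# number, so we can ask: how likely is Smith's true share above 50%?
z = (smith - 0.50) / se
p_smith_leads = 0.5 * (1 + erf(z / sqrt(2)))   # standard normal CDF
print(f"P(Smith actually leads) ~ {p_smith_leads:.1%}")   # ~97%
```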

Stapidity

We need to have a name for the pairing, in current sports reporting, of absolute worship of "statistics" combined with absolute ignorance about how to do probabilistic reasoning. "Stapidity"? For instance, the NBA draft lottery is a domain in which we can be sure pure probabilistic reasoning applies, since it is deliberately set up that way. If a team has a 42.3% chance of getting the top pick, that's that: there is no point looking at "recent history" to see how teams in that position did, since in random sampling we expect small subsets to show distributions that differ from what we will get as our sample size approaches infinity. And we know with certainty (unless we suspect the NBA has a broken random number generator) that in the limit, 42.3% of such teams will wind up with the top pick. And yet: "And as the fine folks at ESPN Stats & Information pointed out, recent history says not to be too confident the Lakers will keep ...
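A quick simulation of that point (the 42.3% is from the post; the sample sizes are arbitrary): small samples of a known-probability event wobble, and only large ones hug the true rate.

```python
# With a known 42.3% chance, "recent history" is just a small sample,
# and small samples wobble.
import random

random.seed(1)
p = 0.423  # the stated probability of keeping the top pick

for n in (10, 100, 100_000):
    wins = sum(random.random() < p for _ in range(n))
    print(f"n = {n:>6}: observed rate {wins / n:.3f}")
# Small n drifts well away from 0.423; only large n converges to it.
# A handful of past lotteries tells you nothing the 42.3% doesn't already.
```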

Nate Silver, Mistaken on Overfitting

Here: "In layman’s terms, an overfit statistical model is one that is engineered to match idiosyncratic circumstances in past data, but which is not an accurate picture and makes poor predictions as a result." No, the problem is that it is too accurate a picture (of the past)! Instead, it is a more abstract but less accurate picture of the past which is more likely to look like the future, since it is only in certain abstract aspects that the past and future are likely to resemble each other. (And the art here is to find just which abstractions to use!)
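A standard illustration of the corrected claim (a sketch with made-up data, assuming NumPy; nothing here is from Silver's piece): fit the same noisy past two ways, then score both on fresh data.

```python
# Made-up data: y = x plus noise. The degree-7 fit is a *more* accurate
# picture of the past than the line, and a worse predictor of the future.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 8)
y = x + rng.normal(0, 0.1, size=8)       # "the past"

line = np.polyfit(x, y, 1)               # the abstract picture: a line
wiggle = np.polyfit(x, y, 7)             # passes through every past point

x_new = np.linspace(0, 1, 50)            # "the future"
y_new = x_new + rng.normal(0, 0.1, size=50)

def mse(coeffs, xs, ys):
    """Mean squared error of a polynomial fit on (xs, ys)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

print(f"past error:   line {mse(line, x, y):.4f}, wiggle {mse(wiggle, x, y):.4f}")
print(f"future error: line {mse(line, x_new, y_new):.4f}, wiggle {mse(wiggle, x_new, y_new):.4f}")
# The wiggle wins on the past (too accurate a picture) and typically loses
# badly on the future; only the line's abstraction carries over.
```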

My Favorite Paper Title of the Past Year

"False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant" (Hat tip to Mark Liberman .)

You Don't Have to Be a Prostitute

We report... and then we decide: "Author Jodi Dixon, a final-year medical student at the University of Birmingham, U.K., describes a 2010 study of 315 students at London University in which 1 in 10 reported knowing a fellow student who had turned to prostitution out of financial necessity." This is supposed to indicate a big problem, showing how common prostitution is amongst medical students. But wait a sec... the study says 1 in 10 students know someone who has turned to prostitution. And this is from a study at a single university, which surveyed only 315 students. So 1 in 10 of them would be about 32 students. At this point, it might occur to one that 32 students is not an unduly large circle of friends for a single person to have. So it's quite possible that these 32 students all know the exact same person who is working the streets. And that possibility looms even more distinctly when one considers how, in a small community like a medical school, word that ...
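The arithmetic, spelled out (a sketch; the one-person scenario is the post's hypothetical, not a claim about the actual study):

```python
# "1 in 10 of 315 respondents" is only ~32 people, and a single student
# known to all 32 of them fully explains the headline number.
surveyed = 315
knowers = round(surveyed / 10)     # "1 in 10 know someone" -> ~32 respondents
print(knowers)                     # 32

# One student with a 32-person acquaintance circle produces the same survey
# statistic as 32 separate students turning to prostitution.
print(f"{knowers / surveyed:.0%} of respondents 'know someone'")  # 10%
```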