The margin of error of the margin of error
I know I have talked about this before, but I am getting ready to teach statistics in the fall, so a good chunk of my summer will be spent thinking about the subject. And thus my thoughts turn back to how poll results are reported.
We see a new poll come out of Florida that shows Smith leading Jones, 52.9% to 47.1%. If the "margin of error" in the poll is ± 3%, someone is surely going to tell us that the race is a "statistical dead heat." But the "margin of error" means no such thing: it tells us that if we have taken a truly random sample of the population, and see Smith polling at 52.9%, then in 95% of such cases, Smith's true percentage is between 49.9% and 55.9%, and likewise Jones's is between 44.1% and 50.1%. So in reality, Smith could be up by over 11 percentage points, and it is far, far more likely that Smith is up than it is that Jones is up, given this poll result.
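To put a rough number on "far, far more likely," here is a minimal Python sketch. It rests on assumptions the poll above does not state: a simple random sample of about 1,067 respondents (roughly the size that gives ± 3% at 95% confidence), a two-candidate race, and a normal approximation that treats the poll result as our best estimate of the true lead. The names `N`, `moe`, and `p_smith_ahead` are mine, chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

N = 1067        # ASSUMED sample size; the post does not give one
SMITH = 0.529   # Smith's share in the poll
JONES = 1 - SMITH

# z-value for a two-sided 95% interval (about 1.96)
z95 = NormalDist().inv_cdf(0.975)

# The conventionally reported margin of error uses the worst case p = 0.5
moe = z95 * sqrt(0.25 / N)

# With two candidates the lead is SMITH - JONES = 2*SMITH - 1,
# so its standard error is twice that of a single share.
lead = SMITH - JONES
se_lead = 2 * sqrt(SMITH * JONES / N)

# Normal approximation: how much of the plausible range for the
# true lead lies above zero?
p_smith_ahead = 1 - NormalDist(mu=lead, sigma=se_lead).cdf(0)

print(f"margin of error: +/- {moe:.1%}")
print(f"approximate chance Smith is really ahead: {p_smith_ahead:.0%}")
```

Under these assumptions the margin of error comes out close to 3 points and the chance that Smith is really ahead is well above 95%, which is a long way from a "dead heat." This covers sampling error only; it says nothing about the other sources of error that a real poll faces.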
If, the next week, polling shows Smith at 53.1% and Jones at 46.9%, we are likely to hear that "Smith has taken a decisive lead." In fact, it is quite likely that we are just seeing a random jiggle in the polling numbers because:
1) In the best case, we are dealing with a random sample, and so we do not expect to hit the true population percentages exactly in each poll. That is why we have a confidence interval, after all.
2) There is some incalculable possibility that our sample is not truly random, because we have introduced bias of which we are unaware.
3) Even if we have managed to poll a truly random sample of our population, our good fortune can be negated if there is some systematic bias in who refuses to respond to our questions. Perhaps, for instance, the supporters of Jones are particularly antisocial people, and really hate talking to pollsters.
4) Lastly, people may lie: for instance, the supporters of Smith might be conspiring to make Jones feel overconfident.
So the fact that the polling for some race has inched to one side or the other of what, after all, is an arbitrary confidence interval -- there is no particular reason except custom for us to use 95%; we easily could have chosen 93% or 97% instead -- is of very little significance. But it is reported as though some decisive change in the race has occurred.
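The first reason on the list, pure sampling noise, is the only one we can actually calculate, so here is a hedged Python sketch of it. I assume two independent simple random samples of 1,067 people each (a figure I picked because it gives about ± 3% at 95% confidence) and a true Smith share of 53% that does not change between weeks. The sketch asks how often two such polls would differ by at least the 0.2 points we saw, and then shows how the "margin of error" shifts if we pick 93% or 97% instead of 95%. The names `N`, `P`, `p_at_least`, and `margin` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

N = 1067   # ASSUMED sample size for each poll
P = 0.53   # ASSUMED true Smith share, identical in both weeks

std_normal = NormalDist()

# Standard error of one poll, and of the change between two independent polls
se_one = sqrt(P * (1 - P) / N)
se_change = sqrt(2) * se_one

# Smith moved from 52.9% to 53.1%
observed_change = 0.531 - 0.529

# Two-sided probability of a shift at least this large when nothing has changed
p_at_least = 2 * (1 - std_normal.cdf(observed_change / se_change))

def margin(confidence):
    """Worst-case (p = 0.5) margin of error at the given confidence level."""
    z = std_normal.inv_cdf(0.5 + confidence / 2)
    return z * sqrt(0.25 / N)

margins = {c: margin(c) for c in (0.93, 0.95, 0.97)}

print(f"chance of a shift this big from noise alone: {p_at_least:.0%}")
for c, m in margins.items():
    print(f"{c:.0%} confidence: +/- {m:.1%}")
```

With these assumed numbers, a shift of 0.2 points is what we should expect to see most of the time even when the race has not moved at all, and the same poll has a somewhat narrower or wider "margin of error" depending on which confidence level custom happens to favor.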