Regression to the mean
Sports fans in particular misuse the idea of "regression to the mean," speaking of it as though it were some causal power acting on real, physical events. This is mistaken; let's look at why.
If I have a fair coin and I have flipped 20 heads in a row, that does not mean I am "due" for a tail. My odds of getting a tail on the next flip are still one in two. Furthermore, my odds of flipping 20 heads in a row are no different after just having done so than they are at any other time. The odds are heavily against it (about one in a million), but your past flips do not affect them at all. Thus, if you have just flipped 20 heads in a row, your chance of now stretching the streak to 40 is the same as your chance was of getting the original 20.
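The arithmetic in the paragraph above can be checked exactly. The short Python sketch below assumes a perfectly fair coin whose flips are independent, and uses exact fractions so no rounding creeps in.

```python
from fractions import Fraction

# Assumption: a fair coin, so each flip is heads with probability 1/2,
# independently of every other flip.
P_HEADS = Fraction(1, 2)

def prob_all_heads(n: int) -> Fraction:
    """Probability that n independent flips all come up heads."""
    return P_HEADS ** n

# Chance of 20 heads in a row from a standing start.
p20 = prob_all_heads(20)

# Chance of a tail on the next flip, given 20 heads already: by independence,
# the conditional probability is just the unconditional one.
p_tail_next = 1 - P_HEADS

# Chance of stretching a 20-head streak to 40, given the first 20 happened:
# P(40 heads) / P(first 20 heads) = (1/2)**40 / (1/2)**20 = (1/2)**20.
p_extend_to_40 = prob_all_heads(40) / prob_all_heads(20)

print(f"P(20 heads in a row)      = 1 in {p20.denominator:,}")
print(f"P(tail | 20 heads so far) = {p_tail_next}")
print(f"P(reach 40 | have 20)     = 1 in {p_extend_to_40.denominator:,}")
```

Both streak probabilities print as 1 in 1,048,576, which is the "about one in a million" figure above.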
So even in games of pure chance, regression to the mean is not some strange physical force that controls future flips of the coin. All it says, in simplified terms, is that you're not likely to get that many heads again over the next 20 flips. That's because 20 straight heads was very unlikely for the first 20 flips, and the next 20 flips, being independent of the first, are exactly as unlikely to produce it: the principle really just restates the independence of these two random sequences.
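A small simulation can make this concrete. The sketch below assumes a fair coin, splits flips into pairs of 20-flip blocks, and treats "extreme" as 15 or more heads in a block; the block size, threshold, trial count, and seed are illustrative choices, not anything from the post. It looks only at the blocks that came right after an extreme block.

```python
import random

# Illustrative assumptions: a fair coin, 20-flip blocks, and "extreme"
# meaning 15 or more heads in a block.
BLOCK = 20
EXTREME = 15
TRIALS = 200_000

def heads_in_block(rng: random.Random, n: int = BLOCK) -> int:
    """Count heads in n fair flips, using one random bit per flip."""
    return bin(rng.getrandbits(n)).count("1")

def regression_demo(seed: int = 42) -> tuple[int, float]:
    """Return (number of extreme first blocks, mean heads in the block after each)."""
    rng = random.Random(seed)
    following = []
    for _ in range(TRIALS):
        first = heads_in_block(rng)
        second = heads_in_block(rng)  # drawn independently of `first`
        if first >= EXTREME:
            following.append(second)
    return len(following), sum(following) / len(following)

count, mean_next = regression_demo()
print(f"{count} extreme first blocks; next block averaged {mean_next:.2f} heads")
```

The blocks following an extreme block should average close to 10 heads, the ordinary fair-coin figure. They "regress" not because anything pulls them back, but because the second block is generated without any knowledge of the first.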
But when we are dealing with events that are not pure chance, reading regression to the mean as a claim that the past sequence of events somehow pushes the future sequence back toward the average fares even worse. Consider a basketball player, say "Ball," who is normally a 40% three-point shooter but goes 0 for 6 from three in the first half of the current game. At this point we will often hear sports fans say, "Regression to the mean indicates that he will shoot better in the second half," as though the very fact that he shot badly in the first half improves his chances of shooting well in the second. Of course, if we believe his poor first-half shooting was merely bad luck, we would be wise to bet on him shooting 40% in the second half, because that's what he usually shoots. It's not that his bad first half somehow physically demands compensation in the second half; it's just that a 40% three-point shooter is, by definition, someone who usually hits 40% of his three-point attempts. So that is what we should bet on.
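The pure-luck scenario can be quantified. Assuming, for illustration, that each of Ball's attempts goes in with probability 0.40 independently of the others, the sketch below shows that a 0-for-6 half is not especially rare for a genuine 40% shooter, and that under this assumption the first half leaves the expected second-half rate untouched.

```python
from fractions import Fraction

# Assumption: Ball makes each three-point attempt with probability 0.40,
# independently of his other attempts (i.e. the pure-luck scenario).
P_MAKE = Fraction(2, 5)

# Probability of missing all 6 first-half attempts purely by chance.
p_zero_for_six = (1 - P_MAKE) ** 6

# Under independence, the first half tells us nothing new, so the expected
# second-half make rate is still just his usual rate.
expected_second_half_rate = P_MAKE

print(f"P(0 for 6 | true 40% shooter) = {float(p_zero_for_six):.4f}")
print(f"Expected second-half rate     = {float(expected_second_half_rate):.0%}")
```

The first line prints 0.0467, so a true 40% shooter would go 0 for 6 in roughly one half out of every 21 by luck alone.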
But shooting is not pure chance (which is why I can't hit 40% of my three-pointers), so the fact that Ball shot 0% in the first half might actually lead us to expect the second half to go poorly as well. When an athlete has a particularly bad stretch of performance on some task, that could be mere chance, or it could be because:
- He's aging, and he can no longer perform as he used to.
- He has a nagging injury that hasn't kept him out of games, but is badly affecting his long-range shooting.
- He has recently been living a libertine lifestyle, staying out all night drinking and drugging.
- He has become depressed, and lost confidence in his ability.
- The other team has assigned a player to defend him who is particularly skilled at stopping his moves.
I'm sure you can think of more examples of these causal factors yourself. What they have in common is that each supposes a reason Ball shot badly in the first half, one that, if it persists through the second half, might indicate that he will shoot badly then as well. And we fans would probably not know about some of them unless we had personal contact with someone close to the player. (A great defender assigned to him is an exception: informed fans will know about this.)
So if we see a player shoot badly in the first half, our best bet is that he will do somewhat worse than his average in the second half. If we knew that the first-half performance was mere chance, our best bet would be that he shoots his average in the second half. But if we can't be certain that no causal factor was at play, then the bad first half means the second half is likely to be below average as well.
So our "rational" bet would start at 40% and then subtract some amount depending on how likely we think it is that a causal factor is affecting the player. I won't even attempt to guess the right way of figuring that out; I only know that in most cases the likelihood that some causal factor or other is affecting the player is greater than 0%, so the optimal second-half bet is less than 40%.
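One conventional way to make "start at 40% and subtract" concrete, offered only as a hedged sketch, is a two-hypothesis Bayesian update: either the bad half was pure chance, or some causal factor like those listed above is lowering Ball's percentage and will persist into the second half. The impaired rate and the prior chance of a factor below are invented numbers, not estimates from any data, and the function name is hypothetical.

```python
# Hypothetical inputs, chosen only for illustration (none come from the post):
BASE_RATE = 0.40       # Ball's usual three-point percentage
IMPAIRED_RATE = 0.25   # assumed percentage while some causal factor is at work
PRIOR_FACTOR = 0.10    # assumed prior chance that such a factor is present

def second_half_estimate(makes: int, attempts: int,
                         base: float = BASE_RATE,
                         impaired: float = IMPAIRED_RATE,
                         prior: float = PRIOR_FACTOR) -> float:
    """Expected second-half make rate after seeing the first half.

    Two hypotheses: pure chance (he is still a `base` shooter) or a lasting
    causal factor (he is an `impaired` shooter). Bayes' rule weighs them by
    how well each explains the first half; the estimate mixes the two rates.
    """
    misses = attempts - makes
    # Likelihood of the first-half result under each hypothesis (binomial,
    # omitting the binomial coefficient because it cancels in the ratio).
    like_chance = base ** makes * (1 - base) ** misses
    like_factor = impaired ** makes * (1 - impaired) ** misses
    posterior_factor = (prior * like_factor) / (
        prior * like_factor + (1 - prior) * like_chance
    )
    # Start at the base rate, then subtract in proportion to how likely a factor is.
    return base - posterior_factor * (base - impaired)

print(f"After 0 for 6: {second_half_estimate(0, 6):.1%}")
print(f"With no chance of a factor: {second_half_estimate(0, 6, prior=0.0):.1%}")
```

With these made-up numbers the 0-for-6 half raises the chance of a causal factor from 10% to about 30%, and the estimate comes out near 35.5%. With the prior set to zero it stays at exactly 40%, matching the pure-luck case.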