Some claim it is.

### Read into Things

A few weeks ago I walk into a coffee shop. I have a book in hand, and as I lean in to look at the menu, I place my book on the counter. The barista observes innocently, "Hey! Another customer came in with a book earlier. Is there a book sale going on around here or something?"

### What Is a Planet?

Fights over the best definition of a term are often a quagmire: there is no "correct" or "incorrect" definition in the same sense that there is a correct answer to what 2 + 2 equals. Instead, definitions are either more or less *useful*. If someone tries to define "animal" as "any entity in the physical universe," that definition is not wrong in the same sense that answering "5" to the 2 + 2 problem is wrong. The right attack on that definition is to point out that it renders the word "animal" less useful than does the currently prevailing definition.

"Common usage" is *one factor* in deciding how we should define a term. All other things being equal, we should defer to common usage. But common usage is not a trump card that defeats all other considerations.

For instance, when Copernicus forwarded his heliocentric model of the solar system, he was, among other things, offering a new definition of "planet." For many centuries before him, "planet" meant "a celestial entity that wanders among the fixed stars." The planets, under that definition, were the Sun, the Moon, Mercury, Venus, Mars, Jupiter, and Saturn. And please note: so long as we accept that definition of "planet," that list *is correct*. (Yes, it is incomplete, missing other "planets" that would only be discovered with telescopes.)

Copernicus's system changed that definition to "major celestial objects orbiting the sun." At the time he did this, his new definition certainly violated common usage! But it would not have been a cogent complaint about his work to say, "But Nicolaus, 'a wanderer amongst the fixed stars' is THE definition of a planet!"

### The Real Meaning of "Due to Chance"

Sometimes, people have become so enamored with statistical methods that they have hypostatized the terms used in such analysis, and have taken to treating ideas like "chance" or "regression to the mean" as if they could be the *actual causes* of events in the real world.

The analysis of probability distributions arose largely in the context of dealing with errors in scientific measurements. Ten astronomers all measured the position of Mercury in the sky at a certain time on a particular evening, and got ten different results. What should we make of this mess?

It was a true breakthrough to analyze such errors as though they were results in a game of chance, and to realize that averaging all the measurements was a better way to approach the true figure than was asking "Which measurement was the most reliable?"
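A tiny simulation can illustrate why averaging wins (a sketch of my own, with entirely made-up numbers, not the astronomers' data):

```python
import random

random.seed(42)

TRUE_POSITION = 100.0  # hypothetical "true" value a perfect measurement would report
OBSERVERS = 10
TRIALS = 10_000

avg_beats_single = 0
for _ in range(TRIALS):
    # Each observer's error is the sum of twenty small, independent causes
    # (shaky hands, smudged glasses, wind, miscalibration, ...).
    measurements = [
        TRUE_POSITION + sum(random.uniform(-0.1, 0.1) for _ in range(20))
        for _ in range(OBSERVERS)
    ]
    average = sum(measurements) / OBSERVERS
    single = random.choice(measurements)  # one measurement picked blindly
    if abs(average - TRUE_POSITION) < abs(single - TRUE_POSITION):
        avg_beats_single += 1

print(f"Average beat a single measurement in {100 * avg_beats_single / TRIALS:.0f}% of trials")
```

With errors built this way, the average of ten measurements lands closer to the true value than a randomly chosen single measurement in roughly four trials out of five.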

This breakthrough involved regarding the measurement error in a population of measurements as being randomly distributed around the true value that a perfect measurement would have reported. The errors were "due to chance." And also, we could perform a statistical test to see which deviations from the perfect measurement were most likely *not* due to chance, and perhaps were the result of something like a deliberate attempt to fix the outcome of a test.

The phrase "due to chance" is just fine in the context of this statistical analysis: it means something like "We don't detect any causal factor so dominant in what we are analyzing that we should single it out as *the* cause of what occurred." But what it does *not* mean is that a causal agent called "chance" produced the result! No, it means that a large number of causal factors were at work, and that there is no way our test can isolate one in particular as "causing" the outcome.

In the context of measurement error, the fact that Johnson's measurement differed from Smith's, and from Davidson's, was caused by Smith's shaky hands, and Johnson having a smudge on his glasses, and the wind being high at the place Davidson was working, and Smith having slightly mis-calibrated his measuring device, and Johnson being distracted by a phone call, and Davidson misreading his device, and so on and so on. So long as lots of causal factors influence each measurement, and none of them dominate the outcome of the measurement, we can treat their interplay *as if* some factor called "chance" were at play: but there is no such actual factor!

### A Fixed Roulette Wheel

In the comment section of this post, Bob Murphy asks how I would respond to a paper beginning:

"Abstract: It is well-known that players at the craps table are said to have a 'hot hand' after several advantageous rolls. The rollers themselves often report subjectively feeling 'in the zone' during streaks of successful rolls. However, using both Monte Carlo simulations and Bayesian inference models, we conclude that such 'patterns' are illusory and provide no operationally useful betting opportunity."

The idea is sound, but I think the point Bob wants made can be illustrated even better with an example from *Willful Ignorance*, a book which Ken B. recommended to me, but now seems to be willfully ignoring! (Sorry, Ken, I could not resist that joke.)

The author tells the story of George, a bright inventor who has figured out how to hack a casino's roulette wheel so that it produces a winning number he wants on command. So he could, say, produce one hundred 26s in a row, and clean up by continually betting on 26. But George is a lot smarter than that: he has seen the movies where people are beaten up in the back room of the casino for doing that sort of thing. What he does instead is to grab a random number generator app for his phone, and have it randomly pick a number between 0 and 37 (with 37 representing 00), and then cause that number to "hit" on the wheel. (And of course he has several different accomplices win, rather than winning himself, and only on a few spins an evening.)

Clearly, this is no longer a "fair" roulette wheel, at least for George and his friends or for the casino. (It still is fair for the other players! Their chance of winning is unchanged by George's scheme.) On whatever occasions George decides to use his device, the outcome is not due to "chance,"* but is being deliberately selected.
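George's scheme is easy to sketch in code (a toy model of my own; the spin counts and seeds are arbitrary). The point is that the rigged wheel's pocket frequencies are statistically indistinguishable from the fair wheel's:

```python
import random
from collections import Counter

SPINS = 380_000  # an even 10,000 expected hits per pocket

# A fair wheel: 38 pockets (0-36, with 37 standing in for 00), all equally likely.
house_rng = random.Random(7)
fair = Counter(house_rng.randrange(38) for _ in range(SPINS))

# George's wheel: his phone app picks a pocket at random, and his
# device then forces the ball into that pocket.
phone_app = random.Random(99)
rigged = Counter(phone_app.randrange(38) for _ in range(SPINS))

expected = SPINS / 38
for name, counts in (("fair", fair), ("rigged", rigged)):
    max_dev = max(abs(counts[p] - expected) for p in range(38))
    print(f"{name}: max deviation from expected pocket frequency = {100 * max_dev / expected:.1f}%")
```

Every number remains equally likely in the long run; the causal difference (George's device selecting each outcome) leaves no trace in the frequencies, which is exactly why a frequency-based test sees only "chance."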

But no statistical test applied to the pattern of winning numbers will detect anything but chance at work. If Gilovich, Tversky and Vallone used the method of their famous hot hand paper on this wheel, they would have to conclude that George's idea that he could beat the wheel was just an illusion! (Of course, if researchers had *more knowledge*, specifically, the knowledge of who George's accomplices were, they could detect the scheme by analyzing those players' winning percentages.)

The point of the story is that there can be real causal factors at play in a situation that will not be revealed by the obvious statistical tests. A statistical test that concludes "No significant effect was found" should be *a piece of evidence* in the trial of a hypothesis, and not the *verdict* of the trial.

* A side note: "chance" is not properly speaking the cause of anything. At the quantum level, as Ken pointed out, we perhaps find truly random events. But that is just to say that it is possible that, for instance, an excited electron dropping back to a lower atomic orbital is a *causeless* event. It does not mean some pagan god called "Chance" made the electron shift orbits. And at the macro level, "chance" is just the name we give to a situation in which a myriad of causal factors are at play, and it is beyond our ken (b.) to sort them all out.

### A problem with Computer Science education, at present

The approach of giving students "little" problems, and rewarding students who are able to "solve" the problem as rapidly as possible with a high grade, teaches an "anti-pattern": hack your way as fast as possible to any program that can solve the problem you have been assigned.

A skilled software engineer does not approach a "customer" request (the customer might actually be his boss, or a marketing executive, etc.) in that way at all: instead, given that X has been requested by "the customer," a skilled software engineer *resists* fulfilling the request as fast as possible, and instead begins to think:

- Is it really necessary to program *anything at all* to fulfill this request? Perhaps some existing capability in the system already satisfies the customer request, if only the customer is educated on how to properly use that capability.
- Is the request so hard to fulfill, and its fulfillment of such marginal value, that the customer should just be advised, "You don't really want us to program this: it will cost too much"?
- Is the request one that can be met by simply installing some third-party library or a commercially available application? If so, it would be wasteful for the developer to write a program to fulfill it.
- If it turns out that, after considering all the above points, there really is some in-house programming necessary to satisfy the customer request:
    - Are there likely to be similar requests in the pipeline, so that it will be useful to program a generic capability rather than simply one that fulfills the current request?
    - How can the code to fulfill this request be made an integral part of a *coherent software system*, rather than simply being an isolated chunk of code?

The "solve this isolated problem as fast as possible to receive an A" method of giving CS students "actual" work to do does not teach them anything at all about how to address the real-world software engineering questions listed above.

Given the semester-oriented nature of modern university education, I don't think there *is* an easy solution to this problem. But at least keeping the above points in students' minds, even if we have to assign "mini-problems," might help.

### No, I Don't Believe Probability Judgments Are "Subjective"

Tom was, I think, worried that this is what I was suggesting. Then he got what my claim is. But in case others misapprehend it...

1) There are no judgments whatsoever that are "purely subjective." Any judgment is an attempt to assert something about the world. Although Oakeshott's arguments on this point (in *Experience and Its Modes*, chiefly) are more robust, I think M. Polanyi's arguments in *Personal Knowledge* are still very good but also more accessible. If I claim that "The odds of that coin coming up heads are one in two," I am saying something about the world "out there," rather than commenting upon some "purely personal" state of my own.

2) As such, there are better and worse judgments about what the probability of some event is. If all I know is, "Tom is flipping a fair coin," then the *correct* probability to assign to "The coin will come up heads" is .50. One way to defend my claim here is to note that anyone else having only the same knowledge as me about the situation can assuredly win money from me in the long run if I choose any other probability while they choose .50.

3) But that perfectly correct probability judgment, given my state of ignorance about the flipping, will become decidedly mistaken should my knowledge of what is going on change: for instance, suppose I suddenly gain the superpower of instantaneously being able to assess all the forces acting on a coin at the moment it is flipped so as to "see" whether any particular flip will come up heads or tails. If I gain that superpower, my correct assignment of probability to "The coin will come up heads" is either zero or one, depending on what I "see."
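The long-run betting defense of .50 in point 2 can be sketched with a proper scoring rule (my own illustration, not from the post): against a fair coin, a fixed quote of .50 outscores any other fixed quote.

```python
import random

random.seed(1)
FLIPS = 100_000
flips = [random.random() < 0.5 for _ in range(FLIPS)]  # True = heads

def brier(quote):
    """Mean squared error of always quoting `quote` as P(heads); lower is better."""
    return sum((quote - heads) ** 2 for heads in flips) / FLIPS

for q in (0.3, 0.5, 0.7):
    print(f"quote {q:.1f}: Brier score {brier(q):.4f}")
```

The .50 quote achieves the minimum score, which is the precise sense in which it is the *objectively correct* judgment given only the knowledge "a fair coin is being flipped."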

4) And finally, even if I have that superpower, should the casino in which I am betting become suspicious, and only allow me to bet on coin flips from another room (so that I can't gauge the forces at play in the flip), my *correct* probability judgment *reverts* to .50.

So, the *objectively correct* judgment of the probability of some event occurring depends on how much knowledge we have when making that judgment: if all we know is that Joe is a 50-year-old American male, we might be *correct* in judging that the probability he will live to 80 is .50. (I just picked .50 as a plausible number: I'm not looking this up in the mortality tables at the moment!) But if we then learn he is planning on committing suicide tonight, we would be *correct* in revising our estimate to, "Well, his probability of living to 80 is pretty close to 0."

### Hot Streak Length

The critics of this model claimed "It implies a streak length of one."

Well, it doesn't:

```python
import random

SHOTS = 50
in_streak = False
hot_streaks = 0
hot_total = 0

print("Shooting with hot streaks:")
for shot in range(1, SHOTS):
    hot = (random.random() < .5)
    if hot:
        hot_total += 1
        if not in_streak:
            in_streak = True
            hot_streaks += 1
        make = (random.random() < .66)
    else:
        in_streak = False
        make = (random.random() < .33)
    mark = 'X' if make else 'O'
    print(mark, end='')
print("")
print("Average hot streak length = " + str(hot_total / hot_streaks))

print("Shooting without hot streaks:")
for shot in range(1, SHOTS):
    make = (random.random() < .5)
    mark = 'X' if make else 'O'
    print(mark, end='')
print("")
```

And the output is:

```
Macintosh:statistics gcallah$ ./hotstreak.py
Shooting with hot streaks:
OOXXOXXOXXOXXXOOXXXOXXXXOOXXXXXXXOOOOOXOXXOXXOOXO
Average hot streak length = 2.0
Shooting without hot streaks:
OOXXOOXOXXOXOXXXXXXOXOOOXOXOXOOOOOOOOOXXOXOXOOOOO
```
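That printed average of 2.0 is just what the model implies analytically: if each shot is independently "hot" with probability .5, hot-streak lengths follow a geometric distribution with mean 1 / (1 − .5) = 2, not 1. A longer run (my own check, reusing the streak-counting logic of the original script) confirms it:

```python
import random

random.seed(3)
SHOTS = 1_000_000

hot_total = 0    # shots taken while "hot"
hot_streaks = 0  # number of distinct hot streaks
in_streak = False
for _ in range(SHOTS):
    if random.random() < .5:  # this shot is "hot", independently of the last
        hot_total += 1
        if not in_streak:
            in_streak = True
            hot_streaks += 1
    else:
        in_streak = False

print(f"Average hot streak length = {hot_total / hot_streaks:.3f}")
```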

What the model *actually* codes, and was meant to code, was the possibility that a player could be genuinely "hot" for some period, but if the hot streak *might end at any moment*, then the streak has no predictive value, and "feeding the hot hand" will not help a team.

### The Internet Is a Wondrous Place!

I have programmed for 30 years now. I have published dozens of articles in professional software engineering journals. I have written programs used to trade tens of millions of dollars of securities each day. I teach computer science.

And today Ken B. informed me that if I set a random variable once outside of a loop the result will be different than if I set it anew each time around the loop!


### Great Minds Think Alike...

"probability is indeed a degree of certainty..." -- Jacob Bernoulli

"It is most certain, given the position, velocity, and distance of a die from the gaming table at the moment when it leaves the hand of the thrower, that the die cannot fall other than the way it actually does fall... Yes it is customary to count the fall of the die... as contingent. The only reason for this is that those things which... are given in nature, are not yet sufficiently known to us." -- Jacob Bernoulli

"*Probability*, in its mathematical acceptation, has reference to the state of our knowledge of the circumstances under which an event may happen or fail. With the degree of information which we possess concerning the circumstances of an event, the reason that we have to think that it will occur, or, to use a single term, our *expectation* of it, will vary." -- George Boole

### Probability is about our knowledge...

and *not* a fixed feature of the world "out there."

A couple of members of the commentariat have complained that in this model, it is necessary to have "inside knowledge" to beat someone who thinks the odds are 50-50 on any given shot. Now, I don't care whether you want to call what "Gene" knows in that model "inside knowledge" or not. Either way, that is missing the more important point: "the odds" change with our knowledge of a situation.

To illustrate: imagine I ask you to predict the odds that a 40-year-old American male will live to be 78. Well, if that is all the information you have, you should answer "Even odds." (I looked that up, but from here on out my odds are all just plausible-sounding guesses.)

But now I tell you, "Oh, and he's a heavy smoker."

Oops, better revise that forecast: say, 2-1 against.

But then I add, "And so were all of his deceased male relatives that we can identify, and they all lived to be at least 90."

Aargh, now the odds are 2-1 in favor.

However, I finally add "By the way, he has terminal pancreatic cancer, and the doctors only give him a month to live."

Now you had better revise your odds to 1000-1 against.
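This sequence of revisions is just conditional probability: each new fact shrinks the reference class we are counting over. A toy version (every number in this hypothetical cohort is invented purely for illustration):

```python
# Hypothetical cohort of 40-year-old American males: for each person we record
# (smoker, long_lived_family, terminal_illness, lived_to_78).
cohort = (
    [(False, False, False, True)] * 40 + [(False, False, False, False)] * 35
    + [(True, False, False, True)] * 10 + [(True, False, False, False)] * 25
    + [(True, True, False, True)] * 12 + [(True, True, False, False)] * 6
    + [(True, True, True, False)] * 2
)

def p_lives_to_78(know):
    """P(lives to 78 | everything we currently know), by simple counting."""
    matching = [p for p in cohort if know(p)]
    return sum(1 for p in matching if p[3]) / len(matching)

print(p_lives_to_78(lambda p: True))                    # no information yet
print(p_lives_to_78(lambda p: p[0]))                    # ...he's a heavy smoker
print(p_lives_to_78(lambda p: p[0] and p[1]))           # ...long-lived male relatives
print(p_lives_to_78(lambda p: all(p[:3])))              # ...terminal illness
```

With these made-up counts the probability swings from roughly even, to against, to in favor, to nearly zero, exactly tracking the narrative above: each estimate is correct *for the reference class picked out by what we know*.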

Supposing that my guesses after the first odds I gave are accurate, your answer each step of the way was "correct," *given the knowledge you had at hand*. When we know more about a situation, the odds change. And it doesn't matter at all whether this is "inside knowledge" or not.

This applies even to something as seemingly straightforward as a claim that, in a flip of a fair coin, the odds are 50-50 of getting heads. If we could somehow see all the forces at work in a particular flip, we would be able to state with certainty, "This toss is going to be heads (or tails)." And, in fact, it turns out that with practice, a person can learn to flip a coin so that it almost always comes up in its original orientation, or vice versa. If all we know is that we have "a person" flipping a fair coin, it is correct to say the odds are 50-50 for getting heads. But if we learned we were dealing with one of these skilled coin flippers, and we had a reason to think he was trying to produce heads, we would instead be correct to say that the coin would come up heads with near certainty.

An application: the above considerations are why a simple mastery of the odds of drawing various card hands is not enough to make one a top poker player. The top players have of course internalized that knowledge, but they have gone much further: they have learned to read the "tells" of less skilled players, so that they can see from the reaction of an amateur whether he has just completed his full house or not. Once they can do that, the formal odds of his having drawn the card he needed become irrelevant: they *know* whether or not he got it. This is not "inside knowledge": the tell was right out in the open, for anyone to see. But only someone practiced at looking for it will recognize it as information to be used in betting.
