Sunday, July 30, 2017

Habits versus intelligent practices

"It is of the essence of merely habitual practices that one performance is a replica of its predecessors. It is of the essence of intelligent practices that one performance is modified by its predecessors." -- Gilbert Ryle, The concept of mind, p. 42

Feeling hot, hot, hot

So I think some confusion has been generated in our ongoing discussion of "the hot hand" by the word "streaks." And a good bit of that confusion has been my fault, for including the word "streak" when what I really wanted to talk about was just "hotness" itself -- and here I'm thinking of you, Bob Murphy.

Ha ha, just joking, I swear to you all that I never picture Bob and me showering naked together. Never! For real.

In any case, what I am indicating is the feeling that anyone who has played a sport or music for any length of time has had: that they are "on" at some moments, and not at others. And the TGV authors, besides trying to demonstrate that "hot hands" aren't predictively useful, also imply that the idea that "I am hot right now" is some sort of cognitive illusion.

It seems that TGV may be incorrect on their predictive findings, but that's not what I have been addressing. I am asking "Does a lack of statistical significance show that the 'feeling' of being on is 'an illusion'?"

So chuck aside the "streak" aspect -- and I apologize for the extent to which I created a problem by using that word -- and let's address whether it is possible for a person to be "on" and yet, say, actually miss ten shots in a row. What would this "on-ness" be? I think the clearest way to understand it is as a propensity, along the lines of what Ryle or Popper talk about. When an athlete is on, they have a stronger than usual tendency to perform successfully. But that tendency might be offset by all sorts of other things, and so might not appear in a statistical study, despite it being a real thing.

So, for instance, there is nothing perplexing or idiosyncratic about a baseball player saying, "I was really seeing the ball and hitting it well yesterday, and I would've had three homeruns, but the wind off of the bay kept blowing the ball back into the park, so I wound up 0-4."

Similarly, a basketball player might report, "I was so off yesterday! So off that by sheer luck, I banked in three three-pointers that I didn't even intend to bank."

Or a golfer might note, "I played much better on Saturday than I did on Friday, despite shooting a 68 on Friday and a 73 on Saturday. The wind off of the Irish Sea was so unpredictable that if I hadn't been playing better on Saturday, I would have shot 80."

These are all normal, everyday reports that one really hears athletes make, and most people, and I think anyone who has played sports extensively, know just what they are talking about.


Newest Course Offering

Discrete mathematics, now under construction.

Thursday, July 27, 2017

It is an archetypal truth

"that the social structure is corrupt and incomplete." -- Jordan Peterson

Of course, we are obligated at all times to improve the social structure we find ourselves in as much as we can.

But the problem with ideologues is that they think that simply because the current social structure is "corrupt and incomplete," that therefore they are justified in completely demolishing that existing structure.

No, the new structure they establish will also be "corrupt and incomplete," and, per their logic, also require complete destruction.

A "corrupt and incomplete" social structure is always preferable to no social structure.



Misunderstanding narcissism

Many times, people apply the term "narcissist" to someone who thinks a lot of themselves. But clinically speaking, that is almost the complete opposite of what the term really means.

Narcissists are, in fact, people who think so little of themselves that all of their actions are directed towards the maintenance of that extremely fragile self-image. So, for instance, if someone tells me Donald Trump is a narcissist, I know they have no idea what they are talking about. Trump may perhaps be an egomaniac, but he is absolutely not a narcissist.

Wednesday, July 26, 2017

Sense and reference

A couple of readers were confused about my post on definitions.

If we change the sense of a term, we may change its reference as well. (Not always: if we change the sense of X from "the evening star" to "the morning star," X still refers to the same thing!)

But we have not changed any of the facts about what X used to refer to. So if we were to change the sense of the term "cat" to "a large, leaping Australian marsupial," it would henceforth refer to what we now call kangaroos. But that does not mean that the non-human mammal currently living in my house will suddenly have a pouch! Similarly, suppose we define a new mathematical symbolism, call it Mnew, that is the same as ours (which we can call Mold) for the first use of a number, but in which, every subsequent time that number is mentioned, its value goes up by one. Then in Mnew, 2 + 2 = 5, since the second '2' means what '3' means in Mold. That 2 + 2 = 4 is always true in ordinary arithmetic, whatever symbols we choose to employ for the concepts involved, so we are saying the same thing when we say "II + II = IV", or "dos más dos es igual a cuatro." But in Mnew we are talking about different concepts when we write 2 + 2 = 5. The fact that in this different language the symbols don't mean the same thing as in Mold, and different propositions turn out to be true, should not be very surprising if properly understood.
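To make the Mnew example concrete, here is a minimal Python sketch. The function names eval_mold and eval_mnew are hypothetical, and the sketch assumes one particular reading of the rule: the k-th mention of a given numeral denotes its ordinary value plus k - 1.

```python
from collections import Counter


def eval_mold(numerals):
    """Sum the numerals under ordinary arithmetic (Mold)."""
    return sum(numerals)


def eval_mnew(numerals):
    """Sum the numerals under the assumed Mnew rule."""
    mentions = Counter()
    total = 0
    for n in numerals:
        # First mention adds 0, second mention adds 1, and so on.
        total += n + mentions[n]
        mentions[n] += 1
    return total


print(eval_mold([2, 2]))  # 4
print(eval_mnew([2, 2]))  # 5
```

The same marks on the page yield different sums only because the second '2' is interpreted differently; nothing about the quantity four has changed.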

Nor does the fact that the definition of a term is merely conventional mean that there are no correct or incorrect applications of the term! Given our current definition of "cat," it is correct to call the creature who haunts all our waking hours with its meows a cat, and incorrect to say it is a kangaroo. Thinking again about other languages should make this obvious: it is correct to call our hosted mammal 'cat' if we are speaking English, but not if we are speaking Spanish or Russian! If I say "ore" in Yoruba, I am talking about my friend, but if I then switch to English, and say "Ore is metallic," it does not make it true that my friend is metallic.

Tuesday, July 25, 2017

Statistical analysis of agent-based models

I have observed that, when one writes a paper using one's own agent-based model, it is now common practice to perform statistical analysis of the output of the model.

This is like hiding an Easter egg under a shrub so that your paper can "discover" it there in its conclusion.


Worst use of "methodology", 2017

FBI profiler commenting on a series of murders: "They were all done with the same methodology."



Saturday, July 22, 2017

Read into Things


A few weeks ago I walk into a coffee shop. I have a book in hand, and as I lean in to look at the menu, I place my book on the counter. The barista observes innocently, "Hey! Another customer came in with a book earlier. Is there a book sale going on around here or something?"

Thursday, July 20, 2017

Merry on Rome and America

I don't think I have ever been cited this much in an essay.

What Is a Planet?


Fights over the best definition of a term are often a quagmire: there is no "correct" or "incorrect" definition in the same sense that there is a correct answer to what 2 + 2 equals. Instead, definitions are either more or less useful. If someone tries to define "animal" as "any entity in the physical universe," that definition is not wrong in the same sense that answering "5" to the 2 + 2 problem is wrong. The right attack on that definition is to point out that it renders the word "animal" less useful than does the currently prevailing definition.

"Common usage" is one factor in deciding how we should define a term. All other things being equal, we should defer to common usage. But common usage is not a trump card that defeats all other considerations.

For instance, when Copernicus put forward his heliocentric model of the solar system, he was, among other things, offering a new definition of "planet." For many centuries before him, "planet" meant "a celestial entity that wanders among the fixed stars." The planets, under that definition, were the Sun, the Moon, Mercury, Venus, Mars, Jupiter, and Saturn. And please note: so long as we accept that definition of "planet," that list is correct. (Yes, it is incomplete, missing other "planets" that would only be discovered with telescopes.)

Copernicus's system changed that definition to "major celestial objects orbiting the sun." At the time he did this, his new definition certainly violated common usage! But it would not have been a cogent complaint about his work to say, "But Nicolaus, 'a wanderer amongst the fixed stars' is THE definition of a planet!"


The Real Meaning of "Due to Chance"

Sometimes, people have become so enamored with statistical methods they have hypostatized the terms used in such analysis, and have taken to treating ideas like "chance" or "regression to the mean" as if they could be the actual causes of events in the real world.

The analysis of probability distributions arose largely in the context of dealing with errors in scientific measurements. Ten astronomers all measured the position of Mercury in the sky at a certain time on a particular evening, and got ten different results. What should we make of this mess?

It was a true breakthrough to analyze such errors as though they were results in a game of chance, and to realize that averaging all the measurements was a better way to approach the true figure than was asking "Which measurement was the most reliable?"

This breakthrough involved regarding the measurement error in a population of measurements as being randomly distributed around the true value that a perfect measurement would have reported. The errors were "due to chance." And also, we could perform a statistical test to see which deviations from the perfect measurement were most likely not due to chance, and perhaps were the result of something like a deliberate attempt to fix the outcome of a test.

The phrase "due to chance" is just fine in the context of this statistical analysis: it means something like "We don't detect any causal factor so dominant in what we are analyzing that we should single it out as the cause of what occurred." But what it does not mean is that a causal agent called "chance" produced the result! No, it means that a large number of causal factors were at work, and that there is no way our test can isolate one in particular as "causing" the outcome.

In the context of measurement error, the fact that Johnson's measurement differed from Smith's, and from Davidson's, was caused by Smith's shaky hands, and Johnson having a smudge on his glasses, and the wind being high at the place Davidson was working, and Smith having slightly mis-calibrated his measuring device, and Johnson being distracted by a phone call, and Davidson misreading his device, and... so on and so on. So long as lots of causal factors influence each measurement, and none of them dominates the outcome of the measurement, we can treat their interplay as if some factor called "chance" were at play: but there is no such actual factor!
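A small simulation can illustrate the point. This is only a sketch under invented assumptions: the true value, the number of small causal factors per measurement, and the size of each nudge are all made up for illustration. Each measurement is the true value plus the sum of many small, individually caused nudges, and the ten-observer average is compared with a single observer's reading.

```python
import random
import statistics

TRUE_VALUE = 10.0   # hypothetical "perfect measurement"
FACTORS = 50        # small independent causes acting on each measurement
OBSERVERS = 10
TRIALS = 2000


def one_measurement(rng):
    """True value plus the sum of many small nudges, none of them dominant."""
    return TRUE_VALUE + sum(rng.uniform(-0.1, 0.1) for _ in range(FACTORS))


def rms_errors(seed=0):
    """Return (RMS error of one observer, RMS error of the observers' mean)."""
    rng = random.Random(seed)
    single_sq, mean_sq = [], []
    for _ in range(TRIALS):
        readings = [one_measurement(rng) for _ in range(OBSERVERS)]
        single_sq.append((readings[0] - TRUE_VALUE) ** 2)
        mean_sq.append((statistics.mean(readings) - TRUE_VALUE) ** 2)
    return statistics.mean(single_sq) ** 0.5, statistics.mean(mean_sq) ** 0.5


single_rms, averaged_rms = rms_errors()
print("RMS error of one observer:", round(single_rms, 3))
print("RMS error of the ten-observer average:", round(averaged_rms, 3))
```

No variable named "chance" appears anywhere in the causal story: every nudge has its own cause, and the average still lands closer to the true value than a typical single reading.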




A Fixed Roulette Wheel

In the comment section of this post, Bob Murphy asks how I would respond to a paper beginning:

"Abstract: It is well-known that players at the craps table are said to have a 'hot hand' after several advantageous rolls. The rollers themselves often report subjectively feeling 'in the zone' during streaks of successful rolls. However, using both Monte Carlo simulations and Bayesian inference models, we conclude that such 'patterns' are illusory and provide no operationally useful betting opportunity."

The idea is sound, but I think the point Bob wants made can be illustrated even better with an example from Willful Ignorance, a book which Ken B. recommended to me, but which he now seems to be willfully ignoring! (Sorry, Ken, I could not resist that joke.)

The author tells the story of George, a bright inventor who has figured out how to hack a casino's roulette wheel so that it produces a winning number he wants on command. So he could, say, produce one hundred 26s in a row, and clean up by continually betting on 26. But George is a lot smarter than that: he has seen the movies where people are beaten up in the back room of the casino for doing that sort of thing. What he does instead is to grab a random number generator app for his phone, and have it randomly pick a number between 0 and 37 (with 37 representing 00), and then cause that number to "hit" on the wheel. (And of course he has several different accomplices win, rather than winning himself, and only on a few spins an evening.)

Clearly, this is no longer a "fair" roulette wheel, at least for George and his friends or for the casino. (It still is fair for the other players! Their chance of winning is unchanged by George's scheme.) On whatever occasions George decides to use his device, the outcome is not due to "chance,"* but is being deliberately selected.

But no statistical test applied to the pattern of winning numbers will detect anything but chance at work. If Gilovich, Tversky and Vallone used the method of their famous hot hand paper on this wheel, they would have to conclude that George's idea that he could beat the wheel was just an illusion! (Of course, if researchers had more knowledge, specifically, the knowledge of who George's accomplices were, they could detect the scheme by analyzing those players' winning percentages.)

The point of the story is that there can be real causal factors at play in a situation that will not be revealed by the obvious statistical tests. A statistical test that concludes "No significant effect was found" should be a piece of evidence in the trial of a hypothesis, and not the verdict of the trial.
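George's scheme can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the number of spins, the fraction of spins George interferes with, and the outsider who always bets on 26. The sketch runs no formal statistical test; it just counts how often each pocket hits and compares the accomplices' results with the outsider's.

```python
import random
from collections import Counter

POCKETS = 38       # 0..37, with 37 standing for 00
SPINS = 38_000
RIG_RATE = 0.05    # assumed: George forces about 1 spin in 20


def run_wheel(seed=1):
    wheel = random.Random(seed)      # stand-in for the wheel's physics
    app = random.Random(seed + 1)    # stand-in for George's phone app
    hits = Counter()
    rigged = accomplice_wins = outsider_wins = 0
    for _ in range(SPINS):
        if wheel.random() < RIG_RATE:
            number = app.randrange(POCKETS)    # chosen first, then forced
            rigged += 1
            accomplice_wins += 1               # the accomplice was told the number
        else:
            number = wheel.randrange(POCKETS)  # an honest spin
        hits[number] += 1
        outsider_wins += (number == 26)        # the outsider always bets on 26
    return hits, rigged, accomplice_wins, outsider_wins


hits, rigged, accomplice_wins, outsider_wins = run_wheel()
print("least/most frequent pocket:", min(hits.values()), max(hits.values()))
print("accomplices won", accomplice_wins, "of", rigged, "rigged spins")
print("outsider win rate:", round(outsider_wins / SPINS, 4))
```

The pocket counts carry no trace of the rigging, since a forced number drawn uniformly is distributed just like an honest one. Only the extra knowledge of which bets belonged to the accomplices exposes the scheme.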

* A side note: "chance" is not properly speaking the cause of anything. At the quantum level, as Ken pointed out, we perhaps find truly random events. But that is just to say that it is possible that, for instance, an excited electron dropping back to a lower atomic orbital is a causeless event. It does not mean some pagan god called "Chance" made the electron shift orbits. And at the macro level, "chance" is just the name we give to a situation in which a myriad of causal factors are at play, and it is beyond our ken (b.) to sort them all out.

Friday, July 14, 2017

A problem with Computer Science education, at present

The approach of giving students "little" problems, and rewarding students who are able to "solve" the problem as rapidly as possible with a high grade, teaches an "anti-pattern": hack your way as fast as possible to any program that can solve the problem you have been assigned.

A skilled software engineer does not approach a "customer" (which customer might actually be his boss, or a marketing executive, etc.) request in that way at all: instead, given X has been requested by "the customer," a skilled software engineer resists fulfilling the request as fast as possible, and instead begins to think:
  • Is it really necessary to program anything at all to fulfill this request? Perhaps some existing capability in the system actually already satisfies the customer request, if only the customer is educated on how to properly use that capability.
  • Is the request so hard to fulfill, and its fulfillment of such marginal value, that the customer should just be advised, "You don't really want us to program this: it will cost too much."
  • Is the request one that can be met by simply installing some third-party library or a commercially available application? If so, it would be wasteful for the developer to write a program to fulfill it.
  • If it turns out that, after considering all the above points, there really is some in-house programming necessary to satisfy the customer request:
    • Are there likely to be similar requests in the pipeline, so that it will be useful to program a generic capability rather than simply one that fulfills the current request?
    • How can the code to fulfill this request be made an integral part of a coherent software system, rather than simply being an isolated chunk of code?
The "solve this isolated problem as fast as possible to receive an A" method of giving CS students "actual" work to do does not teach them anything at all about how to address the real-world software engineering questions listed above.

Given the semester-oriented nature of modern university education, I don't think there is an easy solution to this problem. But at least keeping the above points in students' minds, even if we have to assign "mini-problems," might help.

No, I Don't Believe Probability Judgments Are "Subjective"

Tom was, I think, worried that this is what I was suggesting. Then he got what my claim is. But in case others misapprehend it...

1) There are no judgments whatsoever that are "purely subjective." Any judgment is an attempt to assert something about the world. Although Oakeshott's arguments on this point (in Experience and Its Modes, chiefly) are more robust, I think M. Polanyi's arguments in Personal Knowledge are still very good but also more accessible. If I claim that "The odds of that coin coming up heads are one in two," I am saying something about the world "out there," rather than commenting upon some "purely personal" state of my own.

2) As such, there are better and worse judgments about what the probability of some event is. If all I know is, "Tom is flipping a fair coin," then the correct probability to assign to "The coin will come up heads" is .50. One way to defend my claim here is to note that anyone else having only the same knowledge as me about the situation can assuredly win money from me in the long run if I choose any other probability while they choose .50.

3) But that perfectly correct probability judgment, given my state of ignorance about the flipping, will become decidedly mistaken should my knowledge of what is going on change: for instance, suppose I suddenly gain the superpower of instantaneously being able to assess all the forces acting on a coin at the moment it is flipped so as to "see" whether any particular flip will come up heads or tails. If I gain that superpower, my correct assignment of probability to "The coin will come up heads" is either zero or one, depending on what I "see."

4) And finally, even if I have that superpower, should the casino in which I am betting become suspicious, and only allow me to bet on coin flips from another room (so that I can't gauge the forces at play in the flip), my correct probability judgment reverts to .50.

So, the objectively correct judgment of the probability of some event occurring depends on how much knowledge we have when making that judgment: if all we know is that Joe is a 50-year-old American male, we might be correct in judging that the probability he will live to 80 is .50. (I just picked .50 as a plausible number: I'm not looking this up in the mortality tables at the moment!) But if we then learn he is planning on committing suicide tonight, we would be correct in revising our estimate to, "Well, his probability of living to 80 is pretty close to 0."
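The betting argument in point 2 can be worked out exactly. The sketch below rests on assumptions of my own: I quote a probability q for heads and will take either side of a bet at the odds that would be fair if q were right, and my opponent stakes one unit on whichever side I have underpriced. The function name opponent_edge is hypothetical.

```python
from fractions import Fraction

TRUE_P = Fraction(1, 2)   # the coin is fair


def opponent_edge(q):
    """Expected profit per 1-unit stake for someone betting against a
    bookmaker who prices heads at probability q on a fair coin."""
    q = Fraction(q)
    if q > TRUE_P:
        # Heads is overpriced, so back tails: a win pays q / (1 - q).
        payout = q / (1 - q)
    else:
        # Tails is overpriced (or the price is right), so back heads.
        payout = (1 - q) / q
    # Win the payout half the time, lose the 1-unit stake the other half.
    return TRUE_P * payout - (1 - TRUE_P) * 1


for q in (Fraction(1, 2), Fraction(3, 5), Fraction(2, 5)):
    print(q, opponent_edge(q))
# 1/2 0
# 3/5 1/4
# 2/5 1/4
```

Only at q = 1/2 does the opponent's expected profit vanish; any other quote hands over a positive edge on every bet.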

Hot Streak Length

The critics of this model claimed "It implies a streak length of one."

Well, it doesn't:

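import random

SHOTS = 50
in_streak = False   # are we currently inside a run of "hot" shots?
hot_streaks = 0     # number of distinct hot streaks begun
hot_total = 0       # total number of shots taken while hot

print("Shooting with hot streaks:")
# Note: range(1, SHOTS) takes SHOTS - 1 shots.
for shot in range(1, SHOTS):
    hot = (random.random() < .5)   # hidden state, redrawn before every shot
    if hot:
        hot_total += 1
        if not in_streak:
            # A new hot streak starts here.
            in_streak = True
            hot_streaks += 1
        make = (random.random() < .66)
    else:
        in_streak = False
        make = (random.random() < .33)
    mark = 'X' if make else 'O'
    print(mark, end='')
print("")
# Guard against the (very unlikely) run with no hot shots at all.
if hot_streaks:
    print("Average hot streak length = " + str(hot_total / hot_streaks))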

print("Shooting without hot streaks:")
for shot in range(1, SHOTS):
    make = (random.random() < .5)
    mark = 'X' if make else 'O'
    print(mark, end='')
print("")




And the output is:

Macintosh:statistics gcallah$ ./hotstreak.py
Shooting with hot streaks:
OOXXOXXOXXOXXXOOXXXOXXXXOOXXXXXXXOOOOOXOXXOXXOOXO
Average hot streak length = 2.0
Shooting without hot streaks:
OOXXOOXOXXOXOXXXXXXOXOOOXOXOXOOOOOOOOOXXOXOXOOOOO

What the model actually codes, and was meant to code, is the possibility that a player can be genuinely "hot" for some period, and yet, because the hot streak might end at any moment, the streak has no predictive value, and "feeding the hot hand" will not help a team.
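The reported average of 2.0 is what the arithmetic suggests. Once a hot streak has begun, each further shot is hot with probability .5, so the streak length is geometrically distributed with mean 1 / (1 - .5) = 2. The sketch below checks that by summing the series directly; the function name is hypothetical, and the calculation ignores the cut-off at the end of a finite run of shots.

```python
HOT_P = 0.5   # chance that any given shot is taken "hot", as in the program


def expected_streak_length(p, max_len=200):
    """Mean of a geometric run length: P(length = k) = p**(k - 1) * (1 - p)."""
    return sum(k * p ** (k - 1) * (1 - p) for k in range(1, max_len + 1))


print(expected_streak_length(HOT_P))
```

So the model implies streaks of length two on average, with longer streaks occurring regularly, rather than a fixed streak length of one.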

Wednesday, July 12, 2017

The Internet Is a Wondrous Place!

I have programmed for 30 years now. I have published dozens of articles in professional software engineering journals. I have written programs used to trade tens of millions of dollars of securities each day. I teach computer science.

And today Ken B. informed me that if I set a random variable once outside of a loop the result will be different than if I set it anew each time around the loop!

Great Minds Think Alike...

"probability is indeed a degree of certainty..." -- Jacob Bernoulli

"It is most certain, given the position, velocity, and distance of a die from the gaming table at the moment when it leaves the hand of the thrower, that the die cannot fall other than the way it actually does fall... Yes it is customary to count the fall of the die... as contingent. The only reason for this is that those things which... are given in nature, are not yet sufficiently known to us." -- Jacob Bernoulli

"Probability, in its mathematical acceptation has reference to the state of our knowledge of the circumstances under which an event may happen or fail. With the degree of information which we possess concerning the circumstances of an event, the reason that we have to think that it will occur, or, to use a single term, our expectation of it, will vary."  -- George Boole

Tuesday, July 11, 2017

Probability is about our knowledge...

and not a fixed feature of the world "out there."

A couple of members of the commentariat have complained that in this model, it is necessary to have "inside knowledge" to beat someone who thinks the odds are 50-50 on any given shot. Now, I don't care whether you want to call what "Gene" knows in that model "inside knowledge" or not. Either way, that is missing the more important point: "the odds" change with our knowledge of a situation.

To illustrate: imagine I ask you for the odds that a 40-year-old American male will live to be 78. Well, if that is all the information you have, you should answer "Even odds." (I looked that up, but from here on out my odds are all just plausible-sounding guesses.)

But now I tell you, "Oh, and he's a heavy smoker."

Oops, better revise that forecast: say, 2-1 against.

But then I add, "And so were all of his deceased male relatives that we can identify, and they all lived to be at least 90."

Aargh, now the odds are 2-1 in favor.

However, I finally add "By the way, he has terminal pancreatic cancer, and the doctors only give him a month to live."

Now you had better revise your odds to 1000-1 against.

Supposing that my guesses after the first odds I gave are accurate, your answer each step of the way was "correct," given the knowledge you had at hand. When we know more about a situation the odds change. And it doesn't matter at all whether this is "inside knowledge" or not.
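For readers who prefer probabilities to odds, the sequence above can be converted mechanically. This is just the arithmetic of the guessed odds already given; the function name and the step labels are my own.

```python
from fractions import Fraction


def prob_from_odds(in_favor, against):
    """Convert odds of in_favor-to-against into a probability."""
    return Fraction(in_favor, in_favor + against)


# (what we know so far, odds in favor, odds against)
steps = [
    ("40-year-old American male", 1, 1),
    ("... who is a heavy smoker", 1, 2),
    ("... whose smoking male relatives all reached 90", 2, 1),
    ("... and who has terminal pancreatic cancer", 1, 1000),
]

for known, in_favor, against in steps:
    print(known, "->", prob_from_odds(in_favor, against))
```

The man is the same at every step. Only the information conditioning the judgment changes, and the correct probability moves from 1/2 to 1/3 to 2/3 to 1/1001 with it.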

This applies even to something as seemingly straightforward as a claim that, in a flip of a fair coin, the odds are 50-50 of getting heads. If we could somehow see all the forces at work in a particular flip, we would be able to state with certainty, "This toss is going to be heads (or tails)." And, in fact, it turns out that with practice, a person can learn to flip a coin so that it almost always comes up in its original orientation, or vice versa. If all we know is that we have "a person" flipping a fair coin, it is correct to say the odds are 50-50 for getting heads. But if we learned we were dealing with one of these skilled coin flippers, and we had a reason to think he was trying to produce heads, we would instead be correct to say that the coin would come up heads with near certainty.

An application: the above considerations are why a simple mastery of the odds of drawing various card hands is not enough to make one a top poker player. The top players have of course internalized that knowledge, but they have gone much further: they have learned to read the "tells" of less skilled players, so that they can see from the reaction of an amateur whether he has just completed his full house or not. Once they can do that, the formal odds of his having drawn the card he needed become irrelevant: they know whether or not he got it. This is not "inside knowledge": the tell was right out in the open, for anyone to see. But only someone practiced at looking for it will recognize it as information to be used in betting.


Monday, July 10, 2017

Not Surprised Rob Got This Wrong, but

et tu, Ken?

Because it is trivial to show that the hot streaks in my first program on this topic are real, and can be bet on successfully by anyone who knows they exist, and it only takes a couple more lines of code:


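import random

SHOTS = 100
MAKE_BET = True    # a bet that the shot goes in
MISS_BET = False   # a bet that the shot misses

gene_stake = 100
kr_stake = 100
gene_bet = MAKE_BET

make = False
print("Betting with hidden hot streak mechanism:")
# Note: range(1, SHOTS) runs SHOTS - 1 times, so 99 bets are placed.
for shot in range(1, SHOTS):
    hot = (random.random() < .5)
    # Gene can see the hidden state: he bets "make" when hot, "miss" when cold.
    gene_bet = MAKE_BET if hot else MISS_BET
    if hot:
        make = (random.random() < .66)
    else:
        make = (random.random() < .33)
    # KR keep a 3-cent vigorish on every bet: Gene wins .97 or loses 1.03.
    if gene_bet == make:
        gene_stake += .97
        kr_stake -= .97
    else:
        gene_stake -= 1.03
        kr_stake += 1.03

print("Gene's final holdings = " + str(gene_stake))
print("KR's final holdings = " + str(kr_stake))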


KR, thinking the outcome is 50/50, are willing to "make book" and take bets on either side, so long as they get a house "vigorish" of 3 cents per bet. But Gene can "see" the hot hand taking place, and bets on the hot (and against the cold) hand.

And here is the outcome:
Betting with hidden hot streak mechanism:
Gene's final holdings = 114.03
KR's final holdings = 85.97
172-16-30-10:statistics gcallah$ ./hotstreak2.py
Betting with hidden hot streak mechanism:
Gene's final holdings = 128.03
KR's final holdings = 71.97
172-16-30-10:statistics gcallah$ ./hotstreak2.py
Betting with hidden hot streak mechanism:
Gene's final holdings = 126.03
KR's final holdings = 73.97
172-16-30-10:statistics gcallah$ ./hotstreak2.py
Betting with hidden hot streak mechanism:
Gene's final holdings = 130.03
KR's final holdings = 69.97
172-16-30-10:statistics gcallah$ ./hotstreak2.py
Betting with hidden hot streak mechanism:
Gene's final holdings = 134.03
KR's final holdings = 65.97
172-16-30-10:statistics gcallah$ ./hotstreak2.py
Betting with hidden hot streak mechanism:
Gene's final holdings = 118.03
KR's final holdings = 81.97
172-16-30-10:statistics gcallah$ ./hotstreak2.py
Betting with hidden hot streak mechanism:
Gene's final holdings = 128.03
KR's final holdings = 71.97
172-16-30-10:statistics gcallah$

KR lose every single time, by a lot!
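The size of those losses is about what the arithmetic predicts. The sketch below uses only the constants from the program (the .5 chance of being hot, the .66 and .33 shooting percentages, the .97 and 1.03 payoffs, and the 99 bets that range(1, 100) produces) to compute Gene's expected gain. The variable names are my own.

```python
P_HOT = 0.5
P_MAKE_HOT = 0.66
P_MAKE_COLD = 0.33
WIN, LOSS = 0.97, 1.03
BETS = 99   # range(1, 100) in the program runs 99 times

# Gene bets "make" when hot and "miss" when cold, so he wins a bet when
# a hot shot goes in or a cold shot misses.
p_gene_right = P_HOT * P_MAKE_HOT + (1 - P_HOT) * (1 - P_MAKE_COLD)
edge_per_bet = p_gene_right * WIN - (1 - p_gene_right) * LOSS

print("P(Gene wins a bet) =", round(p_gene_right, 3))
print("Expected gain per bet =", round(edge_per_bet, 2))
print("Expected gain over the run =", round(edge_per_bet * BETS, 2))
```

Gene wins about two bets in three, for an expected gain of roughly 30 cents per bet and close to 30 over a full run, which is in line with the final holdings reported above.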

What's especially weird here is that the "George" whom Rob accused me of maliciously deleting the mention of is an instance of just the sort of thing I programmed here: Weisberg's example has George use a random number generator to pick a number to come up on his rigged roulette wheel. For the person who doesn't know George can rig the wheel, the pattern of numbers that "hits" looks completely fair: there is absolutely no way to tell it from a truly random wheel. But George, having more knowledge of the causal process at play, can win as often as he wants to.

Now, I don't think Rob can really read, so it's no surprise that he missed that his own example makes the point I am making. But Ken???

Either The Supreme Court was doing just what I claimed...

Or Clarence Thomas doesn't really know anything about how the Supreme Court works:

"As Justice Clarence Thomas points out in his separate opinion (joined by Justices Samuel Alito and Neil Gorsuch), when the Court reviews a stay, it is essentially assessing whether lower-court rulings will be ultimately reversed on the merits. There would be no reason for the Supreme Court to narrow the lower-court stays of the travel ban if the justices were of a mind to concur in the lower courts’ reasoning."

So Josiah, please take this up with Justice Thomas.

TGV on Hot Hands

Tversky, Gilovich and Vallone wrote a famous paper "debunking" the idea of a "hot hand." When they did so, they conflated two very different questions:

1) Is it sensible to feed the ball to a player with a "hot hand," since he has a greater chance of making his next shot? I.e., is there predictive value in this phenomenon?

2) Is the impression that players have that sometimes they are "on" and sometimes not an illusion? I.e., does the phenomenon exist at all?

The findings of their paper, if accurate (and recent research suggests they are not), would show that there is no predictive value in hot streaks, whether or not they really exist. But by defining "hot streaks" as simply being this predictive value, the authors, without any basis for doing so, also claimed that players' perception of being "on" at certain times is just an illusion.

So any reader complaining that my recently posted model "does not follow the TGV definition" of a "hot hand" is simply demanding that I make the same mistake that TGV made!

That is ridiculous: My disputing the TGV definition of a hot hand cannot be refuted by insisting I use the TGV definition of a hot hand!

UPDATE: And by the way, in this post, I quite deliberately created a model in which:
1) Hot streaks are statistically undetectable; and
2) Hot streaks offer no predictive leverage for a player's next shot.

So it was somewhat stunning to see criticisms of my model based on the fact that in it, hot streaks are statistically undetectable, and offer no predictive leverage for a player's next shot.

Since 1) and 2) were the whole point of my model!

Saturday, July 08, 2017

A Simple Model of Real But Random-Looking Hot Streaks

This model is not supposed to be realistic!

Suppose:

Before every shot, a player enters either the state "hot streak" or "cold streak" with probability 1/2.

A player on a hot streak has a 2/3 probability of hitting a shot during that streak.

A player on a cold streak has a 1/3 probability of hitting a shot during that streak.

We can program this, and know with certainty that there are periods when the player has a 2/3 chance of making a shot, and periods when he has a 1/3 chance... and yet it does not help us at all in predicting the next shot. (From the outside, not knowing if the streak is "on" or not, there is always a 50% probability that the next shot will go in.)
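The claim that the hidden state gives an outsider no predictive leverage can be checked exactly with fractions. This sketch uses the stated 2/3 and 1/3 (the program itself approximates them as .66 and .33), and the variable names are my own.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
P_MAKE = {"hot": Fraction(2, 3), "cold": Fraction(1, 3)}

# Unconditional chance of a make: average over the hidden state.
p_make = sum(HALF * P_MAKE[state] for state in P_MAKE)

# Joint chance of making two consecutive shots; the state is redrawn
# independently before each shot.
p_both = sum(HALF * P_MAKE[a] * HALF * P_MAKE[b]
             for a, b in product(P_MAKE, repeat=2))

# Chance of making the next shot, given that the last one went in.
p_next_given_make = p_both / p_make

print(p_make, p_next_given_make)  # 1/2 1/2
```

Knowing that the last shot went in leaves the outsider's probability for the next shot at exactly 1/2, even though the player really is shooting at 2/3 or 1/3 on every attempt.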

Here is a Python program implementing this algorithm and also implementing another loop with a simple 50% chance of hitting for comparison:

import random

SHOTS = 50

print("Shooting with hot streaks:")
# Note: range(1, SHOTS) takes SHOTS - 1 shots.
for shot in range(1, SHOTS):
    # The hidden state is redrawn before every shot.
    hot = (random.random() < .5)
    if hot:
        make = (random.random() < .66)   # approximately 2/3
    else:
        make = (random.random() < .33)   # approximately 1/3
    mark = 'X' if make else 'O'          # X = make, O = miss
    print(mark, end='')
print("")

print("Shooting without hot streaks:")
for shot in range(1, SHOTS):
    make = (random.random() < .5)        # plain 50% shooter, for comparison
    mark = 'X' if make else 'O'
    print(mark, end='')
print("")


And here are some runs of the program:


Fisher on Scientific Judgment

"The Natural Sciences can only be successfully conducted by responsible and independent thinkers applying their minds and their imaginations to the detailed interpretation of verifiable observations. The idea that this responsibility can be delegated to a giant computer programmed with Decision Functions belongs to the phantasy of circles rather remote from scientific research." -- Ronald Fisher

Friday, July 07, 2017

Thanking Ken B. for his Willful Ignorance...

recommendation.

Ken recommended the book Willful Ignorance to me. It arrived today; I randomly* opened it up and found a section on "The Ignorance Fallacy." In the section, the author, Herbert Weisberg, discusses the "hot hand fallacy." After a quick review of the evidence, he writes:
"So, it appears that streakiness is just a myth. Or is it?

"Let us accept for the moment the hypothesis that pure randomness can plausibly explain almost any hot hand streak in sports or games. Does that necessarily imply that such streaks do not really exist? Consider that there are a great many factors, most not measurable, that might influence any individual outcome, such as one particular game or at bat... What the research certainly tells us is that if such factors exist, they must be haphazard enough to appear essentially random."
And this is precisely what I have pointed out a number of times in the past: the findings "debunking" hot hands are all entirely consistent with the actual existence of hot hands. For instance, athletes who are "hot" often report that during their streak, they felt a sense of heightened awareness, and say things like "the baseball appeared as big as a grapefruit to me."

Let us trust these athletes' self-reports for a moment, but further posit that such periods of heightened awareness appear and disappear in an unpredictable fashion. Then athletes' reports of having a "hot hand" would be entirely accurate, but also be compatible with the analysis showing that the data matches a random process.

By the way, a great example of something that appears random but is actually deliberately caused is the random number generator in your favorite programming language. It took careful design on the part of many engineers to make seemingly random numbers pop out of your computer!


* Or was I guided? Perhaps by the "hot hand" of... Satan?

The worst IT book EVER!

Man, I feel so cheated. I bought a book by some Polya fellow that claimed it would explain "How to Solve IT."

But I'm 50 pages in, and the guy just can't stop banging on about mathematics; not a peep about IT yet!

"Self-Plagiarism" Versus Good Engineering Sense

I've always had a problem with the notion of "self-plagiarism": I suggest it is just an artifact of IP law, and not, like "other plagiarism," a matter of honesty.

If Joe gave me idea X, and I publish it as my own, I am lying, and failing to give Joe proper credit.

But if Genet gave Genet + 1 idea X, does it really make any sense to say that Genet + 1 is lying in saying that the idea was his?

Well, no, it obviously doesn't. The only purpose of the strictures on "self-plagiarism" is to enrich copyright holders at the expense of an author being able to re-use his own ideas.

And all of my training as a software engineer rebels against this concept: as an SE, you want to re-use code at every chance you can!

UPDATE: A quote on code re-use:
Code reuse

Only suckers start from scratch. In fact, today I took out some code I wrote over the summer, changed five lines, and started it running again. Woo hoo. It was sitting there in a code repository waiting for a chance to live again. Smart developers reuse code as often as they can. That was one of the main goals of the open source movement. It wasn’t freedom; it was laziness. If we reuse our code, we save a gazillion hours of work.

Thursday, July 06, 2017

"Procedure" and "Data Structure" - A Distinction without a Difference


"The inner coming-to-be or genesis of substance is an unbroken transition into outer existence, into being-for-another, and conversely, the genesis of existence is how existence is by itself taken back into essence."
- Hegel, Phenomenology of Spirit (paragraph 42)
What happens in computer programs?

Classically, an algorithm is a well-defined procedure that takes input and returns output. The input is some piece of data. You know, a number; or string. The procedure turns this number into another number; or a string into a number. Or whatever.

Literarily, a crystallized objective object comes in. Upon it acts the action of a thousand rapid hands too quick to see. Finally, fresh out of the fire, the result pops out, separate from its furnace, like a piece of toast from the maw of the toaster.

Briefly, Data. Process. Data.

We didn't come out of the womb knowing this pattern though. We had to be taught. A teacher had to guide our hands, pointing, "Look! Those numbers! That's data. That sentence! It's data too." Then we saw the process of the data's transformation. It changed from this data, to that data. And this change, our teacher called, "Procedure." And we saw the names, that they were very good.

And so we learned a pattern. To call this or that, a "data structure" or a "procedure." We followed our teacher in not calling a number a "procedure." Otherwise, we would really confuse things. And we would get a slap on the wrist ... for being wrong about the pattern we have learned.

Once we recognize that we were taught these things, that we entered into a pattern, we might begin to wonder whether we should take our pattern with the utmost seriousness. Shall we step out of our pattern to see what is real in it and what is mere habit?

And so we invite ourselves to consider our concepts as they are, uncommitted to where our results may lead us (say, to a slap on the wrist), with our old rules set aside for the moment. Uncommitted inquiry.

In other words, let us enter into the world of philosophy.

Let's look at something very "obviously" a piece of data: the number five. What is it to a computer? (Really, what is it to a programmer? Computers don't really regard things.)

Five is something such that if it is next to a plus sign and another number, it returns the sum.

We might as well have the following.

def five(op, summand):
    if op == '+':
        if summand == 1:
            return 6
        if summand == 2:
            return 7
        # ... and so on, for every other summand

The same goes for multiplication and other operations that the processor is designed to do when it "sees five."

And this procedural knowledge, "what to do," is the entirety of the number five to a computer program.

The same goes for all numbers. And the same for all data. For a computer has to know what to do with the data. But it needs the data to tell us what precisely this is. And so data itself tells the computer what to do. Thus data is procedure.

On the reverse side, "what to do" must be stored someplace in a computer. That is, a procedure is also a thing within a computer program. A piece of hard data.
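Both halves of this can be poked at in an ordinary Python session. The sketch below is only an illustration: make_number, five_as_procedure, and add_one are hypothetical names I made up, and the peek at __code__.co_code assumes the standard CPython interpreter.

```python
import operator

# "Data is procedure": the integer 5 carries its own behavior.
# Writing 5 + 1 asks the object 5 to run its __add__ method.
five = 5
print(five.__add__(1))   # 6
print(five.__mul__(2))   # 10


def make_number(value):
    """Build a 'number' that is nothing but a rule for what to do."""
    ops = {'+': operator.add, '*': operator.mul}

    def number(op, operand):
        # Look up the operation named by the symbol and apply it.
        return ops[op](value, operand)

    return number


five_as_procedure = make_number(5)
print(five_as_procedure('+', 1))   # 6


# "Procedure is data": a function is an object that can be stored in a
# list, passed around, and inspected like any other value.
def add_one(x):
    return x + 1


stored = [add_one, five_as_procedure]
raw_instructions = add_one.__code__.co_code  # CPython detail: compiled bytecode
print(type(raw_instructions))                # <class 'bytes'>
```

The number shows up as a bundle of "what to do," and the procedure shows up as a string of bytes sitting in memory.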

Thus data and procedures don't seem all that different. The shocking thing occurs when we consider a description of an algorithm with this insight, "Data is processed into data." The "data" part processes the "is processed into" part. And the result is a process. Movement upon movement! This is enough to make one's head spin. Each part seems the mover and the moved simultaneously. And we realize we can no longer look at the computer program from the world of computer programming. For computer programming has static things which are acted upon.

And we come down from this wild experience, tired; our movement and thought we ease -- we make for ourselves an arrest in our experience. But with wonder at how vast our world is; how limited the world of computer programming is. But we like our old game, and we enter into it again, but with the awareness that procedures and data are merely distinctions without a difference.


Tuesday, July 04, 2017

Statistically Significant Harm Caused by Statistical Significance

I have argued before that the great importance placed on α = .05 in statistical studies is an attempt to replace educated judgment with a technical decision. But that decision itself is arbitrary: there is no particular reason to choose .05 over .04, or .06, or any other number less than .50.

It turns out it is even worse than I thought: an education that focuses on such a cutoff leads "researchers to interpret evidence dichotomously rather than continuously. Consequently, researchers may either disregard evidence that fails to attain statistical significance or undervalue it relative to evidence that attains statistical significance."

Education in statistics, at least as it is too often taught today in schools, makes one worse at likelihood judgments.

A Big Data Problem

"Suppose we are constructing a prediction of some measured response in terms of 20 characteristics...  a common event in machine learning. How large is 20-dimensional space? If we divide each predictor's range into quartiles, the 20-dimensional space is divided into 420 different sections. If you have a billion individual cases, on average there will be only one case every thousand sections. Hardly an empirical base to build upon with confidence!" -- Stephen M. Stigler, The Seven Pillars of Statistical Wisdom

Monday, July 03, 2017

Come again?

Journalists are supposedly taught to keep sentences and paragraphs short, so that their writing is easy to follow. How, then, did the following come about?

"However, Jazz management opted not to risk losing Hill in free agency without a suitable replacement after he declined their attempts to sign him to an extension during the season, trading Oklahoma City's lottery-protected 2018 first-round pick to the Minnesota Timberwolves for Rubio before the July 1 deadline to use salary cap space remaining from last season."

I've read that three times now, and while I understand it involves three teams, two players, and a draft pick, I really don't understand much more than that.

Bleg: Does Skype have the worst user interface ever?

OK, I have "Pending contact request"s on Skype. The obvious thing to do would be to make that message itself a link to the dialogue where you accept the request.

But Skype did not do that.

A second best would be to have an option on the "contact" menu called something like "Accept pending requests."

But Skype did not do that.

A third best might be to double click on the contact from whom one has the pending request, and then get a button or something to accept the request.

But Skype did not do that.

I have searched Skype help for "pending" and "accept": nothing. Searching for "request" explains how to send a contact request. I asked both of the people who sent me requests if they know how to accept requests, and neither of them has any clue. I have googled, but every result I get seems to describe some earlier version of Skype because the "Accept" button they talk about is not where they say it should be.

What, exactly, did Skype do to enable the acceptance of pending requests? I have been searching for two days, and have no idea! Whatever they did, they have hidden the feature with extreme cunning and stealth.

Or, in other words...

Help!

PS: I may have figured out what is happening. It may be the case that I sent these people contact requests (although I don't recall doing so), and they haven't accepted. But this would have been so easy to clarify on Skype's part: "You sent a pending contact request" or "You received a pending contact request."

PPS: That weren't it! I checked, and the gentlemen in question don't have contact requests.

Did you know...

Fortran was updated less than 10 years ago?

Sunday, July 02, 2017

Being a developer

A nice post from my friend Scott Johnson on being a developer.

The identitarians are not always wrong!

I have often asserted here that no ideology could ever gain any traction if it did not contain at least some partial truths. So, for instance, libertarians are certainly correct in asserting that any attempt at economic regulation tends to get captured by special interests.

Similarly, although the racial and sexual "identitarians" often spout nonsense, they are certainly correct in thinking that mainstream discourse often "privileges" certain groups.

For instance, Netflix captioning, when a person is speaking a European language, almost always reads, "Speaking Russian," "Speaking Italian," etc.

But when almost any non-European language is being spoken, the captioning reads, "Speaking in native language."

I see similar reports on athletes from Africa: "Olu speaks five African dialects." Because, you see, there is a single language, called "African," and Ga and Ewe and Twi and Fante and Wolof and Swahili and... are all just "dialects" of that language.

By their euphemisms you shall know them

I'm watching the Belgian TV series The Break. Not bad, but... At several points the subject of abortion comes up. The characters say...