Tuesday, July 04, 2017

A Big Data Problem

"Suppose we are constructing a prediction of some measured response in terms of 20 characteristics... a common event in machine learning. How large is 20-dimensional space? If we divide each predictor's range into quartiles, the 20-dimensional space is divided into 4^20 different sections. If you have a billion individual cases, on average there will be only one case every thousand sections. Hardly an empirical base to build upon with confidence!" -- Stephen M. Stigler, The Seven Pillars of Statistical Wisdom
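
The arithmetic in the quote is easy to check. Here is a minimal Python sketch, assuming "quartiles" means 4 bins per predictor and "a billion" means 10^9 cases; the variable names are illustrative only and do not come from the book.

```python
# Count the cells produced by splitting each of 20 predictors into quartiles.
n_predictors = 20
bins_per_predictor = 4  # quartiles: 4 bins per predictor (assumption stated above)
n_cells = bins_per_predictor ** n_predictors

n_cases = 10**9  # "a billion individual cases"
cells_per_case = n_cells / n_cases

print(f"cells: {n_cells:,}")                      # cells: 1,099,511,627,776
print(f"cells per case: {cells_per_case:,.0f}")   # cells per case: 1,100
```

So there are roughly a trillion cells, or about a thousand cells for every case, which matches the "one case every thousand sections" figure.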

2 comments:

  1. I just bought that book recently, and will start it next week.

    I recommend Statistics Done Wrong, and Willful Ignorance, since you seem interested in these things.

    1. I've read the first. Just ordered the second.

      Thanks!

