Tuesday, July 04, 2017

A Big Data Problem

"Suppose we are constructing a prediction of some measured response in terms of 20 characteristics... a common event in machine learning. How large is 20-dimensional space? If we divide each predictor's range into quartiles, the 20-dimensional space is divided into 4^20 different sections. If you have a billion individual cases, on average there will be only one case every thousand sections. Hardly an empirical base to build upon with confidence!" -- Stephen M. Stigler, The Seven Pillars of Statistical Wisdom
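
The arithmetic behind the quote is easy to check. Here is a minimal Python sketch, assuming 20 predictors each cut into 4 bins and a billion cases spread over the resulting sections; the function name and its parameters are my own illustration, not anything from the book.

```python
def section_density(n_predictors=20, bins_per_predictor=4, n_cases=10**9):
    """Count the sections of a binned predictor space and how thinly cases cover them.

    All names and defaults here are illustrative assumptions taken from the
    numbers in the quote (20 predictors, quartiles, a billion cases).
    """
    # Each predictor contributes an independent choice of bin,
    # so the number of sections is bins ** predictors.
    n_sections = bins_per_predictor ** n_predictors
    # Average number of sections available per case.
    sections_per_case = n_sections / n_cases
    return n_sections, sections_per_case


n_sections, sections_per_case = section_density()
print(f"sections: {n_sections:,}")
print(f"sections per case: {sections_per_case:,.0f}")
```

With these assumptions the count is 4^20 = 2^40, a little over a trillion sections, which is roughly 1,100 sections per case and matches the "one case every thousand sections" in the quote.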


  1. I just bought that book recently, and will start it next week.

    I recommend Statistics Done Wrong, and Willful Ignorance, since you seem interested in these things.

    1. I've read the first. Just ordered the second.



