A Big Data Problem

"Suppose we are constructing a prediction of some measured response in terms of 20 characteristics... a common event in machine learning. How large is 20-dimensional space? If we divide each predictor's range into quartiles, the 20-dimensional space is divided into 4^20 different sections. If you have a billion individual cases, on average there will be only one case every thousand sections. Hardly an empirical base to build upon with confidence!" -- Stephen M. Stigler, The Seven Pillars of Statistical Wisdom
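A quick check of the arithmetic in the quote. Four bins on each of 20 predictors gives 4^20 cells, roughly 1.1 trillion, so a billion cases leaves on average about one case per 1,100 cells. This is only a back-of-the-envelope sketch; the variable names are made up for illustration.

```python
# Back-of-the-envelope check of the cell-count arithmetic in the quote.
n_predictors = 20        # dimensions in the example
bins_per_predictor = 4   # each predictor's range split into quartiles
n_cases = 10**9          # "a billion individual cases"

n_cells = bins_per_predictor ** n_predictors   # total number of sections
cells_per_case = n_cells / n_cases             # average sections per case

print(f"cells: {n_cells:,}")                    # cells: 1,099,511,627,776
print(f"cells per case: {cells_per_case:,.1f}") # cells per case: 1,099.5
```

Adding one more predictor multiplies the cell count by four again, which is why the data thin out so quickly as dimensions grow.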

2 comments:

  1. I just bought that book recently, and will start it next week.

    I recommend Statistics Done Wrong, and Willful Ignorance, since you seem interested in these things.

    1. I've read the first. Just ordered the second.

      Thanks!