Tuesday, July 04, 2017

A Big Data Problem

"Suppose we are constructing a prediction of some measured response in terms of 20 characteristics... a common event in machine learning. How large is 20-dimensional space? If we divide each predictor's range into quartiles, the 20-dimensional space is divided into 4^20 different sections. If you have a billion individual cases, on average there will be only one case every thousand sections. Hardly an empirical base to build upon with confidence!" -- Stephen M. Stigler, The Seven Pillars of Statistical Wisdom
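The arithmetic behind the quote is easy to check. Below is a small Python sketch. The function names are made up for illustration. The simulation assumes the 20 predictors are independent, so every cell is equally likely. Real predictors are usually correlated, and then the data bunch into even fewer cells.

```python
import random


def grid_sections(dims: int, bins_per_dim: int) -> int:
    """Number of cells when each of `dims` predictors is cut into `bins_per_dim` bins."""
    return bins_per_dim ** dims


def occupied_cells(n_cases: int, dims: int, bins_per_dim: int, seed: int = 0) -> int:
    """Drop n_cases random points into the grid and count the distinct non-empty cells.

    Assumption (hypothetical): predictors are independent, so each coordinate
    falls into any of its quartile bins with equal probability.
    """
    rng = random.Random(seed)
    cells = set()
    for _ in range(n_cases):
        cells.add(tuple(rng.randrange(bins_per_dim) for _ in range(dims)))
    return len(cells)


if __name__ == "__main__":
    sections = grid_sections(20, 4)
    print(f"4^20 sections: {sections:,}")
    print(f"sections per case with a billion cases: {sections / 10**9:,.0f}")
    n = 100_000
    filled = occupied_cells(n, 20, 4)
    print(f"{n:,} simulated cases fill {filled:,} distinct cells")
```

4^20 works out to 1,099,511,627,776 sections, which is about 1,100 sections for every case when there are a billion cases. That matches Stigler's "one case every thousand sections." In the simulation, nearly every simulated case lands alone in its own cell. That is the sparsity the quote warns about.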

2 comments:

  1. I just bought that book recently and will start it next week.

    I recommend Statistics Done Wrong and Willful Ignorance, since you seem interested in these things.

    1. I've read the first. Just ordered the second.

      Thanks!


Distraction Deterrents in Small Contexts

"distracted from distraction by distraction" - T.S. Eliot

I've been reading a little on how Facebook and other social netwo...