Tuesday, July 04, 2017

A Big Data Problem

"Suppose we are constructing a prediction of some measured response in terms of 20 characteristics... a common event in machine learning. How large is 20-dimensional space? If we divide each predictor's range into quartiles, the 20-dimensional space is divided into 4^20 different sections. If you have a billion individual cases, on average there will be only one case every thousand sections. Hardly an empirical base to build upon with confidence!" -- Stephen M. Stigler, The Seven Pillars of Statistical Wisdom
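
The arithmetic in the quotation is easy to check. Below is a minimal sketch in Python; the variable names are mine, not Stigler's, and it takes "a billion" to mean 10^9.

```python
# Check the arithmetic in the Stigler quotation.
# Assumptions: 20 predictors, each split into 4 quartile bins,
# and "a billion" cases taken as 10**9.
predictors = 20
bins_per_predictor = 4
cases = 10**9

# Each section is one choice of quartile per predictor.
sections = bins_per_predictor ** predictors

# How many sections there are for each observed case, on average.
sections_per_case = sections / cases

print(f"sections: {sections:,}")
print(f"sections per case: {sections_per_case:,.0f}")
```

So there are roughly 1.1 trillion sections, and a billion cases leave about one case per 1,100 of them, which matches the "one case every thousand sections" in the quotation.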

2 comments:

  1. I just bought that book recently, and will start it next week.

    I recommend Statistics Done Wrong, and Willful Ignorance, since you seem interested in these things.

    1. I've read the first. Just ordered the second.

      Thanks!


Zeno for the computer age

If you wish to better understand Zeno's worry about the continuum, you could do worse than to consider loops in software. Case 1: You...