The Antifragile Chaos Monkey
I just read about how the great Amazon US-EAST crash of April 21, 2011 brought down most of Amazon's customers who depended on that zone, including big ones like Reddit and Quora. But Netflix remained up. How did that happen?
It turns out that Netflix had made itself "antifragile" by employing a tool it called "Chaos Monkey." Chaos Monkey would simply "crash" various Netflix servers, regularly and at random. ("Crash" is in quotes because when the machine's owner does it on purpose, it is not clear whether it really should be called a crash.)
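To make the idea concrete, here is a toy sketch in Python. It is not Netflix's actual tool: the function names (`kill_random_server`, `service_is_up`, `restart_failed`, `run_drill`) and the rule that the service is "up" while at least one server is running are all assumptions made for illustration.

```python
import random


def kill_random_server(servers, rng):
    """Mark one randomly chosen running server as down and return its name.

    `servers` maps a (hypothetical) server name to True if it is running.
    Returns None if nothing is running.
    """
    running = [name for name, up in servers.items() if up]
    if not running:
        return None
    victim = rng.choice(running)
    servers[victim] = False
    return victim


def service_is_up(servers):
    """Assumption: the toy service is up while at least one server runs."""
    return any(servers.values())


def restart_failed(servers):
    """Stand-in for recovery automation: bring every downed server back."""
    for name in servers:
        servers[name] = True


def run_drill(server_count=5, rounds=10, seed=0):
    """Run kill-and-recover cycles; return how many rounds the service survived."""
    rng = random.Random(seed)
    servers = {f"server-{i}": True for i in range(server_count)}
    survived = 0
    for _ in range(rounds):
        kill_random_server(servers, rng)  # the deliberate "crash"
        if service_is_up(servers):
            survived += 1
        restart_failed(servers)
    return survived
```

In this toy model, `run_drill(server_count=5)` survives every round because four servers are always left standing, while `run_drill(server_count=1)` survives none. A fleet that depends on a single machine finds that out on the first deliberate crash rather than during a real outage.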
By continually crashing their own servers, the Netflix engineers could keep on learning how to keep uncrashed portions of their network up and running in the face of part of the network going down. And so when Amazon US-EAST crashed, Netflix ran on, unfazed.
This is what Nassim Taleb is talking about when he says a person or organization that tries to keep all fluctuations damped down becomes fragile, and very vulnerable to a big fluctuation. The companies that tried to keep all of their servers up and running all the time went completely out of operation when Amazon crashed from under them. But the company that kept itself ready with lots of little crashes could handle the big crash.
So:
- If you run on a treadmill, the first time you step unevenly on a pothole, you tear a ligament.
  But, if you run on uneven surfaces, your ligaments are stronger, and you can handle the new stress.
- If you try to keep from ever feeling down with anti-depressants, the first time you get really walloped by a crisis, you commit suicide.
  But, if you learn to deal with lots of smaller ups-and-downs, you are more prepared for a big one.
- If you are a central bank that tries to keep growth constant and prevent all downturns, you set the economy up for a big crash.
  But, if you accept lots of small downturns, you clear out bad investments in small doses and may avoid big crashes.
Taleb is a cross between a genius and a madman. Anyway, Fooled By Randomness is my favorite of the books I've read in the past 10 years.
I don't agree with everything he says, but after spending time with him regularly for the last 9 months, I can't really see him as a "madman": in person, he's really quite calm.