This Week's Massive Internet Outage Was Caused by a Typo
Believe it or not, the internet is a physical place, or, more accurately, a network of interconnected places we refer to as servers. When they go out of commission, websites go out of commission. That's exactly what happened Tuesday when several of your favorite websites -- including Slack, Medium, Trello, Quora, and Thrillist itself -- went down for at least a portion of the 4-hour outage.
Does it surprise you to learn that such a disaster could happen because of a typo?
That's what was revealed in an explanatory note today from Amazon Web Services -- which runs the Simple Storage Service (S3), the system of servers that was knocked out. It was human error, plain and simple. "Unfortunately, one of the inputs to the command was entered incorrectly," the note reads, and a relatively routine server adjustment related to the S3 billing process turned into a royal clusterfuck.
"You may have noticed issues uploading files etc. Don’t worry, so have we: And we’re working as hard as we can to get things back to normal." -- Slack (@SlackHQ), February 28, 2017
The servers taken offline were based in northern Virginia, so sites hosted in that region were among those that went down. Amazon is taking a number of steps to ensure this doesn't happen again, from auditing its systems and workflows to changing the tool used in the initial error to, in the company's words, "prevent an incorrect input from triggering a similar event in the future."
Incidentally, one company whose business didn't grind to a halt? Amazon itself. Though the company had to contend with a major technical failure in one arm of its business, its core retail business on Amazon.com was fine, while 54 of the other top 100 retailers on the internet -- sites Amazon might call competitors -- were crippled.
"We want to apologize for the impact this event caused for our customers," the company's letter concluded. "We know how critical this service is to our customers, their applications and end users, and their businesses."
There's some reason to sympathize with Amazon -- or, at least, with the poor chuckleheads who were probably fired over this. When I make sutpid typoz onf blag pots, it's bad, sure. It's not cool. But no one's livelihood suffers dramatically for a day because of it. Entering commands and code of any kind, especially at this level, can demand meticulous attention to detail as well as trial and error. This is like accidentally busting the HTML website you were working on for your intro-to-coding class the night before handing it in, but with millions of dollars in server space on the table. It's huge. By one analyst's estimate, three to four trillion pieces of data for almost 150,000 sites are stored on the S3 servers.
Hopefully someone bought those coders a beer.
H/T: The Verge