Amazon Web Services’ cloud platform in 'fat finger' outage
Amazon Web Services suffered a major five-hour outage this week that affected thousands of corporate customers.
The problems started in the US-East-1 region, hosted in data centres in Northern Virginia. The incident took out thirty-three of AWS’s services, including nine that failed completely: Athena, EMR, Inspector, Kinesis Firehose, Simple Email Service, S3, WorkMail, Auto Scaling and CloudFormation.
As a result, hundreds of thousands of cloud-based applications and websites were forced offline.
Amazon has issued a statement claiming that an incorrectly typed command, entered during routine debugging of its billing system, caused the outage, which it said lasted around five hours.
AWS says that a command meant to remove a small number of servers from one of its S3 subsystems was entered incorrectly, removing a much larger tranche of servers than intended. This required a full restart of all affected servers, which took longer than expected.
Amazon says it is making changes to its systems to ensure that similar ‘fat finger’ mistakes cannot happen again.
The incident, however, has caused chaos for users and leaves many questions to be answered in the new world of mass cloud computing.
In an early statement during the outage, AWS said: “We have identified the issue as high error rates with S3 in US-EAST-1, which is also impacting applications and services dependent on S3. We are actively working on remediating the issue.”
The outage allegedly originated in the Simple Storage Service (S3), a component of the AWS platform used by many of its cloud-based products.
S3 runs on AWS infrastructure and underpins a global network of websites, storing and retrieving customers’ cloud data.
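Applications that depend on a storage service such as S3 can soften, though not fully avoid, the impact of high error rates by retrying failed requests with exponential backoff. The following is a minimal sketch in Python; the `flaky_fetch` function is a hypothetical stand-in for a real S3 client call, not part of any AWS SDK:

```python
import time
import random

def fetch_with_retry(fetch, key, retries=4, base_delay=0.5):
    """Call fetch(key); on failure, back off exponentially with jitter."""
    for attempt in range(retries):
        try:
            return fetch(key)
        except IOError:
            if attempt == retries - 1:
                raise  # give up after the final attempt
            # Exponential backoff with a little jitter: base, 2x, 4x, ...
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo: a stand-in for an S3 GET that fails twice, then succeeds.
calls = {"n": 0}
def flaky_fetch(key):
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("503 Slow Down")
    return b"object data for " + key.encode()

result = fetch_with_retry(flaky_fetch, "reports/2017-02.csv", base_delay=0.01)
print(result)
```

Retries help with transient error spikes, but an outage of this length also argues for cross-region redundancy rather than relying on a single region such as US-East-1.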
Affected organisations included Business Insider, Expedia, Coursera, Quora and Slack.