
Amazon Web Services (AWS) says it has resolved a massive outage that disrupted access to some of the world’s largest websites and apps for much of Monday.
More than 1,000 online platforms, including Snapchat, Lloyds Bank, Halifax, and Reddit, were affected by problems traced to the core of Amazon’s cloud computing operations in the US.
Downdetector, a website outage tracker, said global user reports of technical issues soared to more than 11 million during the disruption.
At around 23:00 BST, Amazon confirmed that all AWS services had “returned to normal operations” after throttling parts of its system to address the issue.
“What this episode has highlighted is just how interdependent our infrastructure is,” said Professor Alan Woodward of the University of Surrey. “So many online services rely upon third parties for their physical infrastructure, and this shows that problems can occur in even the largest of those third-party providers. Small errors, often human made, can have widespread and significant impact.”
The outage began at about 07:00 BST, hitting a wide range of websites and apps, from online games like Fortnite to the language-learning platform Duolingo. Downdetector reported more than four million problem reports within hours, more than double the typical number for a regular weekday.
Mike Chapple, an information technology professor at the University of Notre Dame, compared the restoration efforts to a power outage. “It’s like when you have a large-scale power outage. Crews start working to try to bring it back online,” he said. “The power might flicker a few times.” He added that Amazon may have initially “only addressed the symptoms” and not the cause.
Amazon has not yet released a full explanation of the outage, but said the issue “appears to be related to DNS resolution of the DynamoDB API endpoint in US-EAST-1.” DNS, or Domain Name System, functions like a phone book for the internet by translating website names into computer-readable numbers.
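For readers who want to see what that lookup step involves, the following is a minimal sketch in Python, not a reconstruction of anything Amazon runs. The function name `resolve` and its `lookup` parameter are illustrative assumptions; `socket.getaddrinfo` is the standard-library call that asks the operating system's resolver to turn a hostname into IP addresses. The hostname in the usage section is assumed to be the public DynamoDB endpoint for the region named in Amazon's statement.

```python
import socket


def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted unique IP addresses for hostname, or [] if the lookup fails.

    `lookup` is a hypothetical injection point so the function can be
    exercised without network access; by default it is the real resolver.
    """
    try:
        # Each result is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] holds the IP address as text.
        results = lookup(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Raised when the name cannot be resolved. A client in this state
        # cannot reach the service even if the service itself is running.
        return []
    return sorted({info[4][0] for info in results})


if __name__ == "__main__":
    # Assumed endpoint name, used here only for illustration.
    host = "dynamodb.us-east-1.amazonaws.com"
    addresses = resolve(host)
    if addresses:
        print(f"{host} resolves to: {', '.join(addresses)}")
    else:
        print(f"{host} could not be resolved")
```

When the lookup returns nothing, an application has no address to connect to, which is one way a fault in name resolution can look like a failure of the service behind it.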
Matthew Prince, chief executive of Cloudflare, said the incident underscored the risks of cloud dependency. “Everyone has a bad day, today Amazon had a bad day,” he told the BBC. “There are amazing things about the cloud, it allows you to scale… but if you have an outage like this it can take down a lot of services we rely on.”
Cori Crider, head of the Future of Technology Institute, described the disruption as “a bit like a bridge collapsing.” She said “an essential part of the economy has fallen to pieces,” and called the reliance on Amazon, Microsoft, and Google, which together control around 70 percent of cloud computing, “unsustainable.”
“Once you have a concentrated supply in a handful of monopoly providers, when something like this falls over, it takes a huge percentage of the economy out with it,” Crider warned. “We should really look at trying to buy more local services, rather than relying on a handful of American monopoly platforms. That’s a risk to our security, our sovereignty and our economy and we need to look at structural separations to make our markets more resilient to these kind of shocks.”
Ken Birman, a computer science professor at Cornell University, said companies using AWS must also bear some responsibility. “Companies using Amazon haven’t been taking enough adequate care to build protection systems into their applications,” he said.
He added that developers should invest more in backing up critical systems: “We know how to make these systems stronger, and we know how to do it securely.”
The fallout could still have legal ramifications. After a similar outage last year involving CrowdStrike, Delta Air Lines went to court seeking to recover more than $500 million in losses. Even after the issue was fixed, the airline said it had to manually reset 40,000 servers, causing major flight delays for several days.
Faridah Abdulkadiri