Amazon Initiates 90-Day Reset After Software Errors Disrupt Millions of Orders

Amazon has faced significant disruptions in its e-commerce operations due to software errors over the past few weeks. The issues arose partly from its AI coding assistant, Q. According to Dave Treadwell, Amazon’s Senior Vice President of e-commerce services, the company has identified a troubling trend of incidents since the third quarter of 2025.

Recent Disruptions and Their Impact

During this period, Amazon experienced multiple major incidents, including disruptions that resulted in substantial order losses. On March 2, incorrect delivery times caused 120,000 lost orders and approximately 1.6 million errors on the website. Three days later, on March 5, another outage led to a 99% decrease in orders across North America, equating to around 6.3 million lost orders.

  • March 2 Incident: 120,000 lost orders, 1.6 million errors.
  • March 5 Incident: 99% drop in orders, 6.3 million lost orders.
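For a sense of scale, the March 5 figures can be turned into an implied order baseline. The short Python sketch below is illustrative only: it assumes the 6.3 million lost orders and the 99% drop are measured against the same expected volume, which the reporting does not confirm, and the function name is hypothetical.

```python
# Back-of-the-envelope check on the March 5 figures.
# Assumption: the lost-order count and the percentage drop share one baseline.
def implied_baseline(lost_orders: float, drop_fraction: float) -> float:
    """Return the expected order volume implied by a loss count and a fractional drop."""
    if not 0 < drop_fraction <= 1:
        raise ValueError("drop_fraction must be in (0, 1]")
    return lost_orders / drop_fraction


baseline = implied_baseline(6_300_000, 0.99)
completed = baseline - 6_300_000  # orders that still went through

print(f"implied baseline: {baseline:,.0f}")
print(f"orders still completed: {completed:,.0f}")
```

Under that assumption, roughly 6.36 million orders would have been expected during the outage window, and only about 1% of them were completed.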

Identifying the Problems

Internal documents revealed that these issues were partly attributable to a lack of stringent controls in the software update process. Some changes were executed without proper authorization or review, resulting in significant operational disruptions. Treadwell emphasized that conventional software review processes struggled to keep up with the volume of new code generated by AI tools.

The integration of AI services, such as Q and Kiro, has dramatically increased code production. However, this surge necessitates rigorous testing to identify bugs and issues before deployment.

Amazon’s Response: A 90-Day Safety Reset

In light of these challenges, Amazon plans to implement a 90-day safety reset. The initiative tightens the approval process for code changes involving approximately 335 critical systems that have previously been affected. Under the new guidelines, peer review is mandatory before engineers make any changes to the code.

  • Engineers must obtain approvals from two reviewers.
  • Use of an internal documentation tool is required.
  • Changes must comply with strict reliability engineering protocols.
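To make the three requirements concrete, here is a minimal Python sketch of how such a gate might be expressed. It is not Amazon's tooling: the `ChangeRequest` record, the `blocking_reasons` function, and the `critical_systems` set are hypothetical names, and the rule that an author cannot count as their own reviewer is an assumption rather than something the reporting states.

```python
from dataclasses import dataclass, field

REQUIRED_APPROVALS = 2  # from the guidelines: two reviewers per change


@dataclass
class ChangeRequest:
    """Hypothetical record of a proposed code change."""
    author: str
    system: str
    approvers: set[str] = field(default_factory=set)
    documented: bool = False  # recorded in the internal documentation tool
    reliability_checks_passed: bool = False


def blocking_reasons(change: ChangeRequest, critical_systems: set[str]) -> list[str]:
    """Return the reasons a change may not ship; an empty list means it is cleared."""
    if change.system not in critical_systems:
        return []  # the gate only covers the designated critical systems
    reasons = []
    # Assumption: an author's own approval does not count toward the total.
    independent = change.approvers - {change.author}
    if len(independent) < REQUIRED_APPROVALS:
        reasons.append(f"needs {REQUIRED_APPROVALS} reviewers, has {len(independent)}")
    if not change.documented:
        reasons.append("not recorded in the documentation tool")
    if not change.reliability_checks_passed:
        reasons.append("reliability checks not passed")
    return reasons


critical = {"checkout"}
change = ChangeRequest(author="alice", system="checkout", approvers={"alice", "bob"})
print(blocking_reasons(change, critical))
```

Returning a list of reasons rather than a single yes or no is one way to keep the added friction "controlled": an engineer sees exactly which requirement is still outstanding.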

Benefits of the New Framework

The new safeguards are designed to add “controlled friction” to the code-change process, with the aim of reducing the risk of future disruptions. Treadwell noted that combining AI-driven tools with deterministic systems will address critical operational challenges facing the company.

Through this initiative, Amazon aims to improve overall reliability and to respond more robustly to software errors in its operations.

Conclusion

Amazon’s recent outages underscore the challenges that generative AI poses for software development and deployment. With the rollout of a revised safety protocol, the company is taking proactive measures to strengthen its operational integrity. As these changes take effect, Amazon says it is committed to providing a more reliable e-commerce platform for its customers.