Last Tuesday, a web developer went to work writing code as usual, unaware that they were about to set off a chain reaction that would slow Amazon Web Services, the world’s largest cloud provider, to a crawl. This five-hour slowdown cost the many companies that rely on AWS an estimated $150 million in lost profit. (No shame in saying I realized something was wrong when I tried to post a gif in Slack that my coworkers would undoubtedly have found hilarious.) You and many others might have noticed when you had trouble accessing your iCloud account, or maybe while trying to pay a friend back on Venmo. What amounted to a tiny error in and of itself set off a ripple effect that crippled online companies without warning.
Amazon released an official summary of what happened and how it plans to prevent the problem going forward, but it’s possible that a commitment to best coding practices from every employee might have prevented this fleeting digital tragedy. As I continue learning about software engineering, I recommend two best practices that should help others avoid AWS’s fate: regression testing and release management.
Regression testing is when a developer writes automated tests to make sure their software still runs as intended after a change. These tests confirm that a change to the code in one area won’t break code in another area. A great way to support effective regression testing is to build applications with a modular approach, so that each piece can be tested on its own.
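To make that concrete, here is a minimal sketch of a regression test in Python. The `slugify` helper and the test names are hypothetical, invented for this illustration; the point is only that a small, modular function can be pinned down with tests that are rerun after every change.

```python
def slugify(title: str) -> str:
    """Hypothetical helper: turn a page title into a URL slug."""
    # Replace anything that is not a letter or digit with a space,
    # then join the remaining words with hyphens.
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return "-".join(cleaned.split())


def test_basic_title():
    # The everyday case that must keep working after any change.
    assert slugify("Hello World") == "hello-world"


def test_punctuation_stays_fixed():
    # Imagine punctuation once produced broken slugs; this test
    # guards against that bug quietly coming back.
    assert slugify("What's New?") == "what-s-new"
```

A test runner such as pytest would pick up the `test_` functions automatically, so the whole suite can be rerun each time the code changes.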
Release management is when a developer sets up separate stages, or environments, for the same software so they can safely release code updates. Release management allows a dev to run their code changes in a private test environment before publishing to the public live environment that users can see.
Our ill-fated Amazon coder was trying to fix a bug on the S3 system; instead of taking down a few servers, the ‘fix’ took down way too many, which triggered a ‘full system restart.’ Had Amazon rehearsed this procedure in a test environment before running it on the live servers, the error might have been identified and the pervasive slowdown avoided. Taking the time to double-check that your code works by writing tests is a smart, and ultimately efficient, way to approach web development.
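As an illustration of the kind of check a test can exercise, here is a small sketch of a guard that refuses to remove more than a set share of servers in one go. This is not Amazon's tooling: the function name `plan_removal`, the 10% limit, and the server names are all assumptions made up for the example.

```python
def plan_removal(servers: list[str], requested: int,
                 max_fraction: float = 0.1) -> list[str]:
    """Hypothetical guard: pick servers to remove, refusing oversized requests."""
    if requested < 0:
        raise ValueError("requested must not be negative")
    # Assumed policy: never remove more than max_fraction of the fleet at once.
    limit = int(len(servers) * max_fraction)
    if requested > limit:
        raise ValueError(
            f"refusing to remove {requested} servers; limit is {limit}"
        )
    return servers[:requested]


def test_small_removal_is_allowed():
    fleet = [f"server-{i}" for i in range(100)]
    assert len(plan_removal(fleet, 5)) == 5


def test_mistyped_large_removal_is_blocked():
    fleet = [f"server-{i}" for i in range(100)]
    try:
        plan_removal(fleet, 50)
    except ValueError:
        return  # the guard did its job
    raise AssertionError("oversized removal was not blocked")
```

Running tests like these against a private copy of the system is one way a mistyped number could be caught before it ever reaches live servers.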
At Integrity, we practice regression testing in our marketing website projects, and we practice good release management by creating three environments for each website. Software environments are copies of the same code that are in different stages of the release process. The first environment runs on the developer’s computer where the coding happens; that’s called the development environment. After a bug is fixed, the code is pushed into the testing, or staging, environment, where only the project team and the client can see the software. In this stage, the client can give us revisions for more updates and eventually approve the code. Then we push the tested and approved code to the live, or production, environment. For our WordPress websites, we use WP Engine, which makes it easy to set up staging sites for our WP projects. A bonus side effect of this type of release process is that if a production environment fails for any reason, we have a backup of the site in the staging environment.
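The three-environment flow described above can be sketched as a tiny state machine. This is a simplified model, not our actual deployment tooling: the `Release` class, its fields, and the rule that tests must pass before code leaves development are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Release:
    """Hypothetical record of one code update moving through environments."""
    version: str
    stage: str = "development"
    tests_passed: bool = False      # assumed gate before staging
    client_approved: bool = False   # gate before production


def promote(release: Release) -> Release:
    """Move a release one environment forward if its gate is satisfied."""
    if release.stage == "development":
        if not release.tests_passed:
            raise RuntimeError("tests must pass before pushing to staging")
        release.stage = "staging"
    elif release.stage == "staging":
        if not release.client_approved:
            raise RuntimeError("client approval is required before production")
        release.stage = "production"
    else:
        raise RuntimeError("release is already in production")
    return release
```

For example, a release with `tests_passed=True` moves to staging on the first call to `promote`, but a second call raises an error until `client_approved` is set, which mirrors the client review step in the staging environment.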
I don’t expect to write perfect code, but luckily, safeguards like regression testing and release management exist to help me identify and fix errors before they affect the success of our projects and the happiness of our clients.
Written while listening to Tame Impala's "New Person, Same Old Mistakes."
Do you want to work with a diligent team of software developers? Drop us a line.