SRE book notes: The Evolution of Automation at Google
Doing automation thoughtlessly can create as many problems as it solves
These are my notes from Chapter 7, The Evolution of Automation at Google, of the book Site Reliability Engineering: How Google Runs Production Systems.
This post is part of a series. The previous post can be seen here:
Doing automation thoughtlessly can create as many problems as it solves.
It isn’t appropriate to automate every component of every system, and not everyone has the ability or inclination to develop automation at a particular time. Some essential systems started out as quick prototypes, not designed to last or to interface with automation.
Automate Yourself Out of a Job: Automate ALL the Things!
We graduated from optimizing our infrastructure for a lack of failover to embracing the idea that failure is inevitable, and therefore optimizing to recover quickly through automation.
A team not running automation has no incentive to build systems that are easy to automate.
The most functional tools are usually written by those who use them.
Shipping and iterating rapidly might allow you to implement functionality faster, yet it rarely makes for a resilient system.
A post worth reading from the Engine Yard blog: Pets vs. Cattle