SRE book notes: Dealing with Interrupts
Any complex system is as imperfect as its creators
These are notes on Chapter 29, "Dealing with Interrupts," of the book Site Reliability Engineering: How Google Runs Production Systems.
This post is part of a series. The previous post can be seen here:
Any complex system is as imperfect as its creators. In managing the operational load created by these systems, remember that its creators are also imperfect machines.
Polarizing time means that when a person comes into work each day, they should know if they’re doing just project work or just interrupts. Polarizing their time in this way means they get to concentrate for longer periods of time on the task at hand. They don’t get stressed out because they’re being roped into tasks that drag them away from the work they’re supposed to be doing.
For any given class of interrupt, if the volume of interrupts is too high for one person, add another person.
A person should never be expected to be on-call and also make progress on projects (or anything else with a high context switching cost).
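The rules above can be turned into a small back-of-the-envelope sketch. This is only an illustration, not something from the book: the function names `handlers_needed` and `polarize`, and the idea of a fixed per-person daily capacity, are assumptions made for the example. It works out how many people a given interrupt volume needs and then gives every team member exactly one mode for the day.

```python
import math


def handlers_needed(interrupts_per_day: int, capacity_per_person: int) -> int:
    """Smallest number of dedicated people who can absorb the daily volume.

    capacity_per_person is a hypothetical estimate of how many interrupts
    one person can handle in a day.
    """
    if capacity_per_person <= 0:
        raise ValueError("capacity_per_person must be positive")
    return math.ceil(interrupts_per_day / capacity_per_person)


def polarize(team: list[str], interrupts_per_day: int,
             capacity_per_person: int) -> dict[str, str]:
    """Assign each person exactly one mode: 'interrupts' or 'project'.

    Nobody is ever given both, which is the point of polarizing time.
    """
    n = handlers_needed(interrupts_per_day, capacity_per_person)
    if n > len(team):
        raise ValueError("interrupt volume exceeds what the whole team can absorb")
    return {
        person: ("interrupts" if i < n else "project")
        for i, person in enumerate(team)
    }


if __name__ == "__main__":
    # Hypothetical team and numbers, purely for illustration.
    print(polarize(["ana", "ben", "chi", "dee"], 25, 10))
```

In practice the first `n` slots would rotate between people from day to day or week to week; the sketch only shows that the assignment is all-or-nothing per person.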
Sometimes when a person isn’t on interrupts, the team receives an interrupt that the person is uniquely qualified to handle. While ideally this scenario should never happen, it sometimes does. You should work to make such occurrences rare.