A blog

Gonzo: Increasing Agility by Understanding Risk

In my writeup of Scale Summit, I referred to a talk that Simon Croome would be giving at Puppet Camp London, and promised to link to it when it was online. The slides and video are now online, so check out “Increasing Agility by Understanding Risk”.

The back half of the presentation introduces Gonzo (github), a tool Simon wrote to identify and review pending changes in his Puppet manifests, and it’s this that piqued my interest at Scale Summit.

Whether or not they’re using a configuration management tool, “What was changed?” is one of the first questions people ask when something goes wrong. The desire to have prompt answers to that question is one of the drivers behind change management processes, and Chef and Puppet both offer reporting features to help with this. If our only desire was to respond more quickly when things went wrong, that would be enough.

However, “move fast and break things” notwithstanding, most businesses would prefer to avoid incidents caused by planned changes. Avoiding them is usually the primary motivation for introducing change control processes, which involve (at the very least) identifying the proposed changes, assessing the associated risk, and reviewing both before deciding whether to proceed.

Humans aren’t great at assessing risk, particularly when making an off-the-cuff assessment of something they’re about to do. As sysadmins, we’re often over-confident about changes we’re going to make by hand – after all, we’re smart! If something goes wrong in the middle of the change, we’ll notice it and react with our cat-like reflexes.

When moving to “fly-by-wire system administration” using tools like Chef and Puppet, this attitude often shifts dramatically in the other direction. Misplaced confidence in our own control of the situation is replaced by fear. We admit that we aren’t really certain we know what effect a change will have – and we’re sure the system will inflict any mistake at scale, without a second thought. [1]

Gonzo is the first tool I’ve seen that helps users of a modern configuration management tool to identify and review pending changes across their entire infrastructure. Even in environments without formal change control processes, this visibility can help to avoid unexpected changes and improve confidence and comfort with the tools being used.
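To make the idea of reviewing pending changes across an infrastructure a little more concrete, here is a minimal Python sketch. It is not how Gonzo works internally, and the data format is an assumption: `reports` is a hypothetical, simplified stand-in for whatever a no-op run might tell you about each host. The sketch groups hosts by the change they have pending, so a reviewer sees each distinct change once along with how widely it will be applied.

```python
from collections import defaultdict


def group_pending_changes(reports):
    """Group hosts by the pending change they share.

    `reports` maps a host name to a list of (resource, change) tuples.
    This shape is hypothetical, not a real Puppet or Gonzo format.
    """
    hosts_by_change = defaultdict(list)
    for host, changes in sorted(reports.items()):
        for resource, change in changes:
            hosts_by_change[(resource, change)].append(host)
    # Widest-reaching changes first, then alphabetical for stable output.
    return sorted(
        hosts_by_change.items(),
        key=lambda item: (-len(item[1]), item[0]),
    )


# Made-up example data, not real no-op output.
reports = {
    "web1": [("File[/etc/ntp.conf]", "content changed")],
    "web2": [
        ("File[/etc/ntp.conf]", "content changed"),
        ("Service[nginx]", "ensure changed stopped to running"),
    ],
    "db1": [],
}

for (resource, change), hosts in group_pending_changes(reports):
    print(f"{len(hosts)} host(s): {resource}: {change} ({', '.join(hosts)})")
```

Sorting by the number of affected hosts is one possible design choice: it puts the changes with the largest blast radius at the top of the review, which is where a risk assessment probably wants to start.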

No-op/why-run modes are necessarily imperfect, but they’re not worthless. As a Chef user, I find that Gonzo gives me another reason to be jealous of Puppet, and I hope its ideas make their way into the Chef ecosystem before long.

  1. The unthinking application of changes is itself a problem – we need autonomation, not automation. But that’s a subject for another post.