Cryptocracy

A blog

Five Things I Hate About Chef

Seven years ago, brian d foy posed a question for would-be language advocates: “What do you hate most about your language?” His point was that if you’re not familiar enough with something to hate parts of it, you don’t know it well enough to be an effective advocate. At the time, it inspired some great posts in a few programming language communities. More recently, I was reminded of the question after reading one too many posts praising some new config management tool that went something like this:

I’ve been using X for two weeks, and it’s awesome!
I used Chef/Puppet before and it took me weeks to understand it.
X was so easy to use, I did [some trivial task] in my first half hour!
I thoroughly recommend it.

Such advocacy is profoundly uninteresting. Beyond the fact that you only get started with a tool once, the consequences of a tool’s design decisions often take time to make themselves apparent. The magic that made your first hour easy as a solo user might make life difficult working in a team; or the elegant simplicity in your pilot project might require a pile of hacks six months later at scale. One person’s minor irritations are another’s dealbreakers, so I love hearing what folks think of their preferred tools after they’ve had to live with them for a while.

I’ve been using Chef for the last few years, and it’s currently my favourite tool, so here’s five things I hate about it:

1. No partial updates to node data in the Chef server

Chef doesn’t provide any way to update a subset of a node’s data – each update replaces the copy on the server, and the last writer wins. This means that if you update the server while a node is in the middle of a Chef run, you may find your changes are reverted when chef-client saves the node at the end of the run.
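To make the race concrete, here is a toy model in plain Ruby. It is only a sketch: the "server" is a Hash, and the node name and attribute keys are made up for illustration. Nothing here is Chef's real API.

```ruby
# Toy model of whole-document saves. The Hash stands in for the Chef server;
# 'web1', 'environment' and 'tags' are illustrative names only.
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

def simulate_lost_update
  server = { 'web1' => { 'environment' => 'staging', 'tags' => [] } }

  # 1. A client run starts and loads its own copy of the node.
  client_copy = deep_copy(server['web1'])

  # 2. Mid-run, an operator edits the node and saves it to the server.
  operator_copy = deep_copy(server['web1'])
  operator_copy['environment'] = 'production'
  server['web1'] = operator_copy

  # 3. The run ends and the client saves its entire copy, which still
  #    carries the old environment. The last writer wins.
  client_copy['tags'] << 'converged'
  server['web1'] = client_copy

  server['web1']
end

node = simulate_lost_update
puts node['environment']
puts node['tags'].join(',')
```

The operator's change to 'environment' is gone after step 3, because the save replaces the whole document rather than merging the fields that changed.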

Related to this, knife node from file should have a big disclaimer attached to it. Some folks want to keep their basic node data in version control, and use that subcommand to upload changes. This erases almost everything about the node from the server, so any searches relying on computed or saved attributes will fail until the next chef-client run saves the full node again.

This is not an easy problem to solve. Fortunately, you’re allowed to hate things that can’t be easily fixed.

2. No standard solution for secure attributes

The standard way of parameterising recipes is attributes. The standard way of securing secrets for use by Chef is encrypted databags. Databags and attributes are entirely separate mechanisms, and different methods are used to load encrypted and regular databags. This means that cookbooks which load secrets from encrypted databags can only do that (they can implement their own fallback between these mechanisms, but Chef doesn’t offer any help).

But wait, wrapper cookbooks! You could write a wrapper to load your secrets from your encrypted databag, override the appropriate node attributes, then include the original recipe. This works, but now your secrets are recorded in node attributes and will be saved to the Chef server in the clear.

But wait, Chef handlers! You could write a Chef handler to remove the sensitive attributes from the node before it’s saved! Maybe – I forget whether report handlers are actually called before the node is saved at the end of the run – and anyway, recipes can (and do) save the node at any point.

I guess you could monkey-patch node.save (or something), but the point is you’re on your own.

3. Underdevelopment of the Recipe DSL

The Recipe DSL hasn’t changed much in the last couple of years, barring the addition of some helpers for Windows platforms.

The most glaring example is the absence of a helper for Encrypted Data Bags. Here’s how you’d load a regular, unencrypted databag:

item = data_bag_item('users', 'zts')

Here’s how you’d load the same databag if it was encrypted with the default secret:

item = Chef::EncryptedDataBagItem.load('users', 'zts')

Alright, it’s only a small thing – but it sticks out like a sore thumb, and I know the glaring rubyism rubs some people up the wrong way. Seth Vargo’s excellent Chef Sugar gem provides a load of enhancements to the recipe DSL (including an encrypted_data_bag_item method), many of which should be part of core Chef. Dumping every last idea into the recipe DSL might be worse than never adding anything at all, but the lack of love shown to such a core part of the Chef user experience makes me sad.
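For illustration, a helper along these lines could be as small as the sketch below. This is not Chef Sugar's implementation: the module name EncryptedDataBagHelper is hypothetical, and the stand-in class at the top exists only so the snippet runs outside Chef.

```ruby
# Stand-in so the sketch runs outside Chef. In a real cookbook the genuine
# Chef::EncryptedDataBagItem is already loaded and this branch is skipped.
unless defined?(Chef::EncryptedDataBagItem)
  class Chef
    class EncryptedDataBagItem
      def self.load(bag, id, secret = nil)
        { 'id' => id, 'bag' => bag }
      end
    end
  end
end

# Hypothetical helper module; the name is illustrative.
module EncryptedDataBagHelper
  # Same call shape as data_bag_item(bag, id), with an optional secret.
  def encrypted_data_bag_item(bag, id, secret = nil)
    Chef::EncryptedDataBagItem.load(bag, id, secret)
  end
end

# In a cookbook's libraries/ directory this would usually be mixed into
# recipes with: Chef::Recipe.send(:include, EncryptedDataBagHelper)

# Outside Chef, include it at the top level to try the call shape.
include EncryptedDataBagHelper
item = encrypted_data_bag_item('users', 'zts')
puts item['id']
```

With something like this mixed in, the encrypted and unencrypted lookups read the same way in a recipe.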

4. No local storage

Why would I need local storage? Loads of reasons. A big one is that I do a lot of work with chef-solo, which can’t save any state at all (chef-client -z might fix this for me, but it’s not quite a drop-in replacement).

Still, node-local storage would be useful for chef-client too. One application would be storage of secrets generated by Chef for services it installs and manages (eg, a mysql service). Another would be to enable chef-client to detect when node attributes have been changed on the server – or something like the immutable attributes Dan Carley prototyped for Puppet some time ago.
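As a rough idea of the generated-secrets case, here is the sort of thing people hand-roll today, in plain Ruby. It is a sketch under assumptions, not a Chef feature: the local_secret method and the file layout are hypothetical.

```ruby
require 'securerandom'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: return the secret stored at `path`, generating and
# persisting a new one on first use so later runs see the same value.
def local_secret(path)
  return File.read(path) if File.exist?(path)

  secret = SecureRandom.hex(16)
  FileUtils.mkdir_p(File.dirname(path))
  # Create the file exclusively and readable only by its owner.
  File.open(path, File::WRONLY | File::CREAT | File::EXCL, 0o600) do |f|
    f.write(secret)
  end
  secret
end

# Usage: two "runs" against the same path get the same secret.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'secrets', 'mysql_root')
  first_run = local_secret(path)
  second_run = local_secret(path)
  puts first_run == second_run
end
```

Every cookbook that needs this ends up choosing its own path, permissions and format, which is the gap a built-in node-local store would close.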

5. Two-Phase Runs and the Resource Queue

In principle, a Chef run has two phases. In the first phase, recipes are “compiled”, a process which pushes resources into a queue. In the second phase, the compiled resources are “converged” one by one. As resources are queued in the order they’re declared, the order of actions is easy to predict. This is simple, elegant, and – in practice – a great big lie.

Cracks first start to appear in the two-phase ideal when you want to extend Chef – say, adding resources to manage mysql databases. To manage mysql using Ruby, we’ll want to install the mysql gem. Before we can do that, we will first need to install the mysql libraries and a toolchain to compile the gem. Chef can do all these things, but it won’t do any of them until it gets to the converge phase – and that means, we won’t be able to use our shiny new mysql resource until the next time we run Chef.

Fortunately(?), it’s possible to converge a specific resource during the compile phase using resource.run_action() and that’s the go-to solution to this problem. That breaks the expectation that resources will be converged in the order they’re declared, but at least the syntax makes it obvious – except in the case of chef_gem, which converges in the compile phase without any visual indication in your recipe.
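The ordering problem is easy to see in a toy model of the two phases. The classes below are made up for illustration and are not Chef's; in particular, the model skips a resource at converge time once it has already run, which is a simplification.

```ruby
# Toy model of a two-phase run. ToyResource and ToyRun are illustrative
# names, not Chef classes.
class ToyResource
  def initialize(name, log)
    @name = name
    @log = log
    @done = false
  end

  # Converge this resource immediately, whichever phase we are in.
  def run_action
    return if @done # simplification: never converge twice

    @log << @name
    @done = true
  end
end

class ToyRun
  attr_reader :log

  def initialize
    @queue = []
    @log = []
  end

  # Compile phase: declaring a resource only queues it.
  def declare(name)
    resource = ToyResource.new(name, @log)
    @queue << resource
    resource
  end

  # Converge phase: work through the queue in declaration order.
  def converge
    @queue.each(&:run_action)
  end
end

run = ToyRun.new
run.declare('install mysql libraries')
run.declare('install compiler toolchain')
run.declare('install mysql gem').run_action # forced during compile
run.declare('create database')
run.converge
puts run.log
```

The gem install lands first in the log, ahead of the libraries and toolchain it depends on, even though it was declared third. In a real recipe that means forcing those prerequisites at compile time as well, and the neat declaration order stops telling you what actually happens.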

The upshot is that extending Chef with gems is a monumental pain. Seth Vargo’s post “Using Gems With Chef” explores some more options.

The waters are further muddied by LWRPs using inline compile mode. This creates additional, disconnected run_contexts which are compiled and converged when the LWRP is converged. This makes LWRPs work better, at the cost of obscuring the Chef resources they used to do their thing – run_context.resource_collection.all_resources won’t contain those nested resources. (I suspect you can write a custom event subscriber to collect this information for yourself.)

Once again, I don’t know what the solution looks like here (though I believe people are talking about doing something for Chef 12).

What about you?

So, that’s five things I hate about Chef. I hope a few of you will be motivated to share five things you hate about your favourite tool. I’m not working in a large team at the moment, so I’ve mostly been thinking about technical niggles – but I’d love to hear perspectives on the human side of tool use, too.
