Cookout at Lookout - Testing Chef

29 April 2012

*Editor’s Note:* Jim Hopp will be presenting a session on testing Chef at #ChefConf on May 16th

At Lookout we’re deploying Chef to manage our infrastructure. We’ve made several decisions about how we’ll use Chef:

Chef server, not chef-solo

You can run chef in two modes: chef-solo or with a chef server. Chef-solo is simply your cookbooks plus the chef-client software; it configures the machine it runs on with no need to contact any other machine. The beauty of chef-solo is its simplicty: you can tar up your cookbooks, download them to a machine, install chef and run it, and you’re good to go. The downside is that everything the cookbooks need must be in the tarball or available on the machine; there’s no access to a central repository or directory service. Using chef server requires more up-front effort; you have to set up a chef server and upload your cookbooks to the chef server; to configure a client machine you install chef, configure it to point at the chef server, and the client downloads the cookbooks from the chef. The power of chef server, though, is that cookbooks can use search. Want to configure your Nagios server using chef server? You can generate your host and service definitions by using search to iterate all of your hosts. You can’t do that with chef-solo. For us, using chef-server was a no-brainer.

Each cookbook in a separate repo

Chef’s original structure had a single repo that contained all of the cookbooks:


This made workflow simple: each developer could group their changes for some spiffy new feature that affected several cookbooks into one or a few commits. But, it was messy to use upstream cookbooks (you’re not really going to write your own apache2 cookbook, you’re going to grab the community’s cookbook).

Since we’re using a mix of upstream cookbooks and our own, we decided to put each cookbook in its own repo. Our initial implementation used git submodule. That turned out to be untenable in practice: git submodules are best suited for modules that don’t change very often, and we’re revving our cookbooks constantly. We got rid of the submodules and wrote Rake tasks to do the grunt work of creating cookbooks, keeping your local copy up-to-date, etc. The jury’s still out on whether our current approach will give us a (mostly) painless workflow, but I think we’re close.

Development in VMs

Since you need a chef server and a chef client to develop and test, we created Vagrant-based tooling to make our workflow simple. It basically makes it dead simple to make a change to a cookbook, upload it to your chef server VM, and test it on your chef client. We’re building something similar for AWS and possibly Openstack using fog.

Unit- and Integration-testing

A dirty little secret of the Ops world is that we’ve been slow to adopt test-driven development. At Lookout, we wanted to start our chef work with TDD, so we insist that every cookbook have tests. That said, one of the main challenges of testing provisioning tools like Chef is that provisioning nodes can be time-consuming.

We started with chefspec, which enables us to write RSpec tests for our chef cookbooks that don’t require provisioning (or, in Chef parlance, converging) a node. Chefspec can’t test everything (it’s really more for unit-testing) but it’s great for confirming that a gem gets installed under the proper circumstances or that a config file is generated properly. You can see some simple examples of how to use chefspec in this GitHub repo.

Chefspec covers the unit testing portion, for the integration testing side of things we’re just starting to use Minitest Chef Handler.

Minitest-chef-handler enables you to run Minitest tests at the end of a chef-client run to confirm that the node is correctly configured. Right now we only run the Minitest tests in test environments, but we’re thinking about running them in production to ensure each node is correctly configured.

Continuous Integration

We’re also big fans of CI. We’ve structured our chef workflow so that every cookbook commit submitted for review to our Gerrit instance triggers a build in Jenkins that runs the chefspec tests for that cookbook. Changes to the chef-repo (adding a cookbook, updating a role) trigger a build of the chef-server and chef-client; we also build the full stack nightly. We have separate jobs for integration tests on specific system types (e.g., our Mongo instances). (I’m presenting a session at #ChefConf on our approach to testing.)

Chef is a great tool for infrastructure management, and incorporating techniques like CI have made it easy to fit in into our development and deployment workflow.

- Jim Hopp