Test-Driven Development for Chef

I gave a presentation on test-driven development at ChefConf today. The slides are embedded below, or you can view them directly on Slideshare.

Video of the presentation


- Jim Hopp




Cookout at Lookout - Testing Chef

*Editor’s Note:* Jim Hopp will be presenting a session on testing Chef at #ChefConf on May 16th.


At Lookout we’re deploying Chef to manage our infrastructure. We’ve made several decisions about how we’ll use Chef:

  • Chef server, not chef-solo
  • Each cookbook in a separate repo
  • Cookbook development in VMs
  • Unit- and integration-tests for our cookbooks and chef installation
  • Continuous integration

Chef server, not chef-solo

You can run Chef in two modes: chef-solo or with a Chef server. Chef-solo is simply your cookbooks plus the chef-client software; it configures the machine it runs on with no need to contact any other machine. The beauty of chef-solo is its simplicity: tar up your cookbooks, download them to a machine, install Chef, run it, and you’re good to go. The downside is that everything the cookbooks need must be in the tarball or already available on the machine; there’s no access to a central repository or directory service.

Using a Chef server requires more up-front effort: you have to set up the server and upload your cookbooks to it, and to configure a client machine you install Chef, point it at the server, and let the client download the cookbooks from there. The power of a Chef server, though, is that cookbooks can use search. Want to configure your Nagios server? You can generate your host and service definitions by using search to iterate over all of your hosts. You can’t do that with chef-solo. For us, using a Chef server was a no-brainer.
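To make the search point concrete, here’s a rough sketch of the kind of recipe you can only write against a Chef server (cookbook, query, and file paths are illustrative, not our actual Nagios setup): collect every registered node with search and hand the results to a template.

# This only works against a Chef server; chef-solo has no search index.
nodes = search(:node, "*:*")

template "/etc/nagios3/conf.d/hosts.cfg" do
  source "hosts.cfg.erb"
  variables(:nodes => nodes)
  notifies :reload, "service[nagios3]"
end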

Each cookbook in a separate repo

Chef’s original structure had a single repo that contained all of the cookbooks:

chef-repo
  cookbooks
    apache2
    build-essential
    ...
    sudo
    yum

This made workflow simple: each developer could group their changes for some spiffy new feature that affected several cookbooks into one or a few commits. But it was messy to use upstream cookbooks (you’re not really going to write your own apache2 cookbook; you’re going to grab the community’s cookbook).

Since we’re using a mix of upstream cookbooks and our own, we decided to put each cookbook in its own repo. Our initial implementation used git submodules. That turned out to be untenable in practice: git submodules are best suited for modules that don’t change very often, and we’re revving our cookbooks constantly. We got rid of the submodules and wrote Rake tasks to do the grunt work of creating cookbooks, keeping your local copy up-to-date, and so on. The jury’s still out on whether our current approach will give us a (mostly) painless workflow, but I think we’re close.
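To give a flavor of the grunt work those tasks absorb, here is a purely illustrative Rakefile sketch (the remote URL, cookbook list, and task name are hypothetical; our real tasks do quite a bit more):

# Clone any cookbook repos we don't have yet, and fast-forward the ones we do.
COOKBOOKS = %w[apache2 build-essential sudo yum]
GIT_BASE  = 'git@github.com:example-org'  # hypothetical remote

desc 'Clone or update every cookbook repo'
task :update_cookbooks do
  COOKBOOKS.each do |name|
    path = "cookbooks/#{name}"
    if File.directory?(path)
      sh "git -C #{path} pull --ff-only"
    else
      sh "git clone #{GIT_BASE}/#{name}.git #{path}"
    end
  end
end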

Development in VMs

Since you need a Chef server and a Chef client to develop and test, we created Vagrant-based tooling to make our workflow simple. It makes it dead simple to change a cookbook, upload it to your Chef server VM, and test it on your Chef client VM. We’re building something similar for AWS and possibly OpenStack using fog.
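Our tooling wraps this up, but the underlying idea is just a multi-machine Vagrantfile along these lines (box name and addresses are hypothetical; this is a sketch, not our actual setup):

# Two VMs on a private network: one to run the Chef server, one as a client.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"

  config.vm.define "chef-server" do |server|
    server.vm.network "private_network", ip: "192.168.50.10"
  end

  config.vm.define "chef-client" do |client|
    client.vm.network "private_network", ip: "192.168.50.11"
  end
end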

Unit- and Integration-testing

A dirty little secret of the Ops world is that we’ve been slow to adopt test-driven development. At Lookout, we wanted to start our chef work with TDD, so we insist that every cookbook have tests. That said, one of the main challenges of testing provisioning tools like Chef is that provisioning nodes can be time-consuming.

We started with chefspec, which enables us to write RSpec tests for our chef cookbooks that don’t require provisioning (or, in Chef parlance, converging) a node. Chefspec can’t test everything (it’s really more for unit-testing) but it’s great for confirming that a gem gets installed under the proper circumstances or that a config file is generated properly. You can see some simple examples of how to use chefspec in this GitHub repo.
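As a minimal sketch of what those specs look like (the cookbook and file names are illustrative, and this is written against a recent chefspec API rather than any specific version we run), the runner builds the resource collection in memory, so nothing is actually converged:

require 'chefspec'

describe 'mycookbook::default' do
  # Evaluates the recipe without touching the machine running the tests.
  let(:chef_run) { ChefSpec::SoloRunner.new.converge(described_recipe) }

  it 'installs the bundler gem' do
    expect(chef_run).to install_gem_package('bundler')
  end

  it 'renders the application config' do
    expect(chef_run).to render_file('/etc/myapp/config.yml')
  end
end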

Chefspec covers the unit-testing portion; for the integration-testing side of things we’re just starting to use Minitest Chef Handler.

Minitest-chef-handler enables you to run Minitest tests at the end of a chef-client run to confirm that the node is correctly configured. Right now we only run the Minitest tests in test environments, but we’re thinking about running them in production to ensure each node is correctly configured.
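A minimal sketch of what those post-converge checks can look like (resource names are illustrative, not from our cookbooks), using the spec-style helpers the handler provides and typically dropped into the cookbook’s files/default/tests/minitest/ directory:

describe_recipe 'mycookbook::default' do
  # These assertions run on the node itself, after chef-client has converged it.
  it 'installs and starts the web server' do
    package('apache2').must_be_installed
    service('apache2').must_be_running
  end

  it 'writes the site configuration' do
    file('/etc/apache2/sites-enabled/myapp.conf').must_exist
  end
end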

Continuous Integration

We’re also big fans of CI. We’ve structured our chef workflow so that every cookbook commit submitted for review to our Gerrit instance triggers a build in Jenkins that runs the chefspec tests for that cookbook. Changes to the chef-repo (adding a cookbook, updating a role) trigger a build of the chef-server and chef-client; we also build the full stack nightly. We have separate jobs for integration tests on specific system types (e.g., our Mongo instances). (I’m presenting a session at #ChefConf on our approach to testing.)

Chef is a great tool for infrastructure management, and incorporating techniques like CI has made it easy to fit into our development and deployment workflow.

- Jim Hopp




Integration testing with Foreman

At Lookout we find ourselves building more and more APIs and backend services these days. Naturally we would like to be certain that everything will work fine and dandy once it has been deployed. The reality of building out a service-oriented architecture is that you not only have to expect failure to happen, you have to plan and test for it.

Lately I’ve been using a tool called Foreman on some projects to manage their own “development stacks.” A single service might be composed of a redis-server instance, a MySQL database, and a Rails or Sinatra application.

Managing this with Foreman is easy enough; I create a Procfile with the following contents:

web: ruby app.rb
redis: redis-server config/redis.conf
mysql: ./script/run-mysql-ramdisk

When I run foreman start, Foreman brings all of these services online at once; when I Ctrl-C foreman, it shuts them all down appropriately.

That’s great for simple local development and testing, but what about integration testing the service?

Meet Test Engineer

The Test Engineer gem builds on top of Foreman and adds some basic testing functionality. Currently it’s only been used with Cucumber, but it could easily be incorporated into other acceptance-testing setups.

With Test Engineer you can use your existing Procfile to start and stop the entire stack with TestEngineer#start_stack and TestEngineer#stop_stack.
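Under the hood this amounts to a pair of hooks around each scenario. Here’s a hand-wired sketch of roughly what the Cucumber integration below does for you (assuming module-level methods, as with stop_process later in this post; the gem’s actual wiring may differ):

require 'testengineer'

# Boot everything in the Procfile before a tagged scenario, tear it down after.
Around('@testengineer') do |scenario, block|
  TestEngineer.start_stack
  begin
    block.call
  ensure
    TestEngineer.stop_stack
  end
end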

If you’re already using Cucumber, this becomes very easy to incorporate into existing Features with the @testengineer tag:

features/support/env.rb:

require 'testengineer/cucumber'

features/user_login.feature:

@testengineer
Feature: Log in to lookout.com
  In order to find or scream my phone
  As a registered Lookout user
  I should be able to log into the user area

  Scenario: With a valid email and password
    Given I am a registered user
    When I log in to Lookout
    Then I should see my devices listed
    And I should see my news feed

Test Engineer will bring up the entire stack defined in your Procfile for each and every scenario listed, providing a good isolated test environment for your integration tests.


A note about test isolation: in the example Procfile above I referenced both redis-server and a magic script to run MySQL on a ramdisk. When doing integration testing with services like this, it is absolutely critical to make sure that the backing data stores for these services are flushed appropriately between scenarios/test cases. In this example, the config/redis.conf file should be configured to disable AOF writes and snapshots for Redis, while run-mysql-ramdisk should unmount its ramdisk when the process is terminated.
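For Redis, that boils down to a couple of lines in the config file handed to the Procfile entry above (a minimal sketch of config/redis.conf for throwaway test instances):

# No RDB snapshots and no append-only file: nothing persists between scenarios.
save ""
appendonly no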


Chaos Engineer

Test Engineer also allows you to arbitrarily turn off services during a scenario, which opens up some interesting fault-tolerance testing. You can define a simple step which invokes TestEngineer#stop_process(name), e.g.:

Given /^the cache server is offline$/ do
  TestEngineer.stop_process('redis')
end

Then in my Cucumber .feature file I can turn off the redis service mid-way through the test to verify a fault tolerance condition:

@testengineer
Feature: Survive cache service degradation

  Scenario: Locate my device
    Given I am a registered user
    And I have an Android device
    And the cache server is offline
    When I locate my device
    Then my device should attempt to locate

That’s about all there is to integration and fault tolerance testing with Foreman and Test Engineer!


A caveat: Test Engineer currently relies on some goofy hacking with Foreman internals, which is part of the reason it cannot arbitrarily start a process again after it has been stopped. I am currently working with Foreman’s author, David Dollar, on making Foreman more easily embeddable.

- R. Tyler Croy
