- Jim Hopp
At Lookout we’re deploying Chef to manage our infrastructure. We’ve made several decisions about how we’ll use Chef:
- Chef server, not chef-solo
- Each cookbook in a separate repo
- Cookbook development in VMs
- Unit- and integration-tests for our cookbooks and chef installation
- Continuous integration
Chef server, not chef-solo
You can run chef in two modes: chef-solo or with a chef server. Chef-solo is simply your cookbooks plus the chef-client software; it configures the machine it runs on with no need to contact any other machine. The beauty of chef-solo is its simplicity: tar up your cookbooks, download them to a machine, install chef, run it, and you’re good to go. The downside is that everything the cookbooks need must be in the tarball or already on the machine; there’s no access to a central repository or directory service. Using chef server requires more up-front effort: you have to set up a chef server and upload your cookbooks to it, and to configure a client machine you install chef and point it at the chef server, from which the client downloads its cookbooks. The power of chef server, though, is that cookbooks can use search. Want to configure your Nagios server using chef server? You can generate your host and service definitions by using search to iterate over all of your hosts. You can’t do that with chef-solo. For us, using chef server was a no-brainer.
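As a sketch of what that search-driven approach can look like, here is a hypothetical recipe fragment (not our actual Nagios cookbook; the template name, paths, and query are invented for illustration):

```ruby
# Hypothetical recipe fragment: collect every node the chef server knows
# about and hand the list to a template that renders Nagios host definitions.
monitored = search(:node, '*:*')

template '/etc/nagios3/conf.d/chef_hosts.cfg' do
  source 'chef_hosts.cfg.erb'
  owner 'nagios'
  group 'nagios'
  mode '0644'
  variables(nodes: monitored)
  notifies :reload, 'service[nagios3]'
end
```

With chef-solo there is no server to answer the search call, which is exactly why this pattern forces the choice.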
Each cookbook in a separate repo
Chef’s original structure had a single repo that contained all of the cookbooks:
```
chef-repo/
  cookbooks/
    apache2/
    build-essential/
    ...
    sudo/
    yum/
```
This made the workflow simple: each developer could group their changes for some spiffy new feature that affected several cookbooks into one or a few commits. But it was messy to use upstream cookbooks (you’re not really going to write your own apache2 cookbook; you’re going to grab the community’s cookbook).
Since we’re using a mix of upstream cookbooks and our own, we decided to put each cookbook in its own repo. Our initial implementation used git submodules. That turned out to be untenable in practice: git submodules are best suited for modules that don’t change very often, and we’re revving our cookbooks constantly. We got rid of the submodules and wrote Rake tasks to do the grunt work of creating cookbooks, keeping your local copies up-to-date, and so on. The jury’s still out on whether our current approach will give us a (mostly) painless workflow, but I think we’re close.
Development in VMs
Since you need both a chef server and a chef client to develop and test, we created Vagrant-based tooling to simplify our workflow. It makes it dead simple to change a cookbook, upload it to your chef-server VM, and test it on your chef-client VM. We’re building something similar for AWS, and possibly OpenStack, using fog.
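As a rough sketch of what such tooling can look like (this is not our actual setup; the box name and IP addresses are placeholders), a two-VM Vagrantfile might define a server and a client on a private network:

```ruby
# Hypothetical Vagrantfile: one VM for the chef server, one for a client.
Vagrant.configure('2') do |config|
  config.vm.box = 'precise64'  # placeholder base box

  config.vm.define 'chef-server' do |server|
    server.vm.hostname = 'chef-server'
    server.vm.network :private_network, ip: '192.168.50.10'
  end

  config.vm.define 'chef-client' do |client|
    client.vm.hostname = 'chef-client'
    client.vm.network :private_network, ip: '192.168.50.11'
  end
end
```

From there, iterating is just a matter of editing a cookbook, uploading it to the server VM, and re-running chef-client on the client VM.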
Unit- and Integration-testing
A dirty little secret of the Ops world is that we’ve been slow to adopt test-driven development. At Lookout, we wanted to start our chef work with TDD, so we insist that every cookbook have tests. That said, one of the main challenges of testing provisioning tools like Chef is that provisioning nodes can be time-consuming.
We started with chefspec, which enables us to write RSpec tests for our chef cookbooks that don’t require provisioning (or, in Chef parlance, converging) a node. Chefspec can’t test everything (it’s really more for unit-testing) but it’s great for confirming that a gem gets installed under the proper circumstances or that a config file is generated properly. You can see some simple examples of how to use chefspec in this GitHub repo.
Chefspec covers the unit-testing portion; for the integration-testing side of things we’re just starting to use Minitest Chef Handler.
Minitest-chef-handler enables you to run Minitest tests at the end of a chef-client run to confirm that the node is correctly configured. Right now we only run the Minitest tests in test environments, but we’re thinking about running them in production to ensure each node is correctly configured.
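For example, a check along these lines (the recipe name and paths are hypothetical) can assert on the converged node’s real state, using the resource helpers that minitest-chef-handler provides:

```ruby
# files/default/tests/minitest/default_test.rb -- hypothetical recipe name.
require 'minitest/spec'

describe_recipe 'mycookbook::default' do
  it 'writes the nginx config' do
    file('/etc/nginx/nginx.conf').must_exist
  end

  it 'leaves nginx running' do
    service('nginx').must_be_running
  end
end
```

Unlike the chefspec examples, these assertions run against a node that has actually converged, so they catch problems chefspec cannot.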
We’re also big fans of CI. We’ve structured our chef workflow so that every cookbook commit submitted for review to our Gerrit instance triggers a build in Jenkins that runs the chefspec tests for that cookbook. Changes to the chef-repo (adding a cookbook, updating a role) trigger a build of the chef-server and chef-client; we also build the full stack nightly. We have separate jobs for integration tests on specific system types (e.g., our Mongo instances). (I’m presenting a session at #ChefConf on our approach to testing.)
Chef is a great tool for infrastructure management, and incorporating techniques like CI has made it easy to fit into our development and deployment workflow.
- Jim Hopp
At Lookout we find ourselves building more and more APIs and backend services these days. Naturally we would like to be certain that everything will work fine and dandy once it has been deployed. The reality of building out a service-oriented architecture is that you not only have to expect failure to happen, you have to plan and test for it.
As of late I’ve been using a tool called Foreman on some projects to manage their own “development stacks.” A single service might be composed of a redis-server instance, a MySQL database, and a Rails or other Ruby web application. Managing this with Foreman is easy enough; I would create a Procfile like:
```
web: ruby app.rb
redis: redis-server -c config/redis.conf
mysql: ./script/run-mysql-ramdisk
```
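The Procfile format itself is trivially simple, one “name: command” entry per line; a few lines of plain Ruby are enough to parse it (this parser is purely illustrative, not how Foreman does it internally):

```ruby
# Parse Procfile text into a hash of process name => command.
# Illustrative only; Foreman ships its own parser.
def parse_procfile(text)
  text.each_line.with_object({}) do |line, procs|
    name, command = line.chomp.split(':', 2)
    next if command.nil? || name.strip.empty?
    procs[name.strip] = command.strip
  end
end

stack = parse_procfile(<<~PROCFILE)
  web: ruby app.rb
  redis: redis-server -c config/redis.conf
PROCFILE
# stack['web'] => "ruby app.rb"
```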
When I run foreman start, Foreman brings all of these services online at once; when I hit Ctrl-C, it brings them all back down. That’s great for simple local development and testing, but what about integration testing the service?
Meet Test Engineer
The Test Engineer gem builds on top of Foreman and adds some basic testing functionality. Currently it has only been used with Cucumber, but it could easily be incorporated into other acceptance-testing setups.
With Test Engineer you can use your existing Procfile to start and stop the entire stack for each test.
If you’re already using Cucumber, this becomes very easy to incorporate into existing Features with the @testengineer tag:
```
@testengineer
Feature: Log in to lookout.com
  In order to find or scream my phone
  As a registered Lookout user
  I should be able to log into the user area

  Scenario: With a valid email and password
    Given I am a registered user
    When I log in to Lookout
    Then I should see my devices listed
    And I should see my news feed
```
Test Engineer will bring up the entire stack defined in your Procfile for each and every tagged scenario, providing a well-isolated test environment for your integration tests.
A note about test isolation: in the example Procfile above I referenced redis-server and a magic script to run MySQL on a ramdisk. When doing integration testing with services like this, it is absolutely critical to make sure that the backing data stores for these services are flushed appropriately between scenarios/test cases. In this example, config/redis.conf should be configured to disable AOF writes and snapshots for Redis, while run-mysql-ramdisk should unmount its ramdisk when the process is terminated.
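For Redis that generally means a config along these lines (a sketch; save "" disables RDB snapshots and appendonly no disables the append-only file):

```
# config/redis.conf -- throwaway, nothing-persisted test configuration
appendonly no
save ""
```

With persistence off, killing the redis-server process between scenarios leaves no state behind.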
Test Engineer also allows you to arbitrarily turn off services during a scenario, which allows for some interesting fault-tolerance testing. You can define a simple step which invokes TestEngineer.stop_process:
```ruby
Given /^the cache server is offline$/ do
  TestEngineer.stop_process('redis')
end
```
Then in my Cucumber .feature file I can turn off the redis service mid-way through the test to verify a fault-tolerance condition:
```
@testengineer
Feature: Survive cache service degradation
  Scenario: Locate my device
    Given I am a registered user
    And I have an Android device
    And the cache server is offline
    When I locate my device
    Then my device should attempt to locate
```
That’s about all there is to integration and fault tolerance testing with Foreman and Test Engineer!
Errata: Test Engineer currently relies on some goofy hacking with some Foreman internals, which is part of the reason it cannot arbitrarily start a process after it has been stopped. I am currently working with Foreman’s author, David Dollar, on making Foreman more easily embeddable.