Ansible 101

by Matt Cholick

I like Chef. I think it's a reasonable solution to a very real set of problems. I've worked with the tool enough to know how all its pieces fit together: Chef itself, the nodes and environments, Chef-vault, Berkshelf, test-kitchen, and other elements of the ecosystem. I'm confident that I can modify a recipe to suit my needs or spin things up from scratch. I like their overall model, and I like that the tooling supports a test-driven flow for developing cookbooks.

Where I run into trouble with Chef is coupling its high complexity with infrequent use. Complexity by itself isn't necessarily bad: difficult problems can require complex solutions. My trouble is rooted in the fact that I'm a developer, not an operations engineer. I deal with Chef once every month or two. In that time, some piece of the Chef stack has inevitably drifted. Maybe I upgraded Vagrant. Or, more likely, some Ruby gem no longer works. Or I've forgotten some important detail about Berkshelf that's critical to getting a recipe all the way through to production. There's enough to the stack that, without fail, I'm debugging something broken in the tool or process itself rather than the server I'm trying to provision.

I've been on two teams now where a developer, frustrated by Chef, has started playing with Ansible and had nothing but praise for the tool. I finally decided to give Ansible a shot and adapted part of my EC2 VM's Chef recipes.

I decided to write up my experience, as I didn't find any articles covering what I'd call a complete flow: touching everything from laying out a new repository to setting up and running tests against a Vagrant virtual machine. For my sample playbook, I'm installing a few packages, adding some configuration files, installing the HotSpot JVM from Oracle, and setting the hostname. For the full working example, clone my GitHub repository.

Ansible's best practices had some advice on directory layout, but it didn't break up the environments cleanly. @geedew's post here has a layout I prefer, as it better separates the environment-specific configuration.

.
├── README.md
├── environments
│   ├── dev                     # development environment directory
│   │   ├── group_vars          # group variables for an environment
│   │   ├── host_vars           # host specific variable files
│   │   │   └── site_vm.yml
│   │   └── inventory
│   └── prod                    # production environment directory
│       ├── group_vars
│       ├── host_vars
│       └── inventory
├── roles                       # each subdirectory is a role
│   ├── common
│   │   ├── files
│   │   │   └── default.el      # files for the role
│   │   └── tasks
│   │       └── main.yml        # tasks, a main.yml is required
│   └── java
│       ├── files
│       │   └── install_jdk.sh
│       └── tasks
│           └── main.yml
├── server.yml                   # the master playbook
└── test                         # test directory
    ├── Gemfile
    ├── Rakefile                 # rakefile to run serverspec
    ├── Vagrantfile
    ├── spec
    │   ├── default              # serverspec tests
    │   │   ├── common_spec.rb
    │   │   └── java_spec.rb
    │   └── spec_helper.rb
    └── test.sh                  # test runner script
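
If you are starting a repository from nothing, the skeleton of the tree above can be created with a few lines of shell. This is only a sketch: scaffold_layout is a hypothetical helper of my own naming, not something Ansible provides, and it creates empty placeholder files rather than working content.

```shell
# Sketch: create the skeleton of the layout shown above.
# scaffold_layout is a hypothetical helper, not part of Ansible.
scaffold_layout() {
    local root="$1"
    local env role

    # One directory per environment, each with its own variables and inventory.
    for env in dev prod; do
        mkdir -p "$root/environments/$env/group_vars" \
                 "$root/environments/$env/host_vars"
        touch "$root/environments/$env/inventory"
    done

    # One directory per role; tasks/main.yml is the file each role needs.
    for role in common java; do
        mkdir -p "$root/roles/$role/files" "$root/roles/$role/tasks"
        touch "$root/roles/$role/tasks/main.yml"
    done

    # Master playbook and the directory that holds the Serverspec tests.
    mkdir -p "$root/test/spec/default"
    touch "$root/server.yml"
}
```

Calling scaffold_layout with a directory name (for example, scaffold_layout my-ansible-repo) builds the tree under that directory. Running it twice is harmless, since mkdir -p and touch both tolerate existing paths.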

At the top level is an environments directory, where each subdirectory contains group and host variables and an inventory file. The inventory file describes the hosts to run playbooks against. Below is the dev inventory file. I specify a host, give it an alias, and configure the ssh user/key pair.

site_vm ansible_ssh_host=192.168.33.100 ansible_ssh_user=vagrant ansible_ssh_private_key_file=~/.vagrant.d/insecure_private_key

My example targets a single server. To test things out, I picked something simple that varies per environment: the hostname. The file host_vars/site_vm.yml specifies all the host-specific values for site_vm (192.168.33.100).

hostname: cholick.com.dev

Next are the roles. Below is my common role's main.yml. As in Chef, there are facilities for common things like installing packages, copying files, and setting the hostname. The install packages block makes use of Ansible's loops. I found the end result quite readable.

---
- name: update cache
  apt: update_cache=yes cache_valid_time=3600
  sudo: yes

- name: install common packages
  apt: pkg={{ item }} state=present
  sudo: yes
  with_items:
    - emacs23-nox
    - htop

- copy: >
    src=../files/default.el
    dest=/usr/local/share/emacs/site-lisp/default.el
    mode=0644 owner=root group=root
  sudo: yes

- hostname: name={{ hostname }}
  sudo: yes

The second role installs Java. Unfortunately, Oracle's JVM isn't in the Ubuntu repositories (a licensing thing if I remember correctly), so I scripted this part of the install (which makes for a better spike anyway). I know there are PPAs that offer this, but I haven't had good luck in the past with PPAs staying current. packagecloud.io could be a solution to setting up my own, but that's for another day. Here is Java's main.yml:

---
- name: Check java
  shell: java -version || echo "undefined"
  register: java_version
  changed_when: False

- name: Run install script
  script: ../files/install_jdk.sh
  sudo: yes
  when: "'Java HotSpot' not in java_version.stderr or '1.8' not in java_version.stderr"

The script is slow (it downloads, uncompresses, and installs Java), so I guarded it with a check: it only runs if the HotSpot 1.8 JVM isn't already on the box. The first task registers the output of java -version, and the second task's when clause inspects it. install_jdk.sh is available in the repo for this playbook.
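
To make the guard's logic concrete, here is roughly the same decision written as plain shell. It's a sketch, not what Ansible runs internally: needs_jdk_install is a hypothetical helper, and its argument stands in for java_version.stderr. The detail worth noticing is that java -version writes to stderr, which is why the when clause looks there rather than at stdout.

```shell
# Sketch of the decision the "when" clause makes.
# needs_jdk_install is a hypothetical helper; $1 stands in for java_version.stderr.
# Returns 0 (true) when the install script should run.
needs_jdk_install() {
    local version_output="$1"
    case "$version_output" in
        *"Java HotSpot"*) ;;      # HotSpot is present, keep checking
        *) return 0 ;;            # some other JVM, or no java at all
    esac
    case "$version_output" in
        *"1.8"*) return 1 ;;      # HotSpot 1.8: skip the slow script
        *) return 0 ;;            # HotSpot, but a different version
    esac
}

# java -version prints to stderr, so capture that stream and discard stdout.
current="$(java -version 2>&1 >/dev/null || true)"
if needs_jdk_install "$current"; then
    echo "would run install_jdk.sh"
else
    echo "HotSpot 1.8 already installed"
fi
```

Like the playbook's condition, this is a substring match, so it is only as precise as the strings it looks for.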

Now we come to testing. Here is where I disagree most with the Ansible authors philosophically. Their documentation says:

"[...] it should not be necessary to test that services are running, packages are installed, or other such things [...] so when there is an error creating that user, it will stop the playbook run. You do not have to check up behind it."

That perspective misses the point. A playbook that runs without errors says nothing about many of the things tests cover:

  • A role might be perfectly written, but it might not be on the right hosts (or any)
  • Variables consumed by tasks and roles might have the wrong values
  • A valid package is installed, but not the correct one
  • Tests help to describe the intent of the code. A test that checks that emacs is installed isn't necessarily checking up on Ansible, it's explicitly documenting that I expect the machine to have Emacs
  • They're a chance to fail fast, before the overhead of running in staging environments
  • Refactoring: changes to playbooks that successfully run, but no longer do the correct thing
  • TDD: I'm sure anyone reading this already has an opinion about TDD; mine is that it's the Right Thing™ to do

So, Ansible playbook test support isn't as integrated out-of-the-box as I would have liked. When I investigated how Chef did its testing, though, I found that Serverspec does much of what I thought was actually Test Kitchen. Serverspec was also quite simple to set up. After installing the gem, running "serverspec-init" asks a series of questions and then generates a test harness.

First in the test stack is a simple Vagrantfile, shown below. The file specifies an IP address (matching the dev inventory file) as well as an Ubuntu version.

VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
    config.vm.box = "ubuntu/trusty64"
    config.vm.network :private_network, ip: "192.168.33.100"
end

Rakefile, spec_helper.rb, and .rspec in the test tree were generated by Serverspec. The Gemfile, shown below, simply specifies that I want to install Serverspec.

source 'https://rubygems.org'
gem 'serverspec'

Below I've included a simple test runner script (based on a post by servercheck.in). It ensures the VM is up and then runs the tests. Optionally, the script will also start from scratch and check for idempotence.

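#!/bin/bash -e

# Colors for the idempotence report
red='\033[0;31m'
green='\033[0;32m'
clear='\033[0m'

# --full: rebuild the VM from scratch
if [ "$1" == "--full" ]; then
    vagrant destroy --force
fi

vagrant up

# First run: provision the VM
ansible-playbook -i ../environments/dev/inventory ../server.yml

# Second run (--full only): an idempotent playbook should change nothing
if [ "$1" == "--full" ]; then
    ansible-playbook -i ../environments/dev/inventory ../server.yml \
        | grep -qE "changed=0\s+unreachable=0" \
        && (echo -e "Idempotence test: ${green}pass${clear}" && exit 0) \
        || (echo -e "Idempotence test: ${red}fail${clear}" && exit 1)
fi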

rake
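
The idempotence check in test.sh rests entirely on the recap line that ansible-playbook prints at the end of a run. The sketch below applies the same kind of pattern to two sample recap lines so the behavior is easy to see in isolation. The helper name is_idempotent_run and the counts in the sample lines are made up for illustration, and I've spelled \s as the POSIX class [[:space:]], which matches the same whitespace.

```shell
# Sketch: the idempotence pattern applied to captured recap lines.
# is_idempotent_run is a hypothetical helper; it reads playbook output on stdin
# and succeeds if some recap line reports no changes and no unreachable hosts.
is_idempotent_run() {
    grep -qE "changed=0[[:space:]]+unreachable=0"
}

# Sample recap lines in the shape ansible-playbook prints (counts are made up).
first_run='site_vm   : ok=6    changed=5    unreachable=0    failed=0'
second_run='site_vm   : ok=6    changed=0    unreachable=0    failed=0'

if echo "$first_run" | is_idempotent_run; then
    echo "first run: no changes"
else
    echo "first run: made changes"
fi

if echo "$second_run" | is_idempotent_run; then
    echo "second run: no changes"
else
    echo "second run: made changes"
fi
```

With a single host, one matching recap line is enough. With several hosts the grep would pass as soon as any one of them reported no changes, so a multi-host setup would need to check every recap line instead.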

Finally, below are a few tests over the "common" role. They check for the existence of a package, ensure that the default emacs config has been copied over, and verify that the hostname is correctly set per the host_vars/site_vm.yml file.

require 'spec_helper'

describe package('emacs23-nox') do
  it { should be_installed }
end

describe file('/usr/local/share/emacs/site-lisp/default.el') do
  it { should be_file }
  it { should contain /backup-by-copying/ }
end

describe command('hostname') do
  its(:stdout) { should match /cholick\.com\.dev/ }
end

I was quite impressed with how quickly I was able to get up and running with Ansible. This simple start, though, didn't touch two areas that have caused me headaches while using Chef. I didn't learn how Ansible manages community playbooks (How are they versioned? What sort of quality are they? Does the Ansible ecosystem have something analogous to Berkshelf?). I also didn't learn how difficult it will be to work on a living playbook months from now. I do like enough of what I saw, though, to start using Ansible in personal projects. It's a slick tool.