A few Puppet best practices

Puppet, the popular configuration management tool, can get tricky at times. After a few months of using Puppet to manage our servers at work, a few practices have emerged as worth recommending. I wanted to share them here so that beginners get a head start, and also to provide a good base for discussion with more seasoned Puppet users.

1. Version control as much as possible

This one may seem obvious to anyone who has used version control, but it isn’t obvious to everybody. Many sysadmins who start to use Puppet have had limited exposure to version control, which they often consider a tool reserved for developers.

Using version control will open up a lot of additional possibilities with Puppet, such as better tracking of changes, testing your Puppet manifests in an isolated environment, promoting your configuration from environment to environment, etc. Version control even provides a free backup for your configuration code.

You will see gains from using any version control system (VCS), but modern distributed ones (such as Git, Mercurial or Bazaar) prove to be particularly useful here because they make it easy to manage multiple branches of code.

Using a code collaboration tool such as GitHub, Bitbucket or GitLab (self-hosted and open-source, highly recommended) will also allow you and your team to review each other’s changes before they are applied. I won’t try to convince anyone of the virtues of code reviews here, but let’s just say you’ll end up with much better, more maintainable Puppet code if you consistently review your changes with your peers.

Put all of your Puppet files (manifests, templates, files, hieradata files) under version control, then check out a working copy on your Puppetmaster. When ready to “deploy” changes to your Puppetmaster, just sync the working copy on the server with the code in the version control repository.

2. Use environments

Puppet has this concept of Environments which proves to be very useful for applying your configuration changes on less critical servers first, then promoting those changes to production when tested and ready.

We use two Puppet environments: staging and production. At initial provisioning of a server, we assign the staging environment to all pre-production boxes (DEV and QA in our case). We assign the production environment only to, you guessed it, production servers. Each environment is tied to a specific branch in our Git repository (the “master” branch is production and the “staging” branch is staging).

We make most changes on the “staging” branch and apply them on pre-production boxes. Once we know they are stable, we promote the changes by merging them into the “master” branch and apply them to production servers.

It’s not always possible to follow this flow (not all servers have pre-production replicas), but when it is, we do. It’s good for peace of mind.
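One way to keep a node attached to its environment after provisioning is to let Puppet manage the environment setting in the agent’s own puppet.conf. What follows is only a minimal sketch of that idea. The class name, its parameters and the file path are assumptions made for illustration, and the ini_setting resource type comes from the puppetlabs-inifile module, which would need to be installed.

```puppet
# Hypothetical class: pins the agent's environment in puppet.conf.
# Assumes the puppetlabs-inifile module, which provides ini_setting.
class puppet_agent_environment (
  $agent_environment = 'staging',
  $config_file       = '/etc/puppet/puppet.conf'
) {
  ini_setting { 'puppet agent environment':
    ensure  => present,
    path    => $config_file,
    section => 'agent',
    setting => 'environment',
    value   => $agent_environment,
  }
}
```

A production node would then declare the class with agent_environment set to 'production', while pre-production boxes keep the default.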

3. Use dry-runs

Even with the best precautions taken, things can get messy when you actually get to run the Puppet agent to apply your configuration updates on your servers. To reduce the risk of problems, I highly suggest running the Puppet agent in “dry run” mode using the following options:

puppet agent [...] --verbose --noop --test

Using those options will cause the Puppet agent to only show what it would do, not what it did. You get to see the diffs for all files that would be modified and validate things are going to go as you expect.

4. Use librarian-puppet

Managing module dependencies can be a source of headaches, especially when many people are working on Puppet code and they each need to test it on their own computer. Librarian-puppet provides some sanity to the process by automatically managing your module dependencies. You express your dependencies in a file (the “Puppetfile”) and the tool will install, update or remove modules automatically when you run it, always matching what’s specified in the Puppetfile. It’ll even resolve and install the modules’ own dependencies (what we would call transitive dependencies) and detect compatibility issues.

Using librarian-puppet on the Puppetmaster also allows for easier deployments: no need to install and manage your modules manually. With librarian-puppet, a deployment usually takes two simple steps:

  1. Sync your main sources with your code repository (ex: git pull)
  2. Run librarian-puppet to synchronize your installed Puppet modules

Tip: Don’t use Git dependencies with no version specifier

Librarian-puppet allows you to declare dependencies on modules that come directly from a Git repository this way:

  mod "stdlib",
    :git => "git://github.com/puppetlabs/puppetlabs-stdlib.git"

Be careful using this with open-source modules that you don’t control, as this tells librarian-puppet you want to use the latest, bleeding-edge version of the module. If the module’s author decides to change something in an incompatible manner, you’ll probably get to spend some quality time with Puppet’s sometimes cryptic error messages.

Instead, always use references in your Puppetfile:

mod "stdlib",
  :git => "git://github.com/puppetlabs/puppetlabs-stdlib.git",
  :ref => "v1.0.2"

This will at least shield your Puppet code from inadvertently breaking because of backward-incompatible changes from the author. If the module’s author doesn’t use tags for releases, at the very least pin yourself to a particular revision:

mod "stdlib",
  :git => "git://github.com/puppetlabs/puppetlabs-stdlib.git",
  :ref => "84f757c8a0557ab62cec44d03af10add1206023a"

5. Keep sensitive data safe

Some data needs to be kept secure. Examples of sensitive data you may need to put in your Puppet code are passwords, private keys, SSL certificates and so on. Don’t put this in version control unless you’re absolutely aware of the risks you’re taking doing so.

Puppet has a nice tool for separating all of your data from your actual manifests (code). That tool goes by the name of Hiera and allows you to store data about your servers and infrastructure in YAML or JSON files. From usage, you’ll see that most data in Hiera files is not confidential in nature… so should we refrain from using version control for Hiera files just because of a few elements that are unsafe? Certainly not!

The trick is to use Hiera’s ability to combine multiple data sources (backends). What you can do is split hieradata files into two types: YAML files for your “main” hieradata and JSON files for your “secured” data. Those JSON files are not to be put under version control and are stored securely in a single location: the Puppetmaster. This way, very few people can actually see the contents of the sensitive files.

Here’s how to configure Hiera as such (hiera.conf):

:hierarchy:
  - "%{hostname}"
  - "%{environment}"
  - common
  - credentials
:backends:
  - yaml
  - json
:yaml:
  :datadir: '/etc/puppet/hieradata'
# only credentials are stored in json hiera datastore
:json:
  :datadir: '/etc/puppet/secure/hieradata'
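From the manifest’s point of view, nothing changes: a lookup is the same whether the value ends up coming from a YAML file under version control or from the JSON file kept only on the Puppetmaster. Here is a small sketch of what that could look like. The class name, the key names and the target file are all hypothetical, and it assumes the secured data lives in a credentials.json file under /etc/puppet/secure/hieradata.

```puppet
# Hypothetical class: key names and file path are illustrative only.
class app_database {
  # Non-sensitive value, expected in a version-controlled YAML file
  # such as /etc/puppet/hieradata/common.yaml.
  $db_host = hiera('app_db_host')

  # Sensitive value, expected in
  # /etc/puppet/secure/hieradata/credentials.json, which is never committed.
  $db_password = hiera('app_db_password')

  file { '/etc/app/database.ini':
    ensure  => file,
    owner   => 'root',
    group   => 'root',
    mode    => '0600',
    content => "host=${db_host}\npassword=${db_password}\n",
  }
}
```

Because the manifest only knows key names, moving a value from the YAML files to the secured JSON file (or back) requires no change to the Puppet code.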

6. Create abstractions for your high level classes

I guess this will vary depending on preferences and most probably not everyone is going to agree, but I’ve found that wrapping uses of modules into wrapper classes provides better maintainability of the Puppet code over time. This is better explained by an example…

Suppose you want to set up a reverse proxy server using an existing Nginx module. Instead of directly assigning the ‘nginx’ class to your nodes and setting all of the required stuff up, create a new class called, say, ‘proxy_server’, with the attributes you want to consider for your proxy server as class parameters. Assigning the ‘proxy_server’ class to your node not only states your intent better, it also creates a nice little abstraction over what you consider a “proxy server”. Later on, if you decide to move away from Nginx (highly improbable, why would you sin as such? 🙂 ) or use another Nginx module (more probable!), then you’ll probably just need to change the content of your ‘proxy_server’ class, instead of a bunch of tangled node definitions.
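Such a wrapper could look roughly like the sketch below. The parameter names, host names and addresses are made up for the example, and the nginx::resource::vhost define with its proxy and listen_port parameters is an assumption modeled on one community Nginx module. Check the documentation of the module you actually use.

```puppet
# Hypothetical wrapper class: states what "proxy server" means for us,
# independently of the module that implements it.
class proxy_server (
  $server_name,
  $backend_url,
  $listen_port = 80
) {
  # Assumption: an Nginx module providing the 'nginx' class and an
  # nginx::resource::vhost define with these parameters.
  include nginx

  nginx::resource::vhost { $server_name:
    listen_port => $listen_port,
    proxy       => $backend_url,
  }
}

# Node definitions only state intent; they never mention Nginx.
node 'proxy01.example.com' {
  class { 'proxy_server':
    server_name => 'www.example.com',
    backend_url => 'http://10.0.0.10:8080',
  }
}
```

If the Nginx module is swapped for another one, only the body of the class changes. The node definition stays as it is.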

That’s it!

I hope you’ll find the above list useful! Please do not hesitate to share your own experience and best practices in the comments.


4 thoughts on “A few Puppet best practices”

  1. Thanks for the good information. I’m just starting out in puppet, and best practices are relatively scarce.
    I do have one question: how do you deal with environments, hiera, and version control? I can see each one being a good thing, but I’m having a hard time envisioning how they all work together, especially in terms of directory structure for the different environments and how (or if) you would specify the environment each host (or group of hosts) gets in hiera.

    • @geek65535: Because the environment a host is assigned defines the actual configuration it’ll get from the Puppetmaster, you should assign the environment to the host somewhere other than in hiera.

      As you might know, the environment for a host is usually configured using the environment option in puppet.conf. The puppet agent uses that info to fetch the correct configuration from the Puppetmaster on each run. Our strategy has been to manage the puppet.conf file on target hosts using Puppet, and environment is one of the attributes that are managed. The trick is to make sure you specify the environment on the first launch of the puppet agent for all hosts (using the --environment command line option). On that initial puppet agent run, the puppet.conf file gets rewritten by Puppet and includes the correct environment for all subsequent runs.

      Then on the Puppetmaster, you can configure puppet with a dynamic path to the manifests (ex: manifestdir=/etc/puppet/$environment)

      Finally, have multiple copies of your manifests and hiera files checked out from different branches in your VCS on the Puppetmaster. Ex: /etc/puppet/staging/… and /etc/puppet/production/… This allows you to make changes in the staging branch, test them on staging servers, then eventually merge those changes to the master/production branch and apply them to your production servers.

      I hope it’s clear enough, this is not very easy to explain clearly 🙂 Let me know if you have any questions.

  2. Thanks for the information. I just ran across another site (http://www.allgoodbits.org/articles/view/29) that explains the same trick you’re talking about.
    After reading that article and your explanation, things make a lot more sense. I’m thinking there might even be a way to go one step further and do something like change the default environment on the puppetmaster (or use ‘production’ as a special case) so that the first time a node is brought into puppet, it automatically hits an environment that then uses hiera to put it in its permanent environment.

    • Yes, I think the “bootstrap” environment strategy is interesting for controlling the environment from hiera data instead of specifying it at initial provisioning!

      With this strategy you would not necessarily need to set the default environment on the Puppetmaster, but you could rather set it using hiera tricks (ex: defining the default in a common.yaml file, and overriding it for production boxes specifically on each box)
