A few Puppet best practices

Puppet, the popular configuration management tool, can get tricky at times. After a few months of using Puppet to manage our servers at work, a few practices have emerged that I can recommend. I wanted to share some of them here, both to give beginners a head start and to open a discussion with more seasoned Puppet users.

1. Version control as much as possible

This one may seem obvious to anyone who has used version control, but it isn’t obvious to everybody. Many sysadmins who start using Puppet have had limited exposure to version control, which they often consider a tool reserved for developers.

Using version control will open up a lot of additional possibilities with Puppet, such as better tracking of changes, testing your Puppet manifests in an isolated environment, promoting your configuration from environment to environment, etc. Version control even provides a free backup for your configuration code.

You will see gains from using any version control system (VCS), but modern distributed VCS systems (such as Git, Mercurial or Bazaar) prove to be particularly useful here due to the ease they provide in managing multiple branches of code.

Using a code collaboration tool such as GitHub, Bitbucket or GitLab (self-hosted and open-source, highly recommended) will also allow you and your team to review each other’s changes before they are applied. I won’t try to convince anyone of the virtues of code reviews here, but let’s just say you’ll end up with much better, more maintainable Puppet code if you consistently review your changes with your peers.

Put all of your Puppet files (manifests, templates, files, hieradata files) under version control, then check out a working copy on your Puppetmaster. When ready to “deploy” changes to your Puppetmaster, just sync the working copy on the server with the code in the version control repository.

2. Use environments

Puppet has a concept of environments which proves very useful: you can apply your configuration changes to less critical servers first, then promote those changes to production once they are tested and ready.

We use two Puppet environments: staging and production. At initial provisioning of a server, we assign the staging environment to all pre-production boxes (DEV and QA in our case). We assign the production environment only to, you guessed it, production servers. Each environment is tied to a specific branch in our Git repository: the “master” branch is production and the “staging” branch is staging.

We do most changes on the “staging” branch, apply them to pre-production boxes, and once we know they are stable, we promote the changes by merging them into the “master” branch and apply them to production servers.

It’s not always possible to follow this flow (not all servers have pre-production replicas), but when it is, we do. It’s good for peace of mind.
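Configuration-wise, one way to wire this up (a sketch using the config-file environments available at the time; the paths are assumptions) is to declare each environment in the Puppetmaster’s puppet.conf and point each agent at the right one:

# puppet.conf on the Puppetmaster
[production]
  modulepath = /etc/puppet/environments/production/modules
  manifest   = /etc/puppet/environments/production/manifests/site.pp

[staging]
  modulepath = /etc/puppet/environments/staging/modules
  manifest   = /etc/puppet/environments/staging/manifests/site.pp

# puppet.conf on a pre-production (DEV/QA) agent
[agent]
  environment = staging

Each of those environment directories is then just a working copy of the corresponding Git branch.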

3. Use dry-runs

Even with the best precautions taken, things can get messy when you actually get to run the Puppet agent to apply your configuration updates on your servers. To reduce the risk of problems, I highly suggest running the Puppet agent in “dry run” mode using the following options:

puppet agent [...] --verbose --noop --test

Using those options causes the Puppet agent to only report what it would do, without actually doing it. You get to see the diffs for all files that would be modified and validate that things are going to go as you expect.

4. Use librarian-puppet

Managing module dependencies can be a source of headaches, especially when many people are working on Puppet code and they each need to test it on their own computer. Librarian-puppet provides some sanity to the process by automatically managing your module dependencies. You express your dependencies in a file (the “Puppetfile”) and the tool will install, update or remove modules automatically when you run it, always matching what’s specified in the Puppetfile. It’ll even resolve and install the modules’ own dependencies (what we would call transitive dependencies) and detect compatibility issues.

Using librarian-puppet on the Puppetmaster also allows for easier deployments: no need to install and manage your modules manually. With librarian-puppet, a deployment usually boils down to two simple steps:

  1. Sync your main sources with your code repository (ex: git pull)
  2. Run librarian-puppet to synchronize your installed Puppet modules
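In shell terms (assuming your working copy lives in /etc/puppet and librarian-puppet is installed as a gem), that might look like:

cd /etc/puppet
git pull origin master         # 1. sync the working copy with the repository
librarian-puppet install       # 2. bring installed modules in line with the Puppetfile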

Tip: Don’t use Git dependencies without a version specifier

Librarian-puppet allows you to declare dependencies on modules that come directly from a Git repository this way:

  mod "stdlib",
    :git => "git://github.com/puppetlabs/puppetlabs-stdlib.git"

Be careful using this with open-source modules that you don’t control, as it tells librarian-puppet you want the latest, bleeding-edge version of the module. If the module’s author decides to change something in an incompatible manner, you’ll probably get to spend some quality time with Puppet’s sometimes cryptic error messages.

Instead, always pin your dependency to a specific reference in your Puppetfile:

mod "stdlib",
  :git => "git://github.com/puppetlabs/puppetlabs-stdlib.git",
  :ref => "v1.0.2"

This will at least shield your Puppet code from inadvertently breaking because of backward-incompatible changes from the author. If the module’s author doesn’t use tags for releases, at the very least pin yourself to a particular revision:

mod "stdlib",
  :git => "git://github.com/puppetlabs/puppetlabs-stdlib.git",
  :ref => "84f757c8a0557ab62cec44d03af10add1206023a"

5. Keep sensitive data safe

Some data needs to be kept secure. Examples of sensitive data you may need to put in your Puppet code are passwords, private keys, SSL certificates and so on. Don’t put these in version control unless you’re fully aware of the risks of doing so.

Puppet has a nice tool for separating your data from your actual manifests (code). That tool goes by the name of Hiera and lets you store data about your servers and infrastructure in YAML or JSON files. In practice, you’ll find that most data in Hiera files is not confidential in nature… so should we refrain from using version control for Hiera files just because of a few sensitive elements? Certainly not!

The trick is to use Hiera’s ability to combine multiple data sources (backends). Split your hieradata files into two types: YAML files for your “main” hieradata and JSON files for your “secured” data. The JSON files are not put under version control and are stored securely in a single location: the Puppetmaster. This way, very few people can actually see the contents of the sensitive files.

Here’s how to configure Hiera accordingly (hiera.yaml):

---
:hierarchy:
  - "%{hostname}"
  - "%{environment}"
  - common
  - credentials
:backends:
  - yaml
  - json
:yaml:
  :datadir: '/etc/puppet/hieradata'
# only credentials are stored in json hiera datastore
:json:
  :datadir: '/etc/puppet/secure/hieradata'
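As an illustration (key names and values are made up), a non-sensitive value lives in the versioned YAML tree while a credential lives only in the JSON tree on the Puppetmaster, and both are looked up the same way from your manifests:

# /etc/puppet/hieradata/common.yaml (version controlled)
ntp_servers:
  - 0.pool.ntp.org
  - 1.pool.ntp.org

# /etc/puppet/secure/hieradata/credentials.json (Puppetmaster only, never in Git)
{
  "db_password": "use-something-better-than-this"
}

# In a manifest, both resolve transparently through Hiera:
$ntp_servers = hiera('ntp_servers')
$db_password = hiera('db_password')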

6. Create abstractions for your high level classes

This one will vary depending on preferences, and probably not everyone is going to agree, but I’ve found that wrapping uses of modules in thin wrapper classes makes Puppet code more maintainable over time. This is best explained with an example…

Suppose you want to set up a reverse proxy server using an existing Nginx module. Instead of directly assigning the ‘nginx’ class to your nodes and setting all of the required stuff up there, create a new class called, say, ‘proxy_server’, with the attributes you care about for your proxy server as class parameters. Assigning the ‘proxy_server’ class to your node not only states your intent better, it also creates a nice little abstraction over what you consider a “proxy server”. Later on, if you decide to move away from Nginx (highly improbable, why would you sin as such? 🙂 ) or to use another Nginx module (more probable!), you’ll probably just need to change the contents of your ‘proxy_server’ class, instead of a bunch of tangled node definitions.
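Here is a minimal sketch of what such a wrapper could look like (class and parameter names are made up, and the resources inside depend entirely on the Nginx module you use):

# modules/proxy_server/manifests/init.pp
class proxy_server (
  $backend_url = 'http://localhost:8080',
  $server_name = $::fqdn,
) {
  # Delegate to whichever Nginx module you use today; replacing it later only
  # means changing this class, not every node that declares proxy_server.
  class { 'nginx': }

  # ... module-specific vhost/location resources proxying to $backend_url ...
}

# Node definitions now state the intent directly:
node 'web01.example.com' {
  class { 'proxy_server':
    backend_url => 'http://app01.example.com:8080',
  }
}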

That’s it!

I hope you’ll find the above list useful! Please do not hesitate to share your own experience and best practices in comments.


Testing your Puppet manifests using Vagrant

We’ve recently gotten serious about using Puppet to provision our servers at work. All of our new servers are systematically provisioned using Puppet, and we now have a rule that nothing on a server can be changed by hand: every single bit needs to be managed by Puppet.

Managing servers using Puppet is extremely useful and powerful, but you also need to know what you’re doing because a small untested change in your Puppet manifests could introduce instability (or worse!) in your production environment.

Quality is one of my top concerns. In infrastructure management, quality is often measured by stability. So I quickly felt the need to thoroughly test any change to our Puppet manifests in a completely isolated environment.

My search for the holy grail led me to the nirvana of Puppet testing: a combination of Vagrant and librarian-puppet.

Vagrant

With Vagrant, you describe a virtual machine in a simple configuration file (a “Vagrantfile”), and then, using the Vagrant command line tools, you can create, destroy or re-provision one or more virtual machines in a quick and repeatable manner. In the background, Vagrant uses Oracle’s free VirtualBox to run the virtual machines (more recently, new “providers” have been added, such as VMware or even an experimental Amazon EC2 provider). You can also easily connect to your virtual server using SSH to validate what Puppet has done to it. This is ideal for testing Puppet, as you can scrap your VM whenever you need to and restart from scratch in less time than it takes to make coffee.
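For reference, a minimal Vagrantfile using the Puppet provisioner looks roughly like this (the box name and paths are assumptions; adapt them to your project layout):

Vagrant.configure("2") do |config|
  config.vm.box = "precise64"

  config.vm.provision :puppet do |puppet|
    puppet.manifests_path = "manifests"
    puppet.manifest_file  = "site.pp"
    puppet.module_path    = "modules"
  end
end

From there, “vagrant up” creates and provisions the VM, “vagrant provision” re-applies your manifests after a change, and “vagrant destroy” throws everything away so you can start over from scratch.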

Librarian Puppet

Librarian-puppet manages your Puppet module dependencies. In a file called Puppetfile, you describe which modules your infrastructure depends on, then librarian-puppet does the rest. It installs dependencies, upgrades them or removes them when needed. This is an invaluable tool to control when and how your module dependencies are updated.
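A Puppetfile is just a small Ruby-flavoured list of your module dependencies; a minimal one might look like this (module names and versions are only examples):

forge "https://forge.puppetlabs.com"

mod "puppetlabs/stdlib", "4.1.0"
mod "puppetlabs/apt"

Running “librarian-puppet install” then fetches everything into your modules directory, and “librarian-puppet update” refreshes it when the Puppetfile changes.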

A quick kickstart project to get you running

If you want to try this workflow of managing a Puppet-provisioned virtual server with Vagrant, I’ve created a sample project that will help you jumpstart your setup. This project is available on my GitHub account.

See the README file for details on what to install and how to run it. A few Puppet best practices are also described in that README file.

Let me know if you have any questions! Enjoy!

Tip: Adding a local HTTP proxy when using Yeoman 1.0 with Grunt and Livereload

[UPDATE]

Grunt and its plugin ecosystem are moving quite fast, so this post is now a bit outdated. Although the concepts still apply, Livereload is now embedded in the “watch” plugin and the way to configure the “connect-proxy” plugin to achieve a local proxy to your Web app has changed quite a bit. Documentation has also improved, so kudos to the Grunt community!

 

Yeoman is a JavaScript code generation and application bootstrapping framework that integrates many extremely cool, bleeding-edge technologies to streamline the development of rich front-end JavaScript applications.

We just started an Angular.js application at work and we used Yeoman to kickstart the application’s structure. Not only did the initial project generation work (almost) flawlessly, but we discovered some truly amazing new tools like Livereload (which really *is* the web developer wonderland!) and Compass.

Developing a rich, client-side-only JavaScript application with Yeoman’s tooling requires the developer to run a simple local HTTP server to see the application running. This HTTP server (with Livereload) is out of this world: it will reload the app (including all code, SASS/CSS generation, images, etc.) on the fly upon any change in the source code. And by reload I don’t mean the usual refresh-the-page-in-your-browser routine; I mean Livereload will refresh everything automatically as soon as you hit Save on any source file in your project.

When we want to run our Angular.js application with Yeoman, all we do is run this command:

grunt server

And we’re off! The browser automatically launches on URL http://localhost:9000/ and displays our live-reloading application.

This is all extremely good stuff, but we quickly hit a roadblock. Since our application is served by a local server that serves only static files, how do we interact with our backend server (a Django application in our case), given that browsers block cross-origin Ajax requests under the same-origin policy?

More specifically, the local server running my Angular.js application is accessible on this address:

http://localhost:9000/

Whereas my Django app is running on this address:

http://localhost:8000/

If I try to make any Ajax call to my backend server application (the one running on port 8000), it will fail because my browser will refuse to run an XHR against a different origin. The solution most JS frameworks offer is a simple HTTP proxy feature within their local server to deal with the same-origin restrictions imposed by browsers.

Unfortunately, even though this use case comes up almost every time you develop a rich JS app locally, Yeoman’s documentation on it is very scarce. So save yourself the search, here’s the trick!

Use the grunt-connect-proxy module and add a proxy to your gruntfile!

Just diligently follow the procedure on the project’s GitHub page and it’ll work. There is no mention anywhere that this works with Yeoman’s Grunt setup, but it does!
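At the time of writing, the relevant bit of the Gruntfile looked roughly like the snippet below (a sketch only; as the update at the top of this post mentions, the plugin’s configuration has since changed, so treat the project’s README as the authority):

// In Gruntfile.js, inside the existing connect configuration:
connect: {
  proxies: [
    {
      context: '/api',   // every request starting with /api...
      host: 'localhost',
      port: 8000         // ...gets forwarded to the Django backend
    }
  ]
}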

After you install the module and apply the few changes to your project, your Grunt server will proxy requests to your backend application running locally. When running the “grunt server” command, you should see this:

...
Running "configureProxies" task
Proxy created for: /api
...

This confirms your proxy is properly set up.
Just make sure your backend application’s URLs are all prefixed with a specific path (ex: /api is often used) and you’ll be good to go!

Reliable Delivery Pub/Sub Message Queues with Redis


UPDATE: I have open-sourced a Java implementation of the below principles called “RedisQ“. Enjoy!

Redis is a high-performance key-value datastore that differs from other key-value solutions in the way it handles values. Instead of just storing values as simple strings, it recognizes multiple specific data types such as Lists, Sets, Hashes (maps), Strings and Numbers. Each data type has its own set of operations for manipulating the data it contains in an atomic manner, making it an ideal tool for highly distributed systems where concurrency is a potential issue.

Combining those features in creative ways allows for novel ways of doing “traditional” things differently. One particular combination recently allowed my team and me to implement a moderately (read: good enough) reliable message delivery mechanism for multiple consumers consuming messages at their own pace.

The solution has advantages and some caveats, but if your problem allows you to live with the possible drawbacks, it’s a nice lightweight solution to a problem that is usually answered using some more traditional (and more complex) tools, like *MQs.

In a recent project, we ended up choosing Redis mostly because:

  • It was already part of our architecture, and we had a simple inter-component messaging use case for which we didn’t want to introduce a new component.
  • Expected volume was low, which meant that our data set could fit in memory. Note: although Redis requires everything you store in it to fit in memory, it supports persistence to disk.
  • Redis allowed for all of the implementation characteristics we were looking for, namely:
    • Concurrency: Because all operations in Redis are atomic, supporting concurrency without too much of a hassle is straightforward.
    • Persistence: Configured properly, we can ensure persistence of our queues to disk using one of the supported Redis persistence strategies.
    • Lightweight: Using Redis from any language/platform is extremely simple and provisioning it / maintaining it on a production server is dead easy.

In this post, I will go over the strategy we used with regards to Redis data structures and operations for handling the message publishing and consuming.

The high-level strategy consists of the following:

  • When each consumer starts up and gets ready to consume messages, it registers by adding itself to a Set representing all consumers registered on a queue.
  • When a producer publishes a message on a queue, it:
    • Saves the content of the message in a Redis key
    • Iterates over the set of consumers registered on the queue, and pushes the message ID in a List for each of the registered consumers
  • Each consumer continuously looks out for a new entry in its consumer-specific list and when one comes in, removes the entry, handles the message and passes on to the next message.

Why not use Redis Pub/Sub?

I can already see you coming, asking why not just use the Pub/Sub semantics supported out of the box by Redis? The reason is twofold:

  1. What Redis offers with Pub/Sub is a listener model, where each subscriber receives each message while it is listening, but won’t receive messages published while it is not connected.
  2. In a clustered environment where you have multiple instances of your consumer component running at the same time, each instance would receive each message produced on the channel. We wanted to make sure any given message got consumed once per logical consumer, even when multiple instances of this component are running.

Hence the name of this post, “Reliable Delivery”: we wanted to make sure every logical consumer eventually receives all messages produced on a queue once and only once, even when not connected (due to, for example, a deployment, a restart or an application failure/crash).

Detailed look at the strategy

Here’s a closer look at the different scenarios, using a fictitious example of an ordering system with multiple consumers interested in messages when new orders are created:

Registering a consumer

[diagram]

A “consumer” represents a logical entity of your architecture. You assign each consumer an identifier, which it uses to register itself as a consumer on the queue.

Registering a consumer is only a matter of adding a Set entry to a key that is crafted with the name of the queue in it.

The semantics of a Set are helpful here: each consumer can just “add” an entry to the Set upon start up in a single operation, without the need to worry about any existing value.
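In Java with the Jedis client, for example (any Redis client will do; the key and consumer names follow the fictitious “orders” example used here), registration is a single SADD call:

import redis.clients.jedis.Jedis;

public class ConsumerRegistration {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("localhost");
        // Register the "shipping" logical consumer on the "orders" queue.
        jedis.sadd("orders.consumers", "shipping");
    }
}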

Publishing a message

[diagram]

On the Producer side, a few things need to happen when we’re publishing a message to a specific queue:

  1. The Producer increments a counter to get the next message ID, using the INCR command on key “orders.nextid”
  2. It then stores the message in a key containing the new message ID (“orders.messages.8” in our case). The actual format you store messages in can be anything. We used a hash with some metadata about each message, along with the actual payload. The payload can be serialized in JSON, XML or any format that makes sense for your usage.
  3. Then, for each consumer registered in key “orders.consumers”, it pushes the message ID onto that consumer’s list using the RPUSH command.

To prevent duplication of message content in Redis, we store the content once and then only add references to the messages in consumer-specific lists. When a consumer consumes messages (more on that later), it will remove the ID from its list (its queue), then read the actual message content in a separate operation.

But what happens when all consumers have read the message? If we stopped here, each message would end up being stored in Redis forever. An efficient solution to this problem is to use Redis’ ability to expire (clean up) keys after some time using the EXPIRE command. Using a reasonable expiration delay makes for a cheap cleanup process.

A slight variation, at a cost of message content duplication, would be to store the actual message content in each consumer-specific list. For simpler use cases where messages are small enough, this could be a compelling tradeoff.
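Putting the producer side together, here is a sketch using Jedis (key names mirror the example above; the hash layout and the one-week expiration are arbitrary choices for illustration):

import redis.clients.jedis.Jedis;

public class OrderProducer {
    private static final int ONE_WEEK_IN_SECONDS = 7 * 24 * 3600;

    public void publish(Jedis jedis, String payload) {
        // 1. Get the next message ID.
        long id = jedis.incr("orders.nextid");
        String messageKey = "orders.messages." + id;

        // 2. Store the message content once, with an expiration as the cheap cleanup.
        jedis.hset(messageKey, "payload", payload);
        jedis.hset(messageKey, "created", String.valueOf(System.currentTimeMillis()));
        jedis.expire(messageKey, ONE_WEEK_IN_SECONDS);

        // 3. Push the message ID onto each registered consumer's own list.
        for (String consumer : jedis.smembers("orders.consumers")) {
            jedis.rpush("orders.ids." + consumer, String.valueOf(id));
        }
    }
}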

Consuming messages

[diagram]

Each consumer has a specific identifier and uses this identifier to “listen” on Lists stored in specially crafted Redis keys. Redis has this nice feature of “blocking pop”, which allows a client to remove the first or last element of a list, or wait until an element gets added.

Leveraging this feature, each consumer creates a thread that will continuously loop and do the following:

  1. Use BLPOP (blocking left pop) with a moderately small timeout to continuously “remove an element from the list or wait a bit”.
  2. When an element gets removed by the pop operation, read the message content and process it.
  3. When an element does not get removed (no message available), just wait and start over again.
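A corresponding consumer-side sketch with Jedis (again, key names follow the fictitious “orders”/“shipping” example):

import java.util.List;
import redis.clients.jedis.Jedis;

public class OrderConsumer implements Runnable {
    private final Jedis jedis = new Jedis("localhost");
    // The consumer-specific list this instance listens on.
    private final String queueKey = "orders.ids.shipping";

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            // Block for up to 5 seconds waiting for a message ID (step 1).
            List<String> popped = jedis.blpop(5, queueKey);
            if (popped == null || popped.isEmpty()) {
                continue; // timeout: no message available, start over (step 3)
            }
            // blpop returns [key, value]; the value is the message ID (step 2).
            String messageId = popped.get(1);
            String payload = jedis.hget("orders.messages." + messageId, "payload");
            // The message key may already have expired; skip it in that case.
            if (payload != null) {
                handle(payload);
            }
        }
    }

    private void handle(String payload) {
        // Process the order event here.
    }
}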

You can have multiple threads or processes consuming messages with the same “consumer identifier” and the solution still works. This allows for both stability and scalability:

  • You can spawn multiple consumers consuming messages as the same logical entity, and ensure that if one goes down, the consumption of messages does not stop.
  • You can also spawn more consumers when needed for added horsepower.

Caveats

  • The solution as described above does not support retryability of messages in case of a failure to process on the consumer side. I could imagine a way to do it using Redis, but one has to wonder if Redis is still the right tool if such a characteristic is required by your use case.
  • The solution also does not guarantee that messages will be consumed in the order they were produced. If you have a single consumer instance you’re covered, but as soon as you have multiple consumer instances you cannot guarantee the ordering of messages. Maintaining a lock in a specific key for each consumer would enable this, at the cost of scalability (only 1 message can be consumed at any time throughout your consumer instances).
  • If your producers and consumers are written in different languages, you must implement this strategy for each platform/language. Fortunately, there are Redis clients for pretty much every popular platform.

Wrap up

There are many ways Redis can be leveraged using the simple data structures and atomic operations it provides. Using a particular combination of those, we’ve been able to implement a simple system that allowed reliable message delivery to multiple consumers in a clustered environment without too much of a hassle.

Because Redis was already part of our architecture, it proved to be a natural choice. The effort required to build this solution was far outweighed by the effort it would have taken to provision and maintain an additional component to manage our queues.

It might not be the most appropriate choice for your case. Be thoughtful in your architectural choices!

UPDATE: I have open-sourced a Java implementation of the above principles called “RedisQ“. Enjoy!

Small team, multiple projects: an Agile approach to planning

Context: you have a small team of 5 people, evolving in a highly dynamic environment with small projects (2-4 weeks, sometimes less, sometimes more) coming in from all directions for the team to deliver. This is a very common pattern in agencies, or in smaller teams dedicated to professional services (services to clients).

How do you approach people planning (a.k.a. “resource planning”, although I won’t hide my aversion for the term “resource” when referring to a person) in that kind of context? The instinctive approach to this problem seems to be to start assigning individual people from the team to the different projects in the pipeline, trying to optimize utilization of people on an individual basis:

[diagram]

On the diagram above, we have a fictitious situation with 4 projects in the pipeline. The first 3 projects (A, B and C) are either already started or start at the beginning of week 1. The 4th project (D) starts on the 4th week.

Some observations:

  • Because we are assigning people as mere resources, suboptimal situations occur, such as people having nothing to do for short periods of time
  • Some people are clearly leads on some projects, while others are shuffled between projects
  • Some people never participate in some projects, limiting knowledge of those projects to the few who worked on them
  • This is a fairly simple situation, and it already looks like a nightmare to manage. You quickly see the need for a new “traffic controller” job to manage and optimize people allocation. In a larger organization with more people, this quickly becomes a full-time job.
  • For project D, which starts on the fourth week, only 1 person is assigned. You’ll need to be very careful for the rest of the project to plan for knowledge transfer: this person could get sick any day, leaving you in the dust.
  • From my experience, this kind of planning encourages individuality, leaving you not with a team but with a bunch of individuals working separately. These individuals tend not to be aware of, or even sensitive to, what others are doing and the problems they face. They also tend to stop being proactive and wait for work to be assigned to them.

Far from ideal and not very agile! Unfortunately, it is a common situation.

I recently came up with a more systemic approach to the problem. Instead of assigning people to projects on an individual basis, why not see all of the projects as the work to be done, and the team as the system that realizes that work?

Start seeing the team as a whole, capable of a certain capacity of work. Then split this capacity into value streams. When planning projects, assign them to these value streams instead of to individual people.

[diagram]

Before any project starts, make sure you have an estimated and prioritized backlog of features. At the beginning of each sprint, pull some work from these backlogs according to the planned projects in each value stream:

[diagram]

You may need a project to be put on the fast lane for any reason (the project is late, or there’s no other project in the pipeline). Then, simply assign more value streams to this project. This will allow more work for this project to be selected and included in the sprint:

[diagram]

When a sprint starts, leave the team alone. If you are a manager, trust the team to organize around the work to be done. Coach them to become more efficient in the way they work together. Always treat the team as a whole. Do not try to identify owners for projects yourself: they’ll probably do it naturally. Encourage them to work in pairs on more complex problems. Make sure they take time to inspect how they work, and then adapt.

When you start treating people as teams, some things start to happen naturally:

  • People start to care and worry for their team partners
  • Overall productivity increases
  • One person leaving (holidays, sickness) does not bring a project down anymore
  • People start actually enjoying their job a lot more

An important factor in the success of this approach is to allow fewer value streams than there are individuals in the team. I would say a maximum of:

floor(team size / 2) + 1

Examples:

  • Team of 3: 2 value streams
  • Team of 6: 4 value streams
  • Team of 7: 4 value streams

This has multiple effects:

  • Since there are fewer projects than people, this fosters collaboration, thus enabling natural knowledge transfer within the team.
  • It effectively creates a real team: a group of individuals working on the same goals together as a whole.
  • Suboptimal situations, like people having nothing to do, are removed.

This approach also has a potential for scaling to much more than a single team. If the number of projects is too high for a single team, create multiple teams, assign projects to teams, then plan these projects based on their value stream.

I started working this way about a month ago with the team I am leading, and they simply love it! Planning is also a lot simpler, allowing me (and others) to concentrate on other matters.

One thing that is clear is that the team bonding and productivity benefits do not happen overnight. As the Virginia Satir Change Model describes, one must accept a dip (or at least steep variations) in productivity during the first few weeks following the change, especially if the so-called “team” has been working individually for a while before the change.

If you try this approach (or have already tried it, or a variation of it), please leave some feedback! I’m very interested in knowing how this works out for you.

Easier builders in Java

Anyone who has used the builder pattern for building simple Pojo-style Java classes is probably aware that writing these builder classes quickly becomes quite unpleasant and definitely not fun. You quickly realize that your builders often mimic the structure of your Pojo’s setters, and you find yourself almost duplicating half of the Pojo’s code for the sake of the pattern.

Following a recent post from Eric Mignot and a few prior reflections I had on optimizing the process of writing these builders, I have come up with a solution that will, I hope, greatly simplify trivial cases (that is, building simple pojos) and, eventually, as the tool evolves, allow for slightly more complex cases to be covered.

So, let me introduce the Fluent Interface Proxy Builder. The tool only requires the developer to write the builder interface, not the implementation. The actual implementation is provided by a dynamic proxy that intercepts method calls on your interface and sets the corresponding properties on your Pojo object.

Quick example. Suppose you have a simple Java Pojo:

public class Person {
    private String name;
    private int age;
    private Person partner;
    private List<Person> friends;

    public void setName(String name) {
        this.name = name;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public void setPartner(Person partner) {
        this.partner = partner;
    }

    public void setFriends(List<Person> friends) {
        this.friends = friends;
    }

    ... getters omitted for brevity ...
}

To get a builder for this bean, write a builder interface following a few naming conventions:

public interface PersonBuilder extends Builder<Person> {
    PersonBuilder withName(String name);
    PersonBuilder withAge(int age);
    PersonBuilder withPartner(PersonBuilder partner);
    PersonBuilder havingFriends(PersonBuilder... friends);
    Person build();
}

Note: The super interface “Builder” used here is provided by the framework. This interface has a “T build()” method. I included the “build” method in the example above for the sake of clarity. You may also use your own super interface if using the one provided by the framework proves to be a problem.

To use your builder, first create an instance:

PersonBuilder builder = ReflectionBuilder
                           .implementationFor(PersonBuilder.class)
                           .create();

Then you may use this dynamic builder normally through your interface:

Person person = aPerson()
                .withName("John Doe")
                .withAge(44)
                .withPartner( aPerson().withName("Diane Doe") )
                .havingFriends(
                    aPerson().withName("Smitty Smith"),
                    aPerson().withName("Joe Anderson"))
                .build();
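In the usage example above, aPerson() is simply a static factory wrapping the builder creation so the code reads fluently; my reading is that this is a plain convention of the pattern rather than something the framework generates for you, e.g.:

public static PersonBuilder aPerson() {
    return ReflectionBuilder.implementationFor(PersonBuilder.class).create();
}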

Have a look at the GitHub project page for all the details and instructions on how to use it in your own project. You may use it freely under the terms of the MIT license.

Get it here!


It is also worth mentioning that other alternatives exist and deserve consideration, notably code-generating approaches.

The slight annoyance I see with code-generating approaches is that, since the code is generated, it will overwrite any naming customization you make after the initial generation. It also makes maintenance of the builder harder over time, as the objects being built evolve. From my point of view, adding a method on an interface is quicker and more natural than re-generating the builders (and possibly overwriting custom names).

Delivering software more efficiently

Organizations today are always looking for ways to improve how they build software. To stay competitive in fast-paced markets, they have to optimize their delivery pipeline to bring features from ideation to market more rapidly. Many rightfully seek solutions by adopting agile or lean practices. To be fully effective, these methodologies also need to be supported by rigorous engineering practices such as those brought forward by Extreme Programming. Executed correctly, these are all very good ways of optimizing how you and your team build software.

Delivery = PRODUCTION

Unfortunately, building the software itself is just one part of the big picture. Your shiny new software is not worth anything until it’s out in the wild. To get the most out of any software project investment, organizations need to make sure their software is in users’ hands as soon and as frequently as possible, with minimal overhead. This is what I mean here by “more efficient delivery”.

Delivering good software is demanding. Delivering good software fast is quite a challenge. What I present below are techniques and practices that, when adopted, will have a direct impact on the time required for a feature (or your software altogether!) to go from idea to your users. These practices will especially be helpful to agile and lean teams, who strive to build software in small, “potentially shippable” increments. Used correctly and alongside recognized engineering practices, they can help transform potentially shippable to definitely shippable.

Be always ready for deployment

One of the first mind shifts a team must make is to ensure its code is always ready for deployment. This requires rigorous unit testing, slightly different software design paradigms, and a different way of working with source control.

Unit tests

Make sure your automated test coverage is top notch. Don’t necessarily aim for 100% figures, but make sure you’re confident that what’s covered is covered intelligently and correctly. Unit tests have become almost ubiquitous, but it’s unfortunate to see that people still write software without a good, pertinent test suite. Without a confidently complete test suite, your deployments might become much more embarrassing (and might be accompanied by much more praying, voodoo incantations and cute small animal sacrifices).

Feature toggles instead of feature branches

Design new, in-development features so that they can be toggled instead of isolating them in different source control branches. This allows for the new code to be continuously integrated instead of falling farther and farther behind the main line, resulting in painful and long merges. This practice also facilitates heavy and merciless use of refactoring, which feature branching often discourages.
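A feature toggle can be as simple as a boolean flag read from configuration. A minimal, illustrative sketch in Java (the property name and classes are invented for the example):

public class PricingService {

    // Toggle read from a system property, e.g. the JVM started with -Dfeatures.newPricing=true
    private final boolean newPricingEnabled = Boolean.getBoolean("features.newPricing");

    public double priceOf(double basePrice) {
        if (newPricingEnabled) {
            return newPricing(basePrice);  // in-development code path, shipped "dark"
        }
        return legacyPricing(basePrice);   // current behaviour, still the default
    }

    private double newPricing(double basePrice) {
        return basePrice * 0.9;  // placeholder for the new behaviour
    }

    private double legacyPricing(double basePrice) {
        return basePrice;
    }
}

Both code paths live on the main line; the new one simply stays disabled in production until it is ready, at which point flipping the flag (and eventually deleting the old path) replaces the big merge.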

Staged commits

Stage code commits in a special branch where tests are systematically (and automatically!) run to vet each commit; commits are then (also automatically) promoted to trunk/master/head/main if all tests pass. This way your main line stays as stable as possible. Modern VCSs, like Git or Mercurial, make this kind of setup much easier.

Minimize the feedback loop

Problems found early cost less to fix. For that reason, you must strive to make sure potential problems are identified as early as possible. Make your unit tests run automatically upon each source control commit. Automate regular runs of your functional and performance test suites. When tests fail, make sure the team is clearly (and again, automatically) notified so that they can switch their attention to fixing the error: they are not ready for deployment!

Automate everything

Deployment to any environment should happen at the push of a single button. Period. Script everything. Allow nothing to be executed by hand: where there are humans, there are errors. By having everything automated, you not only minimize the possibility of errors during deployment, you also make deployments quicker.

Use a tool for managing your database migrations. Almost all modern development platforms offer such tools; research the right one for your needs. Database migrations can be generated by the tool and integrated into your deployment scripts so that they are applied automatically to the target environment. Also plan for the worst: your tool should support rollback (reverse migration) scripts as well.

To fully automate deployments, infrastructure configuration also needs to be taken care of. Use a configuration management tool for this (such as Puppet, Chef or CFEngine). With such a tool, your servers can be provisioned and maintained automatically through configuration “recipes”. Since these recipes are stored as text, they can be versioned, become an integral part of your code base and evolve alongside your software.

Use a deployment pipeline

Stage your builds through at least a test environment where you can vet deployments. When a deployment is a success, it can be promoted to the next step in the deployment pipeline. Make unit and functional test phases integral parts of your deployment pipeline so that the entire pipeline halts if tests fail.

A deployment pipeline

Make sure your application is packaged only once for a given version and that this same package is deployed unchanged between the different environments. Store these packages in a central repository from which the deployment scripts can pull them upon deployment. This requires a clear separation between environment-specific configuration and code. Use your configuration management system to handle environment-specific configurations.

Monitor

When deployments become less of a pain and start to become a non-event, you will quickly start thinking about deploying your code more often. Having automated “health check” and smoke test suites ready will quickly become mandatory in order to make sure everything happened as planned in each environment. If you use a deployment pipeline, run these tests after each deployment to a given environment and do not allow the pipeline to continue if one of them fails.

Form “delivery” teams

Reaching such a high level of build and deployment automation requires an extremely close collaboration between infrastructure and development teams. Make infrastructure part of the development team, instead of handing off obscure requirements to them late in the project. If possible (this is highly desired!), dedicate an infrastructure team member to your project. Not only will they have insights and knowledge on both your software and the infrastructure constraints, but they will also be able to work with the rest of the organization to help remove potential impediments to improving the delivery pipeline.

Believe!

Although these practices require a substantial amount of effort and collaboration to happen, the benefits teams get from adopting them quickly far outweigh the costs. Moreover, every single tip mentioned above can be implemented using solid and readily accessible open-source tools on most prevalent platforms.

Some also require an organizational mindset shift that transcends the delivery team’s boundaries. Corporate security policies, limited or restricted access, lack of trust between teams, teams jealously keeping control of their resources, and communication barriers are all possible hurdles to improving the delivery efficiency of your team. Address them one at a time and continue to believe!

It’s never really done

Do not necessarily try to get the whole thing in place at once. Make a list of the improvement items your team needs to address, prioritize according to value and go step by step. This is a never-ending process: there is always something to improve in your delivery pipeline. Regularly reflect on what more can be done to make your deliveries easier and more frequent.

Start this process as early as possible in your project so that you get the most of the additional value provided by these practices. Starting early has the nice side-effect of making teams think of automation every time they need to make a decision about their general software architecture. How will this impact our delivery pipeline? Can we automate this and that? If not, what could allow us to do so?

And hey, why are you still reading this? Go Deliver Something!