Immutable infrastructure is an approach to DevOps that leverages immutability, a concept that has become common in the software development world. Originally a major component of functional programming, immutability has been adopted by many languages and software frameworks as a means of controlling complexity and reducing bugs, especially in concurrent systems. Many of the same techniques can be applied to the DevOps world.
Before we can properly dive into how immutable infrastructure works, let's talk about standard operations workflows based on mutable infrastructure.
Let's describe a common, non-cloud deployment scenario for an application. Typically, a sysadmin will either requisition a physical server or rent a Virtual Private Server to host the software. The sysadmin will install the base OS, log in remotely, configure the relevant services, copy over the application to be run, and run it. A well-disciplined sysadmin will additionally take detailed notes on this process.

In the future, when a new version of the application becomes available, the sysadmin will once again log in, download the updated application, and run the new version. Again, a disciplined sysadmin will take notes on how the upgrade goes. A similar process applies to updating packages on the base OS, such as for security updates.
Later, when the load on the server increases, the sysadmin will set up a second machine, following the same procedure followed before. Ideally, those detailed setup notes will now come in handy, and an almost-identical machine will be ready to pick up some of the incoming requests.
Unfortunately, this approach leaves a lot to be desired.
- It relies heavily on the sysadmin to run and document the steps correctly. This is an error-prone, tedious, and time-consuming process, and in the real world it is often the source of significant issues and delays in deployment.
- The process for the initial creation of a machine is distinct from the upgrade process. This can lead to mismatches between newly set up machines and upgraded machines.
- Any kind of machine failure can result in an emergency recovery session, where the operations team needs to quickly spin up and configure a new machine. If there are any problems with the installation instructions, this can result in significant downtime.
- Due to the effort involved in setting up a new machine, extra machines are rarely---if ever---decommissioned. If you have a sudden spike in traffic that dissipates, odds are that you'll continue to run that machine indefinitely, incurring unnecessary hosting costs.
Some of these problems can be mitigated with tooling like configuration management systems. These systems try to provide a declarative interface for specifying the state you want the machine to be in, and perform appropriate installation or upgrade as necessary to get to that point. While this mitigates some of the problems above, it doesn't address all of them. And these systems have a tendency to introduce their own instabilities in your system, leading to costly downtime.
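The core idea behind that declarative interface can be approximated even in plain shell: describe the desired state, check the current state, and act only if they differ. The sketch below (file and setting names are illustrative) shows a toy "resource" that converges a config file toward a desired line, which is roughly what configuration management tools do for an entire machine:

```shell
#!/usr/bin/env sh
# ensure_line: a toy "declarative" resource in shell. The desired state is
# "this exact line exists in this file"; we only mutate when it doesn't.
ensure_line() {
  file="$1"; line="$2"
  touch "$file"
  if ! grep -qxF "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
    echo "changed: $file"
  else
    echo "ok: $file"
  fi
}

# Running twice converges once and then reports no change -- the
# idempotency property configuration management tools aim for.
ensure_line app.conf "max_connections=100"
ensure_line app.conf "max_connections=100"
```

Running this prints `changed: app.conf` the first time and `ok: app.conf` the second, no matter how many more times it runs.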
Moving to cloud computing changes the way we look at servers. In the scenario above, a server is an investment. We spin it up, we set it up, we care for it, we repair it, and so on. Not so in the cloud world. In the cloud, a server should be a cheap, replaceable commodity. The cloud focuses on programmatic automation of resources. Starting or stopping a machine isn't a notable event; it's a normal part of day-to-day administration.
In the cloud world, it's common to use automation tools that create machines for us. Those tools may adjust the sizes of our clusters. Cloud services like Auto Scaling Groups will automatically increase a cluster size based on load, and in turn decrease the cluster size when that load disappears. And health checks will automatically shut down malfunctioning machines, to be replaced with fresh, healthy machines.
In order to make this all work, setting up a new machine must be fully automated. We can't require sysadmin involvement each time a new scaling or health check event occurs. Much of this can be performed with provisioning scripts, which can use the same configuration management systems mentioned above. However, this has the downside of potentially failing intermittently, or being dependent on resources that may change over time.
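A deploy-time provisioning script of the kind described above might look like the following sketch (package names, the application URL, and the service unit are all illustrative). Note that every step runs over the network at deployment time, which is exactly where the intermittent failures and version drift come from:

```shell
#!/usr/bin/env sh
# Deploy-time provisioning sketch. Assumes a Debian-style base image and a
# hypothetical "myapp" service whose systemd unit ships in the tarball.
set -eu

apt-get update
apt-get install -y nginx postgresql-client   # versions are resolved *now*,
                                             # so they can differ per deploy
useradd --system app || true                 # tolerate re-runs

install -d /opt/app
# Fetching "latest" at deploy time is another source of drift:
curl -fsSL https://example.com/myapp-latest.tar.gz | tar xz -C /opt/app

systemctl enable --now myapp
```

Each network fetch in this script is a step that can fail or return something different tomorrow, which is precisely the motivation for moving the work to build time.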
Instead, in our immutable infrastructure world, we like to rely upon machine images. Provisioning scripts take a vanilla, bare-bones OS installation and customize it at deployment time. Immutable infrastructure approaches move this setup to build time. We take some base OS image, run the setup scripts, and then capture the result into a complete machine image. This helps in many ways:
- It reduces the number of steps necessary to run during deployment, meaning you can respond to new load or replace an unhealthy server even faster.
- You're guaranteed that an identical machine setup is deployed each time, avoiding the potential for changes in the package versions available at different deployment times. This reproducibility can also assist greatly in meeting regulatory requirements in industries like finance, health care, and insurance.
- There is one less moving part that may fail at deployment time, leading to a more robust setup.
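The build-time bake described above can be sketched with the AWS CLI (the instance ID, hostnames, and script names here are illustrative; a tool like HashiCorp Packer automates the same flow):

```shell
#!/usr/bin/env sh
# Image-bake sketch: run the setup script once against a temporary builder
# instance, then capture the result as a reusable machine image (AMI).
set -eu

BUILDER_ID=i-0123456789abcdef0        # assumed temporary builder instance
BUILDER_HOST=builder.example.com      # assumed DNS name for that instance

scp provision.sh "ec2-user@$BUILDER_HOST:"
ssh "ec2-user@$BUILDER_HOST" sudo sh provision.sh

# Snapshot the configured instance; deployments now boot this image
# directly instead of re-running provision.sh.
aws ec2 create-image \
  --instance-id "$BUILDER_ID" \
  --name "myapp-$(date +%Y%m%d%H%M)"
```

The key point is that `provision.sh` now runs once per image build, under test, rather than once per machine at deploy time.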
Typically, the creation of these machine images will occur in a Continuous Integration environment. The full battery of integration tests for the application can be run against this image. Once the image is vetted by these tests, and potentially by a manual Quality Assurance signoff, it can be uploaded into cloud storage, and the cluster can be moved over to the new machine image.
This process can take a bit more work to set up initially than a provisioning step. And compared to the pre-cloud scenario described above, it's a significant mental shift. That said, once set up, an immutable infrastructure approach reaps large rewards in maintainability, responsiveness of the cluster, and human effort.
With the popularity of containerization and orchestration tools like Kubernetes, a more lightweight version of the machine image approach is possible. Instead of creating a brand new machine image, and needing to request new cloud machines to deploy updates, Docker images provide an immutable image format that can run on existing machines. Docker images can typically be created faster than machine images. It's possible to run multiple Docker images on a machine, allowing for an easier path to zero-downtime blue/green deployments. And with Kubernetes, you can efficiently pack multiple services onto a single node to reduce server costs. Tools like Minikube make it possible to test complex deployment scenarios on a local machine, speeding up your Quality Assurance team's feedback loop.
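Rolling out a new immutable container image then reduces to a few commands (the registry, image, and deployment names below are illustrative):

```shell
#!/usr/bin/env sh
# Build and publish a new immutable application image.
docker build -t registry.example.com/myapp:v2 .
docker push registry.example.com/myapp:v2

# Point the Kubernetes deployment at the new image; Kubernetes replaces
# pods incrementally, giving a zero-downtime rolling update.
kubectl set image deployment/myapp myapp=registry.example.com/myapp:v2
kubectl rollout status deployment/myapp
```

If the rollout fails its health checks, `kubectl rollout undo deployment/myapp` returns the cluster to the previous known-good image, another benefit of keeping every version as an immutable artifact.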
The individual machines running the Kubernetes nodes can still rely upon immutable machine images to provide the host OS that will run the Docker images. However, these images can change less frequently, allowing you to keep your cloud machines over a longer period of time and reduce the churn of creating and testing those images.
It may seem that, in this world of immutable infrastructure, there is no room for configuration management tools. Their primary function is to handle the mutation of existing machines towards a specific state. However, when creating a machine or Docker image, we still need some way of configuring the base OS and installing additional software. It's possible to do this with simple shell scripts. However, configuration management tools can provide some advantages:
- These tools have been designed to simplify the tasks involved in setting up a system, and may therefore be easier to use and more maintainable than shell scripts.
- In many organizations, a large collection of these configuration management scripts already exists, and rewriting them would involve significant work.
Fortunately, configuration management can work hand-in-hand with immutable infrastructure. You can use your existing scripts when creating your images. You won't be using the full power of configuration management, since you'll always be starting from a pristine image state. But you can look at this as a positive: you're less likely to end up in an indeterminate and buggy state.
Immutable infrastructure underlies much of what we do at FP Complete in our DevOps practice. If you want to learn more, check out our DevOps syllabus, training offerings, and our consulting services. To summarize the recommendations: