FP Complete builds on cutting-edge open-source devops technologies, providing devops solutions and consulting to a number of companies in life sciences & health, financial services, and secure Internet services. This exposure has given us a chance to work with some of the best engineering practices in devops.
As we bring more companies forward into the world of devops, we will continue to share lessons learned and best practices with the IT community. Today’s best practice: immutability.
As a software engineering concept, immutability means that once you assign a value or configuration to some entity, you never modify it in place -- if you want it to change, you make a new one and (optionally) tear down the old one.
As we know from functional programming (FP) and our work with Haskell, immutability boosts the reliability and predictability of system behavior -- preventing bugs and downtime, and increasing the speed and reproducibility of software development and deployment. A variable, program, or server is immutable if we guarantee, once it’s set up, that we won’t modify it in place. We can do this if replacements and deployments are cheap.
Devops has made it so much cheaper to build and deploy new software services and even whole servers and clusters, some companies are now taking advantage of immutability -- leading to much more reliable online services with less downtime and more frequent, reliable, repeatable updates.
Mutability describes how most traditional software, and most traditional server operations, are done. It means that once you create something, to avoid the terrible cost of creating another one, you just keep modifying the one you already put in place. Software patches, configuration file changes, even changing the value of a variable that’s already in use -- all these are examples of mutability. It’s always been a bit risky, but can be economical when (1) the cost of creating an entity is very high, and (2) the cost of bugs and mistakes is very low.
Unfortunately, mutability is a key cause of bugs and mistakes.
Consider this with a person instead of a computer. If I break my arm (an accidental mutation of my state), we could try to fix the problem using an immutable method: make a new Aaron, identical to the old one, but with a non-broken arm -- then tear down the old Aaron who is no longer needed. Obviously we just don’t have the technology to do this -- it’s beyond unaffordable -- so we are forced to use mutability. We patch up my arm, and wait for it to heal.
That’s way better than giving up, but now our managed service (Aaron) is in an unprecedented, irreproducible state. For weeks my arm is offline, in repair mode, while the rest of me runs. And for the rest of my life I may have to keep track of the fact that this arm is a little weaker, and there are some things I cannot do. My boss now has to remember: Aaron has this special flag that says he can’t lift some kinds of heavy boxes. What a pain. At least humans are flexible, so my colleagues won't just break with an "Error 504" when they try to shake my hand.
If only I could reconstruct the arm in its original state -- or even a whole new Aaron -- life would be so much easier. We may never have that for humans, certainly not in my lifetime. But thanks to modern devops technologies, we do have that option for servers. They don’t need to be modified in place, and they don’t need to run in unprecedented, irreproducible configurations that lead to many of today’s sysadmin emergencies, security breaches, and downtime.
Our FP Deploy approach to devops is based on heavy use of containers (notably Docker), virtual machines, and where feasible, virtual networking, cloud computing (AWS, Azure, etc.), and virtual private clouds (VPCs). Every one of these technologies has something in common: it allows us to abstract away the work of creating a running online service. Configurations can be written declaratively, put under source control (say, in a Git repository), and run at any time (using, say, Kubernetes).
You want another server? Just run the deploy command again. You want another whole cluster (a “device” made of multiple servers and associated networking and data connections)? Just run that deploy command again.
By slashing the cost of deployment, we make it possible to create a whole new server painlessly, cheaply, and reproducibly. Developers just delivered a bug fix? Don’t patch the application server! Bring up a new instance based on the new software build. Once you’re happy that it’s running properly, bring down the old, less-good instance.
(In a future post we’ll talk about blue-green deployments and canary deployments -- cost-effective, easy techniques for making this transition cautiously and gracefully.)
An immutable server has a known, well-understood, source-controlled, reproducible configuration. No footnotes. We can be confident that the new production servers are the same as the engineering test servers, because they were created by running exactly the same deployment files -- not by a series of manual admin tweaks that could be incorrect or have latent side effects.
This also makes it easy to recover from disaster, or scale up for increased load, by redeploying a new server from the same deployment files.
We can afford to do this only because modern devops makes it so cheap to create new servers. When doing it right is automated, predictable, repeatable, and inexpensive, there’s no longer any reason to do it wrong.
It’s easy to assert that an application server can be made immutable, because well-architected web app servers are fairly self-contained and fairly stateless. But what if you are making larger changes to a whole distributed device, consisting of many servers and network connections? What if, for example, you have made matching changes in both your front-end server and your back-end server? Or in any group of services in a service-oriented architecture (SOA)?
Then you step up to immutable devices, immutable clusters or VPCs or distributed systems, using exactly the same methodology. At FP Complete we routinely create whole 10+ virtual machine devices on command, even just for testing purposes, because we’ve automated it with FP Deploy. Again, why not do it right? Why not be confident that the whole distributed system is in a known, reproducible state? We can be much more assured that what worked in test is going to work in production.
However, sometimes we don’t have that luxury, for example if we are retrofitting modern devops onto parts of an older system that has not been uniformly upgraded -- or in any case where the service we are upgrading is far cheaper to redeploy than another, perhaps more stateful, service in the same distributed system. That’s ok: we can have immutable servers as parts of a mutable device. The administrator of the distributed system now has to track which services have been replaced with newer versions, but at least for any given service the servers can be immutable.
Once we have made it cheap to rebuild and replace servers at any time, immutability can be a reality, and reliability and reproducibility go way up. However, not all servers are so easily replaced. In fact, servers providing key input and output channels may be outside the scope of our control -- so all we can do is treat them as external to the immutable device, and understand that hooking up the inputs and outputs needs to be part of the declarative script used to bring up any new version of our device.
The strongest example of this may be an enterprise database server. We typically have no intention of building a whole new database as part of our application build-and-deploy process. Databases are typically long-running and, fundamental to their purpose, they are extremely stateful, extremely mutable. Cloud services such as RDS make it easy to spin up new database instances, but often the contents of an enterprise database are too large or fast-changing to want to rehost -- or we just don’t have permission to do so. Instead, we leave it in place and accept that its contents are very mutable.
So even when we use immutability to make our application server clusters easier to upgrade and less prone to errors, we need to understand that they will almost always be connected up to other servers that lack this golden property. Automated deployment with modern devops, ideally with truly immutable servers, can ensure that your system looks just like the system that worked in test -- eliminating a lot of surprise downtime and deployment failures. But even at companies with modern devops, failures still happen -- and when they do, it’s because the test system was not exposed to the same kinds of inputs and outputs, and the same database state, as the production system. In a future blog post, we’ll look at some best practices for testing and quality control.
Have a look at your own software deployment practices. Are there too many (more than zero) manual steps involved in bringing up a new or updated server? Do you allow, or even require, sysadmins to make changes to servers that are already up and running? Maybe it's time to use automated, reproducible deployment, and make the move to immutable servers.
In addition to our customizable FP Deploy devops solution, FP Complete offers consulting services, from advice to hands-on engineering, to help companies migrate to modern devops.