While exact definitions of Continuous Integration vary, the idea is that your software project is automatically built and tested very regularly, often many times per day. In practice, this usually means every time a developer pushes their code to a central repository, the CI process is performed. Ideally, this is performed on every branch, even those that are works in progress.
Advantages of continuous integration include:
- Regressions are caught quickly, so less time is spent tracking them down since you know which change introduced the bug.
- There is always a working "latest version", so problems are not first discovered near release time.
- Prevents non-working code from reaching the "mainline."
- Notifies developers immediately when code doesn't work.
- Automates deployment of your application so you always have a running test server with the latest code, or even deploy straight to production.
- Encourages developers to frequently share their work in progress.
- Encourages developers to write automated tests.
Source code management
We manage our code in Git repositories, although other source code management systems will work as well.
Centralized source code repository
Developers must regularly "push" their code from their workstations to a centralized repository. The repository can be a managed service or hosted privately within a company's network, depending on specific project needs. We most commonly use GitHub, GitLab, and Atlassian Bitbucket.
The CI server "watches" for changes to the centralized repository and automatically executes scripts for each change, and provides notifications and reports based on whether those scripts succeed or fail. We regularly use:
- Travis CI: tightly integrated with GitHub, although it is a separate product operated by a different company
- GitLab CI: built directly into GitLab for a totally seamless experience when using that tool for code repositories and issue management
- Jenkins CI: one of the older CI servers, with a vast array of plugins that allow it to work in nearly any conceivable environment, although tight integration can be tricky to configure
- Atlassian Bamboo: part of the Atlassian suite, and tightly integrated with their other tools like Bitbucket and Jira, but also usable independently
Features of CI server
While every CI server is different, they offer a similar array of features (some integrated directly, others via plugins).
The CI process can automatically deploy your project for testing. This is always a good idea for a test environment, and it can also be used to deploy automatically to production from a protected branch.
GitLab CI also supports "review apps", which deploy a copy of the app for every branch. This is especially helpful for code reviews: the reviewer can work with a "live" version of the application built from the code under review, without first needing to build their own copy.
Builds can each be run in their own throw-away Docker containers. This means the build environments (e.g. operating system, compilers, system libraries) are defined by Docker images, and multiple projects with different requirements can be built on the same machine and in parallel without interfering with each other.
Parallel and distributed builds
CI jobs are performed by "agents" (also known as "slaves" or "runners"), and multiple agents can be connected to a single CI server or project. This allows setting up a cluster of machines to run builds.
Different projects, or build stages within projects, can be configured to require certain attributes of the agent where they run. For example, a multi-platform project might have agents on Windows, Linux, and macOS, and run the build and tests on all three.
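Matching jobs to agents can be pictured as a comparison of tags. The sketch below is a simplified illustration rather than any particular CI server's scheduler; `Agent`, `Job`, `eligible_agents`, and the tag names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Agent:
    name: str
    tags: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Job:
    name: str
    required_tags: frozenset = field(default_factory=frozenset)


def eligible_agents(job, agents):
    """Return the agents whose tags include every tag the job requires."""
    return [agent for agent in agents if job.required_tags <= agent.tags]


# Hypothetical cluster with one agent per platform.
agents = [
    Agent("win-01", frozenset({"windows", "x86_64"})),
    Agent("linux-01", frozenset({"linux", "x86_64", "docker"})),
    Agent("mac-01", frozenset({"macos", "arm64"})),
]

job = Job("test-linux", frozenset({"linux", "docker"}))
print([agent.name for agent in eligible_agents(job, agents)])  # ['linux-01']
```

A job with no required tags can run on any agent, while a job whose tags no agent satisfies stays queued until a suitable agent is connected.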
Notification integrates with your existing communications tools, such as e-mail and corporate chat services like Slack. Developers are notified of build failures, successful deployments, and many other events.
Various reports are generated that provide a snapshot of your projects' health, so you can see at a glance whether there are increased rates of regressions, and you can see exactly which version of code is running where.
The CI server manages secret values such as deployment credentials, so that the builds have access to them but they aren't revealed otherwise. Access to the credentials can be restricted so that only some build stages can access them, which lets you prevent "untrusted" stages from having access to the credentials.
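On the build side, a common arrangement is for the CI server to expose each secret to permitted stages as an environment variable. The sketch below assumes that arrangement; the variable name `DEPLOY_TOKEN` and the helpers `require_secret` and `redact` are hypothetical, and most CI servers also mask secrets in logs on their own.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not available to this build stage."""


def require_secret(name, environ=None):
    """Return the secret's value, or fail with a message naming only the variable."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        raise MissingSecretError(f"secret {name} is not available in this stage")
    return value


def redact(text, secrets):
    """Replace any secret values in text before it is written to a build log."""
    for secret in secrets:
        if secret:
            text = text.replace(secret, "[REDACTED]")
    return text


# A dict stands in for the environment of a stage that was granted the secret.
token = require_secret("DEPLOY_TOKEN", {"DEPLOY_TOKEN": "s3cr3t"})
print(redact(f"deploying with token {token}", [token]))
```

A stage that was not granted the credential fails early with `MissingSecretError` instead of attempting a deployment with an empty value.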
CI systems let you specify build "artifacts" which are saved by a build stage and can then be accessed by later stages or outside processes. For example, you might have a "build" stage that builds an executable and saves it as an artifact, followed by a "deploy" stage that retrieves the artifact and deploys the executable to a web server. These artifacts are saved so that you can go back to an old build and retrieve its file(s).
One disadvantage of building in an ephemeral container is that intermediate build files (e.g. object code) are lost between builds, which results in unnecessarily long build times as the same code is built over and over. CI systems support specifying certain paths and files to cache between builds, so that those files will be restored in their original locations in the next build.
There is some overlap between caching and artifacts, but caching is meant for intermediate files to speed up subsequent builds, while artifacts are meant to save the final results of a build stage.
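To make the caching idea concrete, here is a minimal sketch of how a cache could be saved and restored between builds. It is not how any specific CI server implements caching. Keying the cache on a hash of a dependency lockfile is a common convention that we assume here, and `cache_key`, `save_cache`, `restore_cache`, and the file names are hypothetical. It needs Python 3.8 or later for `dirs_exist_ok`.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def cache_key(lockfile):
    """Derive a cache key from the contents of a dependency lockfile."""
    return hashlib.sha256(Path(lockfile).read_bytes()).hexdigest()[:16]


def save_cache(cache_root, key, source_dir):
    """Copy source_dir into the cache under the given key, replacing any old copy."""
    target = Path(cache_root) / key
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(source_dir, target)


def restore_cache(cache_root, key, dest_dir):
    """Restore a cached directory to dest_dir; return True on a cache hit."""
    source = Path(cache_root) / key
    if not source.is_dir():
        return False
    shutil.copytree(source, dest_dir, dirs_exist_ok=True)
    return True


# Demonstration in a scratch directory standing in for two successive builds.
with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch)
    lockfile = root / "deps.lock"
    lockfile.write_text("example-lib==1.0\n")
    key = cache_key(lockfile)

    # First build: nothing is cached yet, so compile and then save the outputs.
    first = root / "build1" / "obj"
    print(restore_cache(root / "cache", key, first))   # False: cache miss
    first.mkdir(parents=True)
    (first / "main.o").write_text("compiled")
    save_cache(root / "cache", key, first)

    # Second build: a fresh directory gets the cached files back.
    second = root / "build2" / "obj"
    print(restore_cache(root / "cache", key, second))  # True: cache hit
```

Because the key changes whenever the lockfile changes, a stale cache is simply not found, and the build falls back to doing the work from scratch.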
Continuous integration is a tool that can be wielded in many ways. These are some of the principles we follow to make the best use of it.
Avoid complexity in the CI configuration
It can be tempting to use every feature and plugin of a CI system to manage builds, but this is usually counterproductive. In general, let the CI system handle the "where and when" of building, but use your own scripts within the repository for the "how". This lets developers use the same scripts locally and makes it easier to switch to a different CI system in the future, should that be desirable. It also means that as much of the build process as possible is versioned along with the code, which makes it easier to build older versions.
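As an illustration of keeping the "how" in the repository, the sketch below shows the shape such a script might take. The step names and commands are placeholders (each one only prints a message), and `STEPS`, `run_steps`, and `main` are hypothetical names; a real project would substitute its own build and test commands.

```python
import subprocess
import sys

# Placeholder steps; replace each command with the project's real one.
STEPS = {
    "build": [sys.executable, "-c", "print('building')"],
    "test": [sys.executable, "-c", "print('testing')"],
}


def run_steps(names, steps=STEPS):
    """Run the named steps in order and stop at the first failure.

    Returns the exit code of the failing step, 2 for an unknown step name,
    or 0 if every step succeeded.
    """
    for name in names:
        if name not in steps:
            print(f"unknown step: {name}", file=sys.stderr)
            return 2
        print(f"--- {name}", flush=True)
        result = subprocess.run(steps[name])
        if result.returncode != 0:
            return result.returncode
    return 0


def main(argv):
    """Run the steps named in argv, or every step if none are named."""
    return run_steps(argv or list(STEPS))
```

Saved in the repository and finished with a call such as `sys.exit(main(sys.argv[1:]))`, a script like this lets the CI configuration shrink to a single line per stage, and a developer can run exactly the same steps on a workstation.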
Write automated tests
While a CI system is still useful without automated tests, it really shines when an excellent suite of unit and integration tests is in place. You get quick feedback as soon as tests fail, and this kind of feedback encourages developers to write more tests. It also keeps code that fails tests from reaching the mainline.
Make builds and tests fast
If it takes too long to get feedback, it discourages regular use of the CI system. Features like caching, artifacts, and distributed builds mean you can avoid repeating unnecessary work and speed up builds. For larger and more complex projects this can be difficult, and sometimes it makes sense to split out more thorough integration tests into a separate nightly process so that the main tests return quickly.
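One way to split slow tests out of the main run is to skip them unless a switch is set, and have only the nightly job set it. The sketch below uses the standard `unittest` module; the environment variable `RUN_SLOW_TESTS` and the test classes are hypothetical, and test frameworks with markers or tags offer equivalent mechanisms.

```python
import os
import unittest

# Hypothetical switch: only the nightly job would set RUN_SLOW_TESTS=1.
RUN_SLOW = os.environ.get("RUN_SLOW_TESTS") == "1"


class FastTests(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)


@unittest.skipUnless(RUN_SLOW, "slow tests run only in the nightly build")
class SlowIntegrationTests(unittest.TestCase):
    def test_full_workflow(self):
        # Placeholder for an expensive end-to-end check.
        self.assertTrue(True)


def run_suite():
    """Run both test classes and return the unittest result object."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(FastTests),
        loader.loadTestsFromTestCase(SlowIntegrationTests),
    ])
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With the variable unset, the slow class is reported as skipped and the run still counts as successful, so the per-push build stays quick while the nightly build exercises everything.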
Commit, merge, and push code regularly
Developers are encouraged to create feature branches for work in progress and push to them regularly. This gives feedback from the CI system regularly, and also makes collaboration easier.
Merging from the "mainline" branch to feature branches should be frequent to avoid complex conflict resolutions. CI systems can also be configured to run a feature branch's build as though "mainline" had been merged, so that developers learn about potential conflicts and test failures as soon as they arise, without having to perform the merge themselves first.
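Where the CI system does not offer this directly, a build script can approximate it with a trial merge. The sketch below is one possible approach and not a description of any CI server's built-in behavior. The remote name `origin`, the branch name `main`, and the function names are assumptions; the git options used (`--no-commit`, `--no-ff`, and merging `FETCH_HEAD`) are standard.

```python
import subprocess


def merge_preview_commands(mainline="main", remote="origin"):
    """Return the git commands for a trial merge of the mainline into HEAD.

    The merge uses --no-commit, so no merge commit is recorded on the branch.
    """
    return [
        ["git", "fetch", remote, mainline],
        ["git", "merge", "--no-commit", "--no-ff", "FETCH_HEAD"],
    ]


def run_merge_preview(repo_dir, mainline="main", remote="origin"):
    """Run the trial merge in repo_dir; return True if it applied without conflicts."""
    for command in merge_preview_commands(mainline, remote):
        if subprocess.run(command, cwd=repo_dir).returncode != 0:
            return False
    return True
```

If `run_merge_preview` returns True, the tests can then be run against the merged working tree. Since the trial merge leaves the checkout modified, this is best done in a throw-away checkout such as an ephemeral build container.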