The most common pattern for using Docker to build and deploy software in an image uses a single Dockerfile to build the software and produce the image that gets deployed. The basic pattern goes:
    FROM base-image
    RUN install-some-extra-build-tools
    COPY . /build-directory
    RUN /build-directory/build-my-software
    CMD /run/my/software
This works, but you end up with a great deal of unnecessary cruft in the deployed image. Does your software need its own source code and the tools used to build it in order to run? Unless you're using an interpreted language like Ruby or Python, it probably doesn't, so why should they be in the deployed image? Disadvantages of this approach include:
Compilers are often huge (likely to be much bigger than your own software), which means the majority of the deployed image's contents are not used. That's a lot of pointless overhead.
Since your source code and vendor libraries are in the image, a security hole in your software could leak proprietary information.
Every extra component introduces a potential attack vector.
Instead, we should separate concerns: use one Dockerfile to create a build environment and build our software in it, and another to create the deployed runtime image from the artifacts generated by the build. What follows is a simple example that uses a one-line "Hello, world" program written in Haskell (our preferred language, of course, but also illustrative since the compiler is not generally considered small). The full example is available on GitHub.
The conventional approach
What do we need to build this program? Let's just do the obvious: use the official haskell image. We'll start with the "conventional" approach, and work toward something better. Here's the Dockerfile:
    FROM haskell:7.10.2
    # [insert additional build and runtime requirements here]
    RUN mkdir /artifacts
    COPY src /src/
    RUN ghc -o /artifacts/hello /src/Main.hs
    CMD /artifacts/hello
We build the image and run it:

    $ docker build -t haskell-hello .
    $ docker run --rm haskell-hello
    Hello, world
Great, all done and ready to deploy! So how big is the image?
    $ docker inspect -f '' haskell-hello
    715052740
It's ~700 MB, just to run a "Hello, world" program! There must be a better way.
The split-image approach
What do we need in the image to actually run this tiny program? Not very much at
all; just a minimal Linux system with the libgmp shared library (which all
programs compiled with GHC need unless special options are used). Conveniently,
there is the ~4 MB haskell-scratch image for that (it was introduced in the
"Haskell Web Server in a 5MB Docker Image" blog post, but note that it's too
minimal for most real-world Haskell programs, for which we suggest a less
minimal base image instead). Here's the runtime image's Dockerfile (in the
run/ subdirectory), which creates the runtime image that we'll deploy:
    FROM fpco/haskell-scratch:integer-gmp
    # [insert additional runtime requirements here]
    COPY artifacts /artifacts/
    CMD /artifacts/hello
Where do the contents of the artifacts directory come from? That's the job
of the build image's Dockerfile (in the build/ subdirectory):
    FROM haskell:7.10.2
    # [insert additional build requirements here]
    VOLUME /artifacts
    VOLUME /src
    CMD ghc -o /artifacts/hello /src/Main.hs
This uses the same official Haskell image and compilation command as our original
Dockerfile, but it uses VOLUME mounts and the CMD instruction instead of COPY
and RUN. That means the source code is compiled when you docker run the image,
not when you build it. That, in turn, allows us to use VOLUME mounts (which
cannot be used during docker build) to expose the host's run/artifacts
directory to the build, so that it puts the artifacts where the runtime image's
Dockerfile looks for them. To put it all together, run:
    $ docker build -t build_haskell-hello build/
    $ docker run --rm \
        --volume="$PWD/build/src:/src" \
        --volume="$PWD/run/artifacts:/artifacts" \
        build_haskell-hello
    $ docker build -t haskell-hello run/
Notice that we also mounted the source code from the host. While we could have continued COPYing the source code into the image, mounting it has some advantages. You don't accumulate a large build image full of intermediate files every time you change the code (images that you then have to remember to clean up), and you can do incremental builds, since intermediate files are preserved on the host.
That was more complicated, but did it make a difference? Well...
    $ docker inspect -f '' haskell-hello
    5466526
Down to ~5.5 MB, over two orders of magnitude better. I'd say that was worth it!
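As a quick sanity check on that claim, here is a small shell sketch that divides the two byte counts reported above. The function name size_ratio is made up for this illustration; only the two numbers come from the inspect output.

```shell
# Sizes in bytes, copied from the two `docker inspect` outputs above.
conventional_size=715052740
split_size=5466526

# size_ratio is a hypothetical helper: integer division is enough
# to see the order of magnitude.
size_ratio() {
  echo $(( $1 / $2 ))
}

size_ratio "$conventional_size" "$split_size"   # prints 130
```

A ratio of about 130 is indeed a little more than two orders of magnitude.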
Of course, everyone's projects are different, and require different trade-offs,
so while the above is illustrative of the approach, you will tweak it as you see
fit. You may prefer to COPY the source code into the image to minimize risk of
leakage between iterations (at the expense of time and disk space). You may want
to clear the artifacts directory between builds for the same reason. For more
complex projects, there will be OS requirements shared between the build and
runtime images, so it often makes sense to derive both from a common parent.
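One way to fold the three commands from earlier into a repeatable step, including clearing the artifacts directory first, is a small shell wrapper. This is only a sketch: the split_build function and the DOCKER override variable are assumptions made for illustration and are not part of the original example, and it assumes the build/ and run/ directory layout used above.

```shell
#!/bin/sh
# Hypothetical wrapper around the three commands shown earlier.
# DOCKER is an assumed override hook (not part of the original example):
# setting DOCKER=echo prints the commands instead of running them.
DOCKER="${DOCKER:-docker}"

split_build() {
  # Project directory containing build/ and run/; refuse to run without it.
  root="${1:?usage: split_build PROJECT_DIR}"

  # Start from an empty artifacts directory so nothing leaks in
  # from a previous build.
  rm -rf "$root/run/artifacts"
  mkdir -p "$root/run/artifacts"

  # 1. Build the build image.
  # 2. Compile, writing into the mounted artifacts directory.
  # 3. Build the runtime image from those artifacts.
  "$DOCKER" build -t build_haskell-hello "$root/build/" &&
  "$DOCKER" run --rm \
    --volume="$root/build/src:/src" \
    --volume="$root/run/artifacts:/artifacts" \
    build_haskell-hello &&
  "$DOCKER" build -t haskell-hello "$root/run/"
}
```

Calling split_build "$PWD" from the project root would then run the whole sequence, stopping at the first step that fails. Note that --volume needs an absolute host path, which is why the sketch takes the project directory as an argument instead of relying on relative paths.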
At FP Complete, we use and recommend this approach for deploying production
software with Docker, but without easy-to-use tool support it is a bit
cumbersome. Unsurprisingly, Stack has excellent support for this approach. The
stack image container command creates a runtime image from artifacts generated
during the build (optionally using Docker for the build as well). See
"Yesod hosting with Docker and Kubernetes" for an example, and the Docker
section of the user's guide for more details.