This blog post describes a new feature in stack. Until now, multiple projects using the same snapshot could share the binary builds of packages. However, two separate snapshots could not share the binary builds of their packages, even if they were substantially identical. That's now changing.

tl;dr: stack will now install new snapshots much more quickly, and with less disk space, than before.

This has been a known shortcoming since stack was first released. It's no coincidence that this support arrives shortly after a similar project was completed for Cabal. Ryan Trinkle, Vishal's mentor on that project, described the work to me a few months back, and I decided to wait for its outcome before working on the feature in stack.

The improvements to Cabal here are superb, and I'm thrilled to see them happening. However, after reviewing the options and discussing them with a few stack developers and users, I decided to implement a different approach that doesn't rely on the new Cabal changes. The reasons are:

stack's implementation

I came up with some complicated approaches to the problem, but ultimately a comment from Aaron Wolf rang true:

check the version differences and just copy compiled binaries from previous LTS for unchanged items

It turns out that this is really easy. The implementation ends up having two components:

  1. Whenever a snapshot package is built, write a precompiled cache file containing the filepaths of the library's .conf file (from inside the package database) and all of the executables installed.
  2. Before building a snapshot package, check for a precompiled cache file. If the file exists, copy over the executables and register the .conf file into the new snapshot's database.
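The two steps above can be sketched in Python as follows. This is a minimal illustration, not stack's code (stack is written in Haskell): the JSON cache format, the function names write_precompiled_cache and try_use_precompiled, and the register_conf callback, which stands in for registering the .conf file with the new snapshot's package database, are all hypothetical.

```python
import json
import os
import shutil
from typing import Callable, List, Optional


def write_precompiled_cache(cache_path: str, conf_path: Optional[str],
                            exe_paths: List[str]) -> None:
    """Step 1: after building a snapshot package, record where its outputs live."""
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
    with open(cache_path, "w") as f:
        json.dump({"library_conf": conf_path, "executables": exe_paths}, f)


def try_use_precompiled(cache_path: str, bin_dir: str,
                        register_conf: Callable[[str], None]) -> bool:
    """Step 2: before building, try to reuse a previous build.

    Returns True if the package was installed from the cache, False if it
    still needs to be built.
    """
    if not os.path.exists(cache_path):
        return False  # no earlier build with this exact configuration
    with open(cache_path) as f:
        cache = json.load(f)
    conf = cache["library_conf"]
    exes = cache["executables"]
    # Sketch-specific safeguard: if any recorded file is gone, build normally.
    recorded = ([conf] if conf is not None else []) + exes
    if not all(os.path.exists(p) for p in recorded):
        return False
    # Copy the executables into the new snapshot's bin directory.
    os.makedirs(bin_dir, exist_ok=True)
    for exe in exes:
        shutil.copy2(exe, os.path.join(bin_dir, os.path.basename(exe)))
    # Register the library's .conf file into the new snapshot's database
    # (hypothetical hook; stack does this against a real package database).
    if conf is not None:
        register_conf(conf)
    return True
```

Falling back to a normal build when a recorded file has disappeared is a choice made for this sketch; it keeps a stale cache entry from breaking an install.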

That precompiled cache file's path looks something like this:

/home/vagrant/.stack/precompiled/ghc-7.10.2/1.22.4.0/aeson-0.8.0.2/Vr6rCTNr+UeoWMN1qGJGhFfxIDSFqTgJixKuD6TtVEQ=

This encodes the GHC version, Cabal version, package name, and package version. The last component is a hash of all of the configuration information, including flags, GHC options, and dependencies. Putting that hash in the filepath ensures that when we look up a precompiled package, we get something that matches what we would build ourselves now.
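A cache path along these lines can be computed as below. The sample path's final component looks like a base64-encoded 256-bit digest, so this sketch uses SHA-256. The JSON serialization, the function names, and the URL-safe base64 alphabet are assumptions for illustration, not stack's actual encoding; the sample uses the standard alphabet (note the "+"), which can also produce "/", so the sketch switches alphabets to keep the hash usable as a single file name.

```python
import base64
import hashlib
import json
import os
from typing import Dict, List


def config_hash(flags: Dict[str, bool], ghc_options: List[str],
                dependencies: Dict[str, str]) -> str:
    """Hash the configuration that determines the build output."""
    # Serialize deterministically (sorted keys) so equal configs hash equally.
    payload = json.dumps(
        {"flags": flags, "ghc_options": ghc_options, "dependencies": dependencies},
        sort_keys=True,
    ).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # URL-safe base64 never emits '/', which cannot appear in a file name.
    return base64.urlsafe_b64encode(digest).decode("ascii")


def precompiled_cache_path(root: str, ghc_version: str, cabal_version: str,
                           name: str, version: str, flags: Dict[str, bool],
                           ghc_options: List[str],
                           dependencies: Dict[str, str]) -> str:
    """Build <root>/ghc-<ver>/<cabal ver>/<name>-<version>/<config hash>."""
    return os.path.join(
        root,
        f"ghc-{ghc_version}",
        cabal_version,
        f"{name}-{version}",
        config_hash(flags, ghc_options, dependencies),
    )
```

Sorting keys makes the hash independent of the order in which flags or dependencies were collected, while the GHC options stay an ordered list because their order can affect the build.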

stack can get away with this approach because of the invariants of a snapshot: each snapshot provides exactly one version of each package, so there is no need to deal with the new multi-instance installations that GHC 7.10 supports. This also means there are no concerns about views: a snapshot database is by its very nature a view.
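Aaron Wolf's suggestion and the single-version invariant together make it easy to predict which packages of a new snapshot can be served from the cache. The sketch below assumes that the dependency information in the cache key changes whenever any dependency is rebuilt, which makes reuse transitive, and that flags and GHC options are identical in both snapshots; the function reusable_packages and its inputs are hypothetical.

```python
from typing import Dict, List, Set


def reusable_packages(old: Dict[str, str], new: Dict[str, str],
                      deps: Dict[str, List[str]]) -> Set[str]:
    """Packages in `new` whose precompiled build from `old` can be reused.

    `old` and `new` map each package name to its single version in the
    snapshot. `deps` maps a package to the names it depends on; every
    dependency is assumed to be present in `new`, and the graph is acyclic.
    """
    memo: Dict[str, bool] = {}

    def reusable(pkg: str) -> bool:
        if pkg not in memo:
            # Same version in both snapshots, and every dependency reusable too.
            same_version = old.get(pkg) == new[pkg]
            memo[pkg] = same_version and all(reusable(d) for d in deps.get(pkg, []))
        return memo[pkg]

    return {pkg for pkg in new if reusable(pkg)}
```

Because a snapshot never holds two versions of one package, a name alone identifies the package, so the comparison needs no notion of multiple instances.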

Advantages

Downsides

After discussing it with others, this trade-off seems acceptable: the overall decrease in disk space usage means users will less often want to delete a single snapshot. When disk space really does need to be reclaimed, the recommended approach will be to wipe all snapshots and start over, which (1) will be an infrequent occurrence, and (2) thanks to the faster compile times, will be less burdensome.
