We're happy to announce that all users of Haskell packages can now securely download packages. As a tl;dr, here are the changes you need to make:
1. Install the tooling: cabal update && cabal install stackage
2. Replace cabal update with: stk update --verify --hashes
3. Replace cabal install ... with: stk install ...
This takes advantage of the all-cabal-hashes repository, which contains cabal files modified to include package hashes and sizes. The way we generate all-cabal-hashes is interesting in its own right, but I won't shoehorn that discussion into this blog post. Look out for a separate blog post soon describing our lightweight architecture for it.
Note that this is an implementation of Mathieu's secure distribution proposal, with some details modified to work with the current state of our tooling (i.e., lack of package hash information from Hackage).
The all-cabal-hashes repository contains all of the cabal files Hackage knows about. These cabal files are tweaked to have a few extra metadata fields, including cryptographic hashes of the package tarball and the size of the package in bytes. (The repository also contains the same data in a JSON file, which is what we currently use due to cabal issue #2585.) There is also a tag, current-hackage, which always points at the latest commit and is GPG signed. (If you're wondering, we use a tag instead of just commit signing since it's easier to verify a tag signature.)
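To make the per-package metadata concrete, here is a minimal Python sketch that reads one JSON entry and pulls out the SHA256 hash and the size in bytes. The schema used here (a package-hashes object keyed by algorithm name, plus a package-size integer) and the names PackageMeta and parse_package_meta are assumptions for illustration, not a documented format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageMeta:
    """Expected hash and size for one package tarball (hypothetical record)."""
    sha256: str
    size: int


def parse_package_meta(text: str) -> PackageMeta:
    """Parse one per-package JSON entry.

    Assumed schema (illustrative only, not taken from the repository docs):
      {"package-hashes": {"SHA256": "<hex>", ...}, "package-size": <bytes>}
    """
    data = json.loads(text)
    # Normalise the hex digest so later comparisons are case-insensitive.
    sha256 = data["package-hashes"]["SHA256"].lower()
    size = data["package-size"]
    if not isinstance(size, int) or size < 0:
        raise ValueError(f"invalid package size: {size!r}")
    return PackageMeta(sha256=sha256, size=size)
```

A downloader would then treat the returned sha256 and size as the only acceptable values for that tarball.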
When you run stk update --verify --hashes, it fetches the latest content from that repository, verifies the GPG signature on the current-hackage tag, generates a package index, and places it in the same location that cabal update would place it. At this point, you have a verified package index on your local machine, which contains cryptographic hashes and sizes for each package tarball.
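The update step can be sketched in Python as two pieces: check the tag's GPG signature with git verify-tag (which only succeeds if the signing key is in your GPG keyring), then pack the cabal files into a tar archive for cabal to read. This is a minimal sketch, not stackage-update's actual code; the function names, and the assumption that index entries are laid out as name/version/name.cabal, are ours.

```python
import io
import subprocess
import tarfile
from typing import Dict, Tuple

# Hypothetical helper names; an illustrative sketch, not stackage-update's code.


def tag_signature_ok(repo_dir: str, tag: str = "current-hackage") -> bool:
    """Return True only if `git verify-tag` accepts the tag's GPG signature."""
    try:
        result = subprocess.run(
            ["git", "-C", repo_dir, "verify-tag", tag],
            capture_output=True,
        )
    except FileNotFoundError:  # git is not installed
        return False
    return result.returncode == 0


def build_index_tar(cabal_files: Dict[Tuple[str, str], str]) -> bytes:
    """Pack {(name, version): cabal text} into an uncompressed tar archive.

    Entry paths follow the assumed layout name/version/name.cabal.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for (name, version), text in sorted(cabal_files.items()):
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=f"{name}/{version}/{name}.cabal")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

A caller would refuse to write anything when tag_signature_ok returns False, and otherwise write the archive to wherever cabal update keeps its index.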
Now, when you run stk install ..., the stackage-install tool handles all downloads for you (subject to some caveats, like cabal issue #2566). stackage-install looks up the hashes and sizes present in your package index and verifies them during download. In particular, a file is only written when both its hash and its size match. In this way, tarballs are only made available to the rest of your build tools after they have been verified.
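The download-side check can be sketched in Python: stream the body in chunks, abort as soon as more bytes arrive than the index promises, hash as you go, and move the file into place only once both size and SHA256 match. The function and exception names, the use of SHA256 specifically, and the temp-file-then-rename approach are our assumptions, not a description of stackage-install's internals.

```python
import hashlib
import os
import tempfile
from typing import BinaryIO


class VerificationError(Exception):
    """Raised when a download does not match the expected size or hash."""


def verified_download(src: BinaryIO, dest_path: str, expected_size: int,
                      expected_sha256: str, chunk_size: int = 64 * 1024) -> None:
    """Copy src to dest_path only if its size and SHA256 match (hypothetical helper)."""
    digest = hashlib.sha256()
    received = 0
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Write to a temporary file in the destination directory so the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                received += len(chunk)
                # Stop as soon as more bytes than advertised have arrived.
                if received > expected_size:
                    raise VerificationError("download exceeds expected size")
                digest.update(chunk)
                tmp.write(chunk)
        if received != expected_size:
            raise VerificationError("download is shorter than expected size")
        if digest.hexdigest() != expected_sha256.lower():
            raise VerificationError("SHA256 mismatch")
        # Only now does the verified tarball become visible at dest_path.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Never leave an unverified partial file behind.
        os.remove(tmp_path)
        raise
```

Because the temporary file lives in the destination directory, os.replace is an atomic rename on POSIX systems, so other tools never observe a partially written or unverified tarball.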
In mailing list discussions, some people were concerned about supporting Windows, in particular that Git and GPG may be difficult to install and configure on Windows. But as I shared on Google+ last week, MinGHC will now be shipping with both of those tools. I've tested things myself on Windows with the new versions of MinGHC, stackage-update, and stackage-install, and the instructions above worked without a hitch.
Of course, if others discover problems, either on Windows or elsewhere, please report them so they can be fixed.
In addition to the security benefits of this toolchain, there are two other obvious benefits. By downloading package index updates via Git, we only need to download the differences since the last update, which means less bandwidth usage and a quicker download.
This toolchain also replaces connections to Hackage with two highly reliable services: Amazon S3 (which holds the package contents) and GitHub. Using widely used, off-the-shelf services in place of hosting everything ourselves reduces the burden on our community and increases our ecosystem's reliability.
There are unfortunately still some caveats with this.
What's great about this toolchain is how shallow it is. All of the heavy lifting is handled by Git, GPG, Amazon S3, GitHub, and (as you'll see in a later blog post) Travis CI. We mostly just wrap these high-quality tools and services. Not only was this a practical decision (reducing development time and code burden), it was also a security decision. Instead of creating a Haskell-only security and distribution framework, we're reusing the same components that are tried and tested on a daily basis by the greater software community. While this doesn't guarantee that the tooling we use is bug-free, it does mean that the "many eyeballs" principle applies.
Using preexisting tools also means that we open up the possibility of use cases never before considered. For example, someone contacted me (anonymity preserved) about a use case where he wanted to be able to identify which version of Hackage was being used. Until now, such a concept didn't exist. With a Git-based package index, the Hackage version can be identified by its commit.
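As a small illustration, the exact index a build used could be recorded by resolving the signed tag to its commit ID with git rev-parse. The function name index_commit and the choice to return None on failure are hypothetical.

```python
import subprocess
from typing import Optional


def index_commit(repo_dir: str, tag: str = "current-hackage") -> Optional[str]:
    """Return the commit ID the tag points at, or None if it cannot be resolved.

    Hypothetical helper: `tag^{commit}` peels an annotated tag to its commit.
    """
    try:
        result = subprocess.run(
            ["git", "-C", repo_dir, "rev-parse", "--verify", "--quiet", f"{tag}^{{commit}}"],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:  # git is not installed
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

Storing that ID alongside a build would let you later check out precisely the same package index.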
I'm sure others will come up with new and innovative tricks to pull off, and I look forward to hearing about them.