Casa stands for "content-addressable storage archive", and also means "home" in Romance languages. It is an online service we're announcing to store packages in a content-addressable way.
It's the natural next step in our general direction towards reproducible builds and immutable infrastructure. Its first application is in the most popular Haskell build tool: the master branch of this tool now downloads its package indexes, metadata and content from this service.
Although its primary use case is Haskell, it could easily apply to other language ecosystems, such as Rust with its Cargo package manager. This post will focus on Casa in general. Next week, we'll dive into its implications for Haskell build tooling.
CAS is primarily an addressing system: each piece of content is addressed by a key, which is the SHA256 hash of that content.
Because the SHA256 refers to only this piece of content, you can validate that what you get out is what you put in originally. The logic goes something like:
fetch the content for a key, then check that sha256sum(content) = key.
This is how Casa works. Other popular systems that use this style of addressing are IPFS and, of course, Git.
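To make the addressing idea concrete, here is a minimal in-memory sketch in Python. It is not Casa's implementation; the `InMemoryCas` class and its `put`/`get` methods are hypothetical names used only for illustration, and keys are represented as hex strings for readability.

```python
import hashlib


class InMemoryCas:
    """Toy content-addressable store: each key is the SHA256 of its value."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        # The key is derived from the content, never chosen by the caller.
        key = hashlib.sha256(content).hexdigest()
        self._blobs[key] = content
        return key

    def get(self, key: str) -> bytes:
        content = self._blobs[key]
        # Validate that what comes out is what was put in.
        if hashlib.sha256(content).hexdigest() != key:
            raise ValueError(f"content does not match key {key}")
        return content


store = InMemoryCas()
key = store.put(b"hello casa")
assert store.get(key) == b"hello casa"
```

Putting the same content twice yields the same key, which is what makes deduplication and validation fall out of the addressing scheme for free.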
There is one simple download entry point to the service:
https://casa.fpcomplete.com/<your key> -- to easily grab the content of a key with curl. This doesn't have an API version associated with it, because it will only ever accept a key and return a blob.
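A client for this entry point could look roughly like the Python sketch below. It assumes the key is written in the URL as 64 hexadecimal characters, which this post does not spell out; `blob_url` and `fetch_blob` are hypothetical helper names. The `opener` parameter exists only so the function can be exercised without touching the network.

```python
import hashlib
import urllib.request

CASA_BASE_URL = "https://casa.fpcomplete.com"  # base URL as given in this post

HEX_DIGITS = set("0123456789abcdef")


def blob_url(key_hex: str) -> str:
    # Assumption: the key appears in the URL as 64 hex characters (SHA256).
    if len(key_hex) != 64 or not set(key_hex.lower()) <= HEX_DIGITS:
        raise ValueError("expected a 64-character hex SHA256 key")
    return f"{CASA_BASE_URL}/{key_hex}"


def fetch_blob(key_hex: str, opener=urllib.request.urlopen) -> bytes:
    """Download a blob and check that it really hashes to the requested key."""
    with opener(blob_url(key_hex)) as response:
        content = response.read()
    if hashlib.sha256(content).hexdigest() != key_hex.lower():
        raise ValueError("server returned content that does not match the key")
    return content
```

Because the key is the hash of the content, the client does not need to trust the server or the transport: a corrupted or substituted blob fails the check.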
These two are versioned because they accept and return JSON/binary formats that may change in the future:
https://casa.fpcomplete.com/v1/metadata/<your key> -- to display metadata about a value.
https://casa.fpcomplete.com/v1/pull -- the client POSTs up to a thousand key-length pairs in binary format (32 bytes for the key, 8 bytes for the length), and the server streams all the contents back to the client as key-content pairs.
Beyond 1000 keys, the client must make separate requests for the next 1000, etc. This is due to request length limits intentionally applied to the server for protection.
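The request side of a pull can be sketched as follows. The 32-byte key, 8-byte length and 1000-key limit come from the description above; the byte order of the length is not stated in this post, so the big-endian encoding here is an assumption, and `encode_pull_body` and `batches` are hypothetical names.

```python
import struct

KEY_BYTES = 32            # SHA256 digest size
MAX_KEYS_PER_PULL = 1000  # per-request limit described above


def encode_pull_body(pairs):
    """Encode (key, length) pairs as a 32-byte key plus an 8-byte length each."""
    if len(pairs) > MAX_KEYS_PER_PULL:
        raise ValueError("at most 1000 keys per pull request")
    body = bytearray()
    for key, length in pairs:
        if len(key) != KEY_BYTES:
            raise ValueError("keys must be 32 raw bytes")
        body += key
        # Assumption: unsigned 64-bit big-endian; the post only says "8 bytes".
        body += struct.pack(">Q", length)
    return bytes(body)


def batches(pairs, size=MAX_KEYS_PER_PULL):
    """Split a long list of pairs into request-sized chunks."""
    for start in range(0, len(pairs), size):
        yield pairs[start:start + size]
```

A client wanting 2,500 blobs would call `encode_pull_body` on each of the three chunks produced by `batches` and send one POST per chunk.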
Upload is protected under the endpoint /v1/push. This is similar to the pull format, but sends length-content pairs instead. The server inserts these into the database as they stream in.
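As a rough illustration of the push format, the Python sketch below encodes blobs as length-content pairs and reads them back one at a time, the way a streaming insert might. The 8-byte big-endian length is an assumption carried over from the pull format, the dictionary stands in for the real database, and both function names are hypothetical.

```python
import hashlib
import io
import struct


def encode_push_body(blobs):
    """Encode each blob as an 8-byte length followed by its content."""
    body = bytearray()
    for content in blobs:
        # Assumption: unsigned 64-bit big-endian length prefix.
        body += struct.pack(">Q", len(content))
        body += content
    return bytes(body)


def insert_stream(stream, database):
    """Read length-content pairs one by one, storing each under its SHA256."""
    while True:
        header = stream.read(8)
        if not header:
            break  # clean end of stream
        if len(header) != 8:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">Q", header)
        content = stream.read(length)
        if len(content) != length:
            raise ValueError("truncated content")
        # The receiver computes the key itself, so no key needs to be sent.
        database[hashlib.sha256(content).digest()] = content


database = {}
body = encode_push_body([b"module Lib where\n", b"name: example\n"])
insert_stream(io.BytesIO(body), database)
print(len(database))  # 2
```

Only one blob is held in memory at a time, which is the point of processing the upload as a stream rather than as a single parsed request.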
The current workflow here is that the operator of the archive sets up a regular push system which accesses Casa on a separate port which is not publicly exposed. In the Haskell case, we pull from Stackage and Hackage (two Haskell package repositories) every 15 minutes, and push content to Casa.
Furthermore, rather than uploading packages as tarballs, we instead upload individual files. With this approach, we remove a tonne of duplication on the server. Most new package uploads change only a few files, and yet an upgrading user has to download the whole package all over again.
Here are some advantages of using CAS for package data:
Recall that each unique blob is a file from a package, a cabal file, a snapshot, or a tree rendered to a binary blob. Storing each of these only once removes a lot of redundancy, so the storage requirements for Casa are trivial. There are currently around 1,000,000 unique blobs (with the largest file at 46MB). Rather than growing linearly with respect to the number of uploaded package versions, we grow linearly with respect to unique files.
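The effect of storing individual files rather than tarballs can be seen in a small Python sketch. The package contents and file names below are made up for illustration, and `store_package` is a hypothetical helper, not part of Casa.

```python
import hashlib


def store_package(blobs, files):
    """Store each file under its SHA256; return the package's path-to-key tree."""
    tree = {}
    for path, content in files.items():
        key = hashlib.sha256(content).hexdigest()
        blobs.setdefault(key, content)  # an already-known file costs nothing
        tree[path] = key
    return tree


blobs = {}
v1 = {
    "LICENSE": b"MIT License\n",
    "src/Lib.hs": b"module Lib where\n",
    "example.cabal": b"version: 1.0\n",
}
# A new version that only touches the cabal file.
v2 = {**v1, "example.cabal": b"version: 1.1\n"}

tree_v1 = store_package(blobs, v1)
tree_v2 = store_package(blobs, v2)
print(len(blobs))  # 4 unique blobs for 6 files across two versions
```

The two trees share the keys for `LICENSE` and `src/Lib.hs`, so a client that already has version 1.0 only needs to fetch the one blob that changed.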
Companies often run their own package archive on their own network (or IP-limited public server) and upload their custom packages to it, to be used by everyone in the company.
Here are some reasons you might want to do that:
You can do the same with Casa.
The Casa repository includes both the server and a binary for uploading and querying blobs.
In the future we will include in the Casa server a trivial way to support mirroring, by querying keys on-demand from other Casa servers (including the main one run by us).
Here's what we've brought to the table with Casa:
We believe this CAS architecture has use in other language ecosystems, not just Haskell. If you're a company interested in running your own Casa server, and/or updating your tooling, e.g. Cargo, to use this service, please contact us.