Postmortem: Cabbage

<p class="date">2016-05-25</p>

Some time ago, I made the traditional bad decision of working on my own build tool. In this case, I wanted to share compilation artifacts more efficiently across a dozen or so Haskell projects on two development machines, without sacrificing the isolation between those projects that ensures working on one never breaks another.

1 Background

cabal sandboxes are a mechanism for building a Haskell package against a particular set of dependencies, independent of the dependencies that other packages may be built against in other sandboxes. While each sandbox contains at most one version of any package, different sandboxes may contain different versions of a particular package.

The isolation provided by this system is nice in that it lets you try out a new version of an upstream package with one of your projects without affecting another project you may be working on using the same computer. It is not so nice in that every dependency must be installed for every project you work on: no sharing!

Nix is a package manager (associated with, but independent of, the NixOS Linux-based operating system) that is very capable of maintaining a coherent system wherein multiple versions of a package are concurrently installed. The core idea is that every package is identified by its source and that of all the dependencies used to build it. In this way, we can not only have multiple versions of a package of a particular name, we can have multiple instances of the same version of a package built against different dependencies.
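To make the identification idea concrete, here is a simplified Python model: a package's identifier is a hash of its own source digest combined with the identifiers of everything it was built against. This only illustrates the principle. Real Nix computes store paths from the full build derivation, and the `Package` type, the `store_id` function, and all versions and digests below are made up.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Toy package: name, version, a digest of its source, and the
    packages it was compiled against."""
    name: str
    version: str
    source_digest: str
    deps: tuple = ()  # tuple of Package


def store_id(pkg: Package) -> str:
    """Hash the package's own source together with, recursively, the
    identifiers of every dependency it was built against."""
    h = hashlib.sha256(f"{pkg.name}-{pkg.version}:{pkg.source_digest}".encode())
    for dep_id in sorted(store_id(d) for d in pkg.deps):
        h.update(dep_id.encode())
    return f"{h.hexdigest()[:12]}-{pkg.name}-{pkg.version}"


# Made-up versions and digests: the same aeson version, built twice.
text_old = Package("text", "1.0", "digest-t1")
text_new = Package("text", "2.0", "digest-t2")
aeson_on_old = Package("aeson", "1.0", "digest-a1", (text_old,))
aeson_on_new = Package("aeson", "1.0", "digest-a1", (text_new,))

print(store_id(aeson_on_old) == store_id(aeson_on_new))  # False
```

Both aeson builds share a name and version, but because their dependencies differ they get distinct identifiers, so both can live in the store side by side.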

Stackage is an approach to Haskell tooling that defines a versioned package set. The idea here is that you have a big set of packages wherein each version of the set contains at most one version of any one package. Specifically, all the packages in the set can build against each other: if your dependencies are in the set, everything will build. The advantage of this approach is that since everything is built against the same set of dependencies, you can share the build artifacts of those dependencies between projects. No conflicts, and great sharing.
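A toy Python comparison of build counts under the two approaches, assuming a hypothetical snapshot dictionary and ignoring transitive dependencies for brevity. Within one snapshot every package is built against the same versions of its dependencies, so a (name, version) pair identifies a single artifact that any project on that snapshot can reuse; sandboxes instead rebuild each dependency per project.

```python
# Made-up snapshot: each package name maps to exactly one version.
SNAPSHOT = {"text": "1.2", "aeson": "0.9", "lens": "4.13"}


def resolve(deps, snapshot):
    """Pin every dependency to the snapshot's single version."""
    missing = sorted(d for d in deps if d not in snapshot)
    if missing:
        raise KeyError(f"not in snapshot: {missing}")
    return {d: snapshot[d] for d in deps}


def count_builds(projects, snapshot, shared):
    """Count package builds needed across several projects."""
    plans = [resolve(deps, snapshot) for deps in projects]
    if shared:
        # One snapshot means a given (name, version) pair always denotes
        # the same artifact, so each distinct pair is built once.
        return len({pair for plan in plans for pair in plan.items()})
    # Sandbox-style isolation: every project builds its own copies.
    return sum(len(plan) for plan in plans)


projects = [["text", "aeson"], ["text", "aeson", "lens"], ["text"]]
print(count_builds(projects, SNAPSHOT, shared=False))  # 6
print(count_builds(projects, SNAPSHOT, shared=True))   # 3
```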

2 Problem

My instinct was to resist Stackage because I was a bit put off by its centralization. What irks me about defining sets of "blessed" packages is the worry that anything outside the blessed set becomes second class. That is a recipe for a controlling clique to exert undue influence, and eventually to hold back an ecosystem.

That said, rebuilding the exact same package several times over in separate sandboxes was vaguely infuriating. I wanted the isolation of sandboxes, the freedom to use whatever version of any package I wanted, and sharing between projects.

The Nix package collection (nixpkgs) was a big turn-off because it included a single version of every Haskell package, and it was a rolling release, so you essentially had to live at the cutting edge of package releases. This was a real letdown, because Nix is so well suited to making every version of every package available on your system.

3 Hypothesis

I could use the dependency version solver built into cabal-install to find a set of versions, use Nix to manage a store holding every build artifact, and then build my own cabal sandboxes by symlinking from the Nix store into each sandbox.
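The symlinking step might look something like this Python sketch. The store layout (one directory per identifier under a `store` directory, one link per package name in the sandbox) and the `populate_sandbox` helper are assumptions for illustration, not taken from cabbage, and a real cabal sandbox would also need its package database updated, which this skips.

```python
import tempfile
from pathlib import Path


def populate_sandbox(store: Path, sandbox: Path, plan: dict) -> list:
    """Symlink each planned artifact from the store into the sandbox.

    Hypothetical layout: `plan` maps a package name to a store identifier,
    artifacts live in store/<identifier>, and links go in sandbox/<name>.
    Returns the names whose artifacts are not in the store yet.
    """
    sandbox.mkdir(parents=True, exist_ok=True)
    missing = []
    for name, ident in sorted(plan.items()):
        target = store / ident
        if not target.is_dir():
            missing.append(name)  # would have to be built first
            continue
        link = sandbox / name
        if link.is_symlink():
            link.unlink()  # replace a link left over from an older plan
        link.symlink_to(target, target_is_directory=True)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "store"
    (store / "abc123-text-1.2").mkdir(parents=True)
    plan = {"text": "abc123-text-1.2", "aeson": "def456-aeson-0.9"}
    sandbox = Path(tmp) / "project-a" / "sandbox"
    print(populate_sandbox(store, sandbox, plan))  # ['aeson']
```

Anything reported as missing would be built once into the store, after which every sandbox that needs that exact identifier just gets another link.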

4 Experiment

I wrote cabbage, a build tool consisting of rather a lot of bash scripting, to implement the idea.

Good news, everybody: it works!

5 Results

The bad news is more extensive:

No Version Solver Library

We start from a top-level .cabal file and flag assignments for all dependencies, and run the cabal solver to produce a build plan. We then use this build plan as a set of constraints when running the solver again to find a build plan for every dependency. This re-solving is, unfortunately, slow. It would be much better to invoke the solver in a batch mode, so that the constraints generated by the top-level solution are kept in memory and the relevant subset of them can simply be extracted to define the build plan for each dependency.
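The batch-mode alternative amounts to solving once and then restricting that one solution to each dependency's transitive closure. A minimal Python sketch of that restriction, assuming the top-level plan is already available as a name-to-version map and the dependency graph as an adjacency list (both made up here, and flags ignored):

```python
def closure(pkg, deps_of):
    """pkg plus everything it depends on, directly or transitively."""
    seen, stack = set(), [pkg]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(deps_of.get(p, ()))
    return seen


def sub_plans(top_plan, deps_of):
    """Restrict one solved top-level plan to each package's closure,
    rather than re-running the solver once per dependency."""
    return {pkg: {p: top_plan[p] for p in sorted(closure(pkg, deps_of))}
            for pkg in top_plan}


# Made-up plan and dependency graph.
top_plan = {"myapp": "0.1", "aeson": "0.9", "text": "1.2", "bytestring": "0.10"}
deps_of = {"myapp": ["aeson"], "aeson": ["text", "bytestring"], "text": ["bytestring"]}
print(sub_plans(top_plan, deps_of)["text"])  # {'bytestring': '0.10', 'text': '1.2'}
```

Every sub-plan is consistent with the top-level plan by construction, since it is literally a subset of it.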

The upshot is that we get a per-package solution that is consistent with the constraints required by the top-level plan. This per-package solution defines an identifier that we can look for in the store. This means that not only can we have multiple versions of a package in our store of compiled artifacts, we can have multiple instances of the same version of a package compiled with different flags or against different dependencies. We get maximal, and totally safe, sharing between projects.
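To illustrate how a per-package identifier yields safe sharing, this Python sketch hashes a package's name, its per-package solution, and its flag assignment, and uses the result as a key into an in-memory store. `instance_id`, `Store`, the versions, and the flag name are all hypothetical; a real store would hold compiled artifacts on disk.

```python
import hashlib
import json


def instance_id(name, solution, flags):
    """Identify one build of `name`: its per-package solution (every
    package in its closure, with versions) plus its flag assignment."""
    payload = json.dumps({"name": name, "plan": solution, "flags": flags},
                         sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()[:12]
    return f"{name}-{solution[name]}-{digest}"


class Store:
    """In-memory stand-in for a store of compiled artifacts."""

    def __init__(self):
        self.artifacts = {}
        self.builds = 0

    def fetch_or_build(self, name, solution, flags):
        key = instance_id(name, solution, flags)
        if key not in self.artifacts:
            self.builds += 1  # cache miss: a real tool would compile here
            self.artifacts[key] = f"<compiled {key}>"
        return key


store = Store()
text_solution = {"text": "1.2", "bytestring": "0.10"}  # made-up versions
a = store.fetch_or_build("text", text_solution, {})                      # project A
b = store.fetch_or_build("text", dict(text_solution), {})                # project B
c = store.fetch_or_build("text", text_solution, {"example-flag": True})  # hypothetical flag
print(a == b, a == c, store.builds)  # True False 2
```

Project B reuses project A's build because their solutions match exactly, while the flagged build becomes a separate instance of the same version.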

The Mighty Hackage Churn

Virtually every Haskell project ends up depending on over 100 packages, and odds are that one of them released a new version this week. The problem is exacerbated by rapid change in packages that have accumulated long chains of reverse dependencies, lens being a notable example. Every time a new version of lens is released, many useful packages need rebuilding.
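Which packages a new lens release forces you to rebuild is the reverse-dependency closure of lens. A small Python sketch over a toy graph (every name other than `lens` and `base` is hypothetical):

```python
from collections import deque


def needs_rebuild(changed, deps_of):
    """All packages that depend on `changed`, directly or transitively."""
    rdeps = {}  # invert the graph: package -> packages that depend on it
    for pkg, deps in deps_of.items():
        for d in deps:
            rdeps.setdefault(d, set()).add(pkg)
    affected, queue = set(), deque([changed])
    while queue:
        for parent in rdeps.get(queue.popleft(), ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Toy graph; apart from "lens" and "base", the names are hypothetical.
deps_of = {
    "lens": ["base"],
    "lib-a": ["lens"],
    "lib-b": ["lib-a"],
    "lib-c": ["base"],
    "app": ["lib-b", "lib-c"],
}
print(sorted(needs_rebuild("lens", deps_of)))  # ['app', 'lib-a', 'lib-b']
```

The longer the chain hanging off a package, the more of the graph a single release of it invalidates.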

Bear in mind that this does not affect a working project you leave alone; it affects new projects and those you are updating. For a new project, or one you want to rebuild from scratch, you would run the cabal solver against today's state of Hackage. Simply copying the dependency solution from another project doesn't address this, as there is only partial overlap between the desired solutions.

6 Nix + Stackage

What is needed is a way to refer to a dependency solution that spans a broader set of packages than any one project, so that a single dependency solution can be reused across projects. This is precisely what Stackage is. While the stack build tool is the user-facing part of this, Stackage itself is an admirably simple way of establishing a manageable cadence for Hackage's evolution.

The central idea is that, for most purposes, we can just bundle up changes to all our libraries, and periodically move the entire ecosystem forward en masse. This offers a convenient way to refer to states of the ecosystem, e.g. LTS-2.1 from early 2015.

As for centralization, I can't fault how the project is run: instructions for adding your package are on the front page, and are almost disconcertingly easy to follow. What you get out of this is an easier life for users of your package, and a message when your package doesn't type check against the rest of the package set, or when its test suite fails against it. This also means that packages that depend on yours are suddenly part of your own test suite! If a reverse dependency relies on a part of your library that you didn't test, the downstream breakage (caught when moving the entire ecosystem forward) will eventually come to your attention.

7 The Happy Ending

The rolling releases were too turbulent, so cabbage gained the ability to begin with a set of constraints provided by Stackage. Right around the point at which repeatedly invoking the constraint solver just to extract subsets of the Stackage package set had become intolerable, the nixpkgs project started adopting Stackage LTS releases in its definitions (e.g. LTS-2.1, aka pkgs.haskell.packages.lts-2_1). This means that you can now use Nix as a package manager for both your Haskell and non-Haskell dependencies, get sharing across all your software, and avoid the uncomfortable buffeting of the vigorously thriving Haskell ecosystem.