Optionally Exposed
Know your weaknesses

<p class="date">2016-07-18</p>

I recently wrote some code to add an interesting new feature to hpack. I'm going to highlight some of its drawbacks in the hope of promoting discussion.

hpack

In short, hpack is a program that generates a Haskell .cabal file from a YAML specification. This has the advantage of using a standard format, and presents an opportunity to change how Haskell packages are defined without touching the Cabal library or package specification language itself.

The Trouble With Orphan Instances

An orphan instance, in Haskell, is a type class instance that is defined neither alongside the class definition nor the definition of the data type to which the instance applies. Note that if an instance is defined in one of those two places (parent locations, if you will), there is no way it can overlap another instance in the dependency chain.

Alternatively, if we allow instances to be defined anywhere, we run into situations where an instance of a particular class for a particular type is defined in multiple packages, and we have no principled way of selecting which one to use.
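As a small illustration (my own example, not taken from hpack or any particular package), here is an orphan instance that fits in a single file. Both the Semigroup class and the Bool type come from base, so this module is a parent location for neither. The choice of (&&) is arbitrary, which is exactly the problem: another package could just as reasonably pick (||), and a program importing both would have two candidate instances.

```haskell
{-# OPTIONS_GHC -Worphans #-}
-- Illustrative sketch: GHC should report this instance as an orphan,
-- because neither Semigroup nor Bool is defined in this module.
module OrphanSketch where

-- Arbitrary choice of operation; a different package could choose (||).
instance Semigroup Bool where
  (<>) = (&&)

-- Uses the orphan instance above.
example :: Bool
example = True <> False
```

Here example evaluates to False; under the hypothetical competing (||) instance, the same expression would be True.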

The Trouble With Avoiding Orphan Instances

If we outright ban orphan instances, we end up with heavy dependency chains. For instance, if you write a class for which a sensible instance exists for the Data.Text.Text type defined in the text package, where does that instance go? If you add it to your package, every time somebody thinks to use your class, they incur a dependency on text. If you open a Pull Request to add the instance to text, it will be rejected because it would mean that every user of text now depends on your package.

Optionally Exposed

Continuing with our example, if I am writing some code that does use the text package, and I wish to use your class, I am not concerned about your package depending on text because I already depend upon it myself. On the other hand, if I don't depend on the text package myself, I can't possibly make use of the instance you defined for Text. In this case, a build tool giving me a transitive dependency on text only so that the instance you defined compiles is pointless.

The idea of trimming such contextually unusable instances from builds has made the rounds many times over the years, but I got a bee in my bonnet over a recent reddit post. My initial reaction was pessimistic, because Cabal configuration flags have been a regular source of pain to me as a Cabal user. They are immediately unsatisfying because we specify a package's dependencies in the form packageName-x.y.z (for some version x.y.z), with nothing said about flag assignments. Perhaps we should depend on [packageName x.y.z +flagOne -flagTwo], but one ought to get the feeling that we are treading dangerously close to subverting the notion of a version number with this somewhat overlapping, but more general, concept.

Commenters in the above-linked reddit thread made the point that the pruning of inaccessible instances can be entirely automatic. This means that there need be no room for bungled flag settings, or even visible flags at all. If you can't write it, you can't get it wrong.

This all works if the prune-able things introduce only weak dependencies. A weak dependency is one that is not given to the build tool to satisfy, but whose satisfaction by other means acts as a guard for including a source module. We tot up all dependencies as normal, then in a single additional pass pick up any optionally exposed modules whose weak dependencies have been satisfied by the strong dependencies specified by the usual build-depends of all the packages we are building.
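The extra pass can be sketched roughly as follows. This is a toy model rather than the code in the PR: the Package and OptionalModule types and the function names are invented for illustration, and the input list is assumed to already be the complete build plan.

```haskell
module WeakDepsSketch where

import Data.List (nub)

type PackageName = String
type ModuleName  = String

-- Hypothetical model of a module guarded by weak dependencies.
data OptionalModule = OptionalModule
  { omName     :: ModuleName
  , omWeakDeps :: [PackageName]
  }

-- Hypothetical model of a package in the build plan.
data Package = Package
  { pkgName       :: PackageName
  , pkgStrongDeps :: [PackageName]    -- the usual build-depends
  , pkgOptional   :: [OptionalModule]
  }

-- Every strong dependency requested by any package in the plan.
strongDeps :: [Package] -> [PackageName]
strongDeps = nub . concatMap pkgStrongDeps

-- The single additional pass: for each package, keep the optional
-- modules whose weak dependencies are all already in the plan.
selectOptional :: [Package] -> [(PackageName, [ModuleName])]
selectOptional pkgs =
  [ (pkgName p, [ omName m | m <- pkgOptional p, satisfied m ])
  | p <- pkgs
  ]
  where
    available   = strongDeps pkgs
    satisfied m = all (`elem` available) (omWeakDeps m)
```

With a plan in which an application strongly depends on text, a hypothetical MyClass.Text module guarded by a weak dependency on text is selected. Drop text from the application's build-depends and the module is left out, without text ever being pulled in on its behalf.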

Sounds Great

It is great.

Really?

Great-ish. Here are some problems:

  • Optionally exposed parts of a package must live in their own source files. This makes fine grained optional exposure somewhat clunky. If you have 100 distinct optional parts of your package, you must have at least 100 distinct modules.
  • GHC's current definition of "orphan" insists that an instance be defined in the same file as the class or type definition. Relaxing this to demand that an instance be defined in the same package will be a boon for library authors regardless of this optional exposure feature as it will let us break up overly-large source files.
  • Build tools will have to track the provenance of a package rather than just its name and version number. Much like the way configuration flags break the notion that any compilation of a particular version of a particular package is the same as any other compilation, a build tool that wishes to take advantage of this feature must include a signature derived from the contents of the generated Cabal file and those of all dependencies in any token used to identify a compilation output. This is something Nix and related projects have done for years, and is generally a good idea (it means that a user can't as easily accidentally or maliciously forge a compilation artifact's identity by simply editing a version number).
  • Package build re-use will go down. Every time a different subset of optionally exposed pieces is selected for a build, new instances of a package and all its dependents will need to be compiled.
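To make the provenance point concrete, here is a rough sketch of an identity derived from the generated Cabal file and the identities of all dependencies. Everything in it is an assumption for illustration: the BuildInput type is invented, and the hash is a toy FNV-1a variant over character code points, not what Nix or any real build tool uses.

```haskell
module BuildIdSketch where

import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl', sort)
import Data.Word (Word64)

-- Toy 64-bit FNV-1a style hash over Char code points (illustrative
-- only; a real tool would use a cryptographic hash over bytes).
fnv1a :: String -> Word64
fnv1a = foldl' step 14695981039346656037
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Hypothetical description of everything that goes into one build.
data BuildInput = BuildInput
  { biName      :: String
  , biVersion   :: String
  , biCabalFile :: String        -- contents of the generated .cabal file
  , biDeps      :: [BuildInput]  -- direct dependencies
  }

-- The identity covers the name, the version, the generated Cabal file,
-- and (recursively) the identities of all dependencies.
buildId :: BuildInput -> Word64
buildId b =
  fnv1a . unlines $
    biName b : biVersion b : biCabalFile b
      : sort (map (show . buildId) (biDeps b))
```

Two builds of the same name and version then get different identities whenever their generated Cabal files differ, for example because a different set of optional modules was exposed, and that difference propagates to every dependent.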

On Balance

Unsurprisingly, I think it's at least worth trying this out. The way the PR is currently set up is 100% backwards compatible. A build tool could even choose to only selectively take advantage of the available pruning. The benefits of being able to, say, give your package a module of features that depend on lens without demanding that your downstream users themselves depend on lens strike me as pretty enormous (n.b. optional exposures are not limited to addressing orphan instances). That those same users would – without needing to do a thing – automatically be able to use those lens-based features as soon as they start using lens themselves is ideal.

A build tool can even compare the build plans resulting from considering optional exposures vs not considering them, and make a choice. Note that the build tool silently including all optional exposures in order to maximize re-use is entirely transparent to the user.

The potential reduction of re-use is a tricky issue as it is hard to anticipate the degree to which it will impact typical users. Since this feature degrades to the status quo so gracefully, it is thankfully easy to make it either opt-in or opt-out, thereby avoiding any great calamity from leaping into the unknown. My conjecture is that people will tend to use a similar set of optional features across multiple projects in their own coding, and that the gain here will be realized in the form of faster builds of libraries that they do not use directly.