Constant-time vinyl Field Getters

2018-02-13

The vinyl package offers a take on extensible records for Haskell with an emphasis on using lenses for manipulation. A representative use of their strengths is the relative ease with which one can work on a subset of fields of a large record. However, their implementation as a heterogeneously-typed list means that accessing a field takes an amount of time proportional to the field's position in the record. For small records, this is not significant, but we now offer a mechanism to package a record up in a value whose API is identical to that of classic vinyl but offers constant-time field access.

Fields with Names

To set the scene, let's take a quick review of basic vinyl records with named fields.

#! /usr/bin/env nix-shell
#! nix-shell -i runghc -p 'haskell.packages.ghc822.ghcWithPackages (ps: [ps.vinyl ps.microlens])'
{-# LANGUAGE DataKinds, FlexibleContexts, OverloadedLabels, TypeFamilies, TypeOperators #-}
import Data.Vinyl
import Lens.Micro

myRec :: Rec ElField '["name" ::: String, "number" ::: Int]
myRec = #name =: "Anthony" :& #number =: 23 :& RNil

incrementNumber :: HasField record "number" fs Int
                => record ElField fs -> record ElField fs
incrementNumber = rlensf #number %~ succ

main :: IO ()
main = print (incrementNumber myRec)

This code snippet (depending on vinyl >= 0.8) defines a record value, myRec, with two fields, and a function, incrementNumber, that can work with any record value of any record type as long as it has a field named "number".

Is vinyl slow?

It can be! For small records, list-like performance characteristics are fine, but if you have a record with more than half a dozen fields, you may notice that the speed of extracting fields depends on where in the record those fields are located.

If you have a collection of vinyl records, you may be able to employ a technique I have long relied upon: a vinyl record whose fields are Storable may be stored in a Data.Vector.Storable Vector, and accessing the fields of a vinyl record so stored is both constant-time and extremely fast. Since the record's type identifies the exact byte offset of each field, accessing a field is a pointer offset.

But not every type is Storable! A critical component of a Storable instance is a statically-known size (in bytes) of values of the type, and this is not known for exotic types like, say, lists.

Programmers who face challenges with the linear performance of list-like data structures often turn to arrays to gain predictable element access times. That is not to say that arrays are always preferable: adding an element to the head of a list is very fast, while adding elements to an array is harder to do efficiently. That said, the fast lookup time of an array element is a powerful thing, so it should be an option for vinyl users! Here is a port of the above example to include a usage of an array-like vinyl record:

#! /usr/bin/env nix-shell
#! nix-shell -i runghc -p 'haskell.packages.ghc822.ghcWithPackages (ps: [ps.vinyl ps.microlens])'
{-# LANGUAGE DataKinds, FlexibleContexts, OverloadedLabels, TypeFamilies, TypeOperators #-}
import Data.Vinyl
import Lens.Micro

myRec :: Rec ElField '["name" ::: String, "number" ::: Int]
myRec = #name =: "Anthony" :& #number =: 23 :& RNil

incrementNumber :: HasField record "number" fs Int
                => record ElField fs -> record ElField fs
incrementNumber = rlensf #number %~ succ

main :: IO ()
main = do print (incrementNumber myRec)
          print (incrementNumber (toARec myRec))

We didn't do anything to our existing code, but we added an application of

toARec :: NatToInt (RLength ts) => Rec f ts -> ARec f ts

whose type says that any traditional vinyl Rec of finite length can be converted to an array-like ARec. If you do encounter code that is not polymorphic in the record type (recall that our incrementNumber function is polymorphic in the record type), you can always use fromARec to convert an ARec into a Rec.

Is It Worth It?

For small records, you will likely see better performance with Rec than ARec, and if you are manipulating the fields of a record you will need to profile to determine if ARec can help you. In cases where you are primarily reading fields from large records, however, ARec scales very nicely. Here is a criterion plot of a simple field access benchmark (the X-axis is measured in nanoseconds).

vinyl-arec-accessors.png

Notice that accessing later fields of a Rec takes longer than accessing earlier fields (early and late refer to where the field's type appears in the type list parameter of your Rec type). Accessing a field of an ARec is comparable to accessing the fourth field of a Rec (on the computer that ran this benchmark with GHC-8.2.2), but independent of the field's position. Accessing a field of a Rec in a Storable.Vector is extremely fast (~420 picoseconds), and does not depend on the field's position.