Haskell and OpenCV: Envisioning
March 14, 2016: Reformatting
Some time ago I forked Noam Lewis's HOpenCV bindings to the fine OpenCV library to fill them out with pieces I needed for several projects at work, and to experiment with how such bindings could be used. Over time I've built up some useful components, and, in a fit of non-procrastination, I've recently pushed many updates and assembled a fun demo program.
To try things out, grab the code and
cabal install this alternate
universe version of HOpenCV. I'm not going to go into too much detail
here about how the library works, but performance deserves a mention,
as does some explication of usage.
The first thing that strikes the inquisitive Haskeller when attempting
to work with a native library such as OpenCV through the FFI is that
image processing feels rather functional: we process an input image
to produce an output image. Now, in practice, much OpenCV code is
visibly caught between a naming rock and a naming hard
place. Supposing an
Image type in C, one sees things like,
...
cvCvtColor(rgbImg, grayImg, CV_RGB2GRAY);
cvThreshold(grayImg, grayImg, 100, 255, CV_THRESH_BINARY);
...
The programmer converts an RGB image to a grayscale image, then
thresholds the grayscale image before moving on. An issue one might
have with this code is the choice of names:
grayImg is a fine choice
of name for a grayscale version of
rgbImg, but using the same name
for the thresholded version? That seems less fine. Note that the
re-use of the same image value is perfectly reasonable as the
intermediate value is not used, and the allocation savings can add
up. To avoid having names that are either inappropriate for what they
end up bound to, or are inappropriate for what they start out bound
to, an alternative is to name all such re-used images something like
tempImg. This alternative has the advantage of avoiding misleadingly
suggestive names, at the expense of cluttering code up with tokens
that are devoid of content for the reader.
If you are coming at this from a functional programming point of view,
you want to write the active part of this little snippet as,
cvThreshold 100 255 . cvCvtColor CV_RGB2GRAY.
To cut a long story short(er), this is how you string together operations in the version of HOpenCV you built from my GitHub repository. Before we get to the literate Haskell part of this post, let's take one quick diversion into the FFI and in-place updates.
In order to avoid restricting every use of OpenCV to the
IO monad, one can maintain referential transparency by
never performing in-place updates, and wrapping all the calls into
unsafePerformIO. The first part of this course of
action is effected by always allocating a fresh image for any OpenCV
operation to use as a destination. The two parts together are made
somewhat prettier by relying on a small set of core combinators for
wrapping calls into OpenCV. One of the combinators I use is called
cv, and it results in compositions that look like,
cv f . cv g . cv h.
The great part of function composition like this is that those
hard-to-name intermediate values are out of our hair: the
. solved our naming problem!
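To make the idea concrete, here is a minimal sketch of how a cv-style combinator could be put together. It is not HOpenCV's actual code: the Image type, the InPlace alias, and the helpers newImage, pixels and mapInPlace are hypothetical stand-ins built on base's Foreign modules, and the bodies of dup and cv are only my guess at the general shape (clone first, mutate the clone, hide the IO).

```haskell
module Main (main) where

import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr)
import Foreign.Marshal.Array (copyArray, peekArray, pokeArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical stand-in for an OpenCV image: pixel count plus raw buffer.
data Image = Image Int (ForeignPtr Word8)

-- An in-place operation mutates the buffer it is handed (like a C call).
type InPlace = Int -> Ptr Word8 -> IO ()

newImage :: [Word8] -> IO Image
newImage xs = do
  let n = length xs
  fp <- mallocForeignPtrArray n
  withForeignPtr fp $ \p -> pokeArray p xs
  pure (Image n fp)

pixels :: Image -> IO [Word8]
pixels (Image n fp) = withForeignPtr fp (peekArray n)

-- Clone the input so the original buffer is never mutated.
dup :: Image -> IO Image
dup (Image n src) = do
  dst <- mallocForeignPtrArray n
  withForeignPtr src $ \s -> withForeignPtr dst $ \d -> copyArray d s n
  pure (Image n dst)

-- Assumed shape of the combinator: run the in-place operation on a fresh
-- clone, then present the whole thing as a pure function.
cv :: InPlace -> Image -> Image
cv op img = unsafePerformIO $ do
  Image n fp <- dup img
  withForeignPtr fp (op n)
  pure (Image n fp)
{-# NOINLINE cv #-}

-- Helper for building toy in-place operations from per-pixel functions.
mapInPlace :: (Word8 -> Word8) -> InPlace
mapInPlace f n p = mapM_ step [0 .. n - 1]
  where step i = peekElemOff p i >>= pokeElemOff p i . f

thresholdOp :: Word8 -> Word8 -> InPlace
thresholdOp t hi = mapInPlace (\v -> if v > t then hi else 0)

invertOp :: InPlace
invertOp = mapInPlace (255 -)

main :: IO ()
main = do
  img <- newImage [10, 120, 200]
  let out = (cv invertOp . cv (thresholdOp 100 255)) img
  pixels out >>= print  -- the processed copy
  pixels img >>= print  -- the input, left untouched
```

Each cv call in this sketch pays for one clone, which is exactly the cost the next few paragraphs set out to reduce.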
Remember that, in order to make OpenCV calls safe, we duplicate input
before operating on the duplicate in-place. This means that the
example composition is actually more like,
cv f dup . cv g dup . cv h dup
where
dup clones the input value, and
cv calls the OpenCV function, e.g.
f, passing it the
cloned value to be mutated in-place. What we can do instead is strike
up a profitable conversation with the amazing programmable inliner GHC
comes with to have this kind of computational pipeline rewritten to
the moral equivalent of,
cv (f.g.h) dup.
Et voila! We have a functional interface to OpenCV that pays a bare minimum in allocation to maintain purity. We can lift sub-expressions out of compound expressions and not have to worry about varying semantics.
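For readers who have not used GHC's rewrite rules, the fusion can be sketched roughly as follows. This is an illustration under assumptions, not the rules HOpenCV ships: Image, InPlace, andThen, mapInPlace and the cloneCount counter are hypothetical, and the single rule "cv/cv" only captures the basic idea that the result of an inner cv is a private clone, so the outer operation may safely reuse it.

```haskell
module Main (main) where

import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr)
import Foreign.Marshal.Array (copyArray, peekArray, pokeArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical image model: pixel count plus raw buffer.
data Image = Image Int (ForeignPtr Word8)
type InPlace = Int -> Ptr Word8 -> IO ()

-- Global clone counter, for demonstration only.
cloneCount :: IORef Int
cloneCount = unsafePerformIO (newIORef 0)
{-# NOINLINE cloneCount #-}

newImage :: [Word8] -> IO Image
newImage xs = do
  let n = length xs
  fp <- mallocForeignPtrArray n
  withForeignPtr fp $ \p -> pokeArray p xs
  pure (Image n fp)

pixels :: Image -> IO [Word8]
pixels (Image n fp) = withForeignPtr fp (peekArray n)

dup :: Image -> IO Image
dup (Image n src) = do
  modifyIORef' cloneCount (+ 1)
  dst <- mallocForeignPtrArray n
  withForeignPtr src $ \s -> withForeignPtr dst $ \d -> copyArray d s n
  pure (Image n dst)

-- NOINLINE keeps calls to cv visible so the rule below can match them.
cv :: InPlace -> Image -> Image
cv op img = unsafePerformIO $ do
  Image n fp <- dup img
  withForeignPtr fp (op n)
  pure (Image n fp)
{-# NOINLINE cv #-}

-- Run g, then f, on the same buffer.
andThen :: InPlace -> InPlace -> InPlace
andThen g f n p = g n p >> f n p

-- The inner cv already produced a fresh clone, so fuse both operations
-- into one cv call and pay for a single dup.
{-# RULES
"cv/cv" forall f g img. cv f (cv g img) = cv (andThen g f) img
  #-}

mapInPlace :: (Word8 -> Word8) -> InPlace
mapInPlace f n p = mapM_ step [0 .. n - 1]
  where step i = peekElemOff p i >>= pokeElemOff p i . f

main :: IO ()
main = do
  img <- newImage [10, 120, 200]
  let out = ( cv (mapInPlace (255 -))
            . cv (mapInPlace (`div` 2))
            . cv (mapInPlace (+ 1)) ) img
  pixels out >>= print
  readIORef cloneCount >>= \c -> putStrLn ("clones: " ++ show c)
```

The pixel values are the same whether or not the rule fires; only the number of clones should differ. Rewrite rules are applied only when optimization is on (for example ghc -O), and -ddump-rule-firings reports which rules actually fired.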
So let's see about tamping down some constant factors of our processing
code! The VideoFunhouse example application includes an effect that
turns an image into a kind of blueprint using four shades of blue,
and thick edge tracing. This involves making quite a few passes over
the input pixel data to threshold, mask, find edges, etc. On an older
Core 2 Duo 2.33 GHz MacBook Pro, processing a 640x480 image takes 34.5
milliseconds (ms). Not bad? If we bump up the RTS's allocation area size
with +RTS -A8M to reflect that we're working with large-ish arrays
(images), we knock that time down to 30.1ms. This is a dual core
processor, so let's make sure GHC uses the cores with
+RTS -N. This
gets us to 27.7ms.
Now we'll try out two different options, one at a time. If we enable
the rewrite rules described above for fusing operation compositions
for in-place updates, we get performance all the way down to 23.9 ms.
Alternately, we can use the lightweight par annotations from
Control.Parallel to suggest to the RTS that it
evaluate the two main components of the image processing effect in
parallel. This option moves us from 27.7ms down to 19.9ms.
If we use
par and the rewrite rules? 15.8ms per
frame. This is more than twice as fast as where we started. We get a
21% improvement by mutating in-place whenever we're sure it is safe to
do so. We are also getting a 34% speedup from the par
annotations. The program used to measure the performance of these
options using Criterion is here.
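As a small illustration of what such an annotation looks like, here is a sketch that evaluates two independent halves of a toy effect in parallel. Everything about the effect is made up for the example: the list-based Image type and the shades, edges and blueprint functions are hypothetical stand-ins, not the VideoFunhouse code. To keep the sketch free of extra packages it imports par and pseq from GHC.Conc in base; Control.Parallel in the parallel package exports functions with the same names.

```haskell
module Main (main) where

import GHC.Conc (par, pseq)

-- Hypothetical stand-in for an image: a flat list of intensities.
type Image = [Int]

-- Quantize intensities into four levels (0, 64, 128, 192).
shades :: Image -> Image
shades = map (\v -> (v `div` 64) * 64)

-- Crude 1-D edge strength: absolute difference from the previous pixel.
edges :: Image -> Image
edges xs = 0 : zipWith (\a b -> abs (b - a)) xs (drop 1 xs)

-- Force every element, since par only evaluates to weak head normal form.
force :: [Int] -> [Int]
force xs = foldr seq () xs `seq` xs

-- Spark the shading pass, evaluate the edge pass on the current thread,
-- then combine: strong edges are drawn as 0, everything else keeps its shade.
blueprint :: Image -> Image
blueprint img = a `par` (b `pseq` zipWith combine a b)
  where
    a = force (shades img)
    b = force (edges img)
    combine s e = if e > 32 then 0 else s

main :: IO ()
main = print (blueprint [0, 10, 100, 110, 200, 210])
```

The annotation is only a hint: the result is the same with or without it, and any speedup requires linking with -threaded and running with +RTS -N so the spark has a second core to run on.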
The example programs are browsable via the above links, but here is an example standalone program that runs Canny edge detection on a live webcam feed and shows the results in a window.
import AI.CV.OpenCV.HighCV

main = createCameraCapture (Just 0) >>= runWindow . fmap proc
  where proc = canny 70 110 3 . convertRGBToGray
I hope this library may be of some help to anyone interested in computer vision and Haskell. Moving forward, the biggest issue is how little of OpenCV is wrapped by this library, while the next, only slightly smaller, issue is that the code shows many obvious signs of having gone through several generations. The breadth of the OpenCV API will likely require automatically generated bindings, but designing the Haskell side of the interface takes some thought, so I am not sure what the best route to reasonable API coverage is.