Archive

Author Archive

Fast Factory, Video 1

April 13th, 2013 No comments

I spent the past few months developing the perception system behind this video of a PR2 grasping moving objects.

Perception is driven entirely by the Kinect mounted on the robot’s head, and all processing of the Kinect data is done in Haskell. I will write up a report on that side of things in the near future, but suffice to say that Haskell is more than ready for top flight performance and reliability (we’ve done about a month of testing, where a test day generally involves launching the perception software, then letting it run all day). It’s multithreaded, networked, and GPU accelerated. Free monads were involved. Lenses were involved. Burritos played a crucial role.

GLUtil on Hackage

August 31st, 2012 Comments off

The GLUtil Haskell package I wrote about before is now available on hackage.

Categories: Uncategorized Tags:

Introducing CLUtil

December 22nd, 2011 Comments off


Edited: May 4, 2013

OpenCL is a cross-platform parallel programming standard with support for execution on both CPUs and GPUs. The OpenCL package on hackage provides a direct binding to the API with just enough Haskellosity to make invoking those API functions borderline pleasant. That said, there remains a certain amount of boilerplate that is rather offputting.

For context, we are trying to write program fragments that may be executed in parallel to improve efficiency. Multi-core CPUs are good at this, GPUs can be even better. We can bend GLSL to our needs (see my modern OpenGL with GLSL in Haskell tutorial), but are then left to contend with various graphics and GUI considerations that are completely irrelevant to our goals. OpenCL represents a much more familiar looking programming environment (a kind of C with support for small vectors), with less setup overhead than a GPGPU program using GLSL on the backend.

So let’s get started: How can we use OpenCL from a Haskell program?

First we’ll write a simple OpenCL kernel that adds two vectors, a and b, and stores the element-wise square of the result in c. To make it slightly interesting we’ll operate on four floating point numbers at a time.

__kernel void vecAdd(__global float *a, 
                     __global float *b,
                     __global float *c) {
  size_t i = get_global_id(0);
  float4 r = vload4(i,a) + vload4(i,b);
  vstore4(r*r,i,c);
}

 
I’ve saved this code in a file called VecEZ.cl.

To recap: we have an OpenCL kernel that takes two read-only input vectors and a write-only output vector. This sort of configuration fits rather well into a functional setting, so it is where we will focus our efforts as we endeavour to make the common things easy. Now we’re ready for some Haskell:

> import Control.Parallel.CLUtil
> import qualified Data.Vector.Storable as V

 
I’ve imported my CLUtil library that provides some simple helpers for common Haskell-OpenCL operations. Now let’s write a program that initializes OpenCL, loads and compiles an OpenCL kernel stored in a separate file, and run that kernel on some data.

> test1 = do s <- ezInit CL_DEVICE_TYPE_CPU
>            k <- kernelFromFile s "VecEZ.cl" "vecAdd"
>            let v1 = V.fromList [1,2,3,4::Float]
>                v2 = V.fromList [5,6,7,8::Float]
>            v3 <- runKernel s k v1 v2 (Out 4) (Work1D 1)
>            print (v3::Vector Float)

 
And that’s it! The ezInit function intializes an OpenCL device and sets up a context and queue, all of which are returned in a record. The kernelFromFile function reads OpenCL source code from a file, then builds the named kernel using the previously initialized OpenCL state record. Finally, we produce some data in Haskell and call runKernel.

The runKernel action is variadic in an attempt to meet most of your kernel-running needs. Its arguments are an OpenCL state record, a kernel to run, the parameters to be passed to the OpenCL kernel, and the number of global work items (i.e. how many times to invoke the kernel). Recall that our kernel had three parameters, and operated on four elements at once. So, we pass the kernel the two input vectors, a note requesting an output vector of length four, and a specification that our kernel should be run exactly once. We bind the output to another vector, and provide a type annotation on the next line to let runKernel figure out how much memory to allocate for the output vector. This is potentially confusing, hopefully more examples will clarify usage.

Next, let’s consider another kernel that demonstrates a bit more variety in the types.

__kernel void funnyBusiness(__global double *a,
                            __global double *b,
                            __global double *c,
                            __global int *s) {
  size_t i = get_global_id(0);
  double4 r = vload4(i,a) + vload4(i,b);
  double4 rsq = r * r;
  vstore4(rsq, i, c);
  s[i] = (int)(rsq.x + rsq.y + rsq.z + rsq.w);
}

 
This kernel is similar to the first, but uses the double scalar type, and also takes an int pointer as an argument that has an int cast of the horizontal sum of the result vector written to it. The Haskell side looks very similar to the last time,

> test2 = do s <- ezInit CL_DEVICE_TYPE_CPU
>            k <- kernelFromFile s "VecEZ.cl" "funnyBusiness"
>            let v1 = V.enumFromN  0 12 :: Vector Double
>                v2 = V.enumFromN 12 12 :: Vector Double
>            (v3,v4) <- runKernel s k v1 v2
>                                 (Out 12) -- v3 is a 12-element output
>                                 (Out 3)  -- v4 is a 3-element output
>                                 (Work1D $ 12 `div` 4)
>            print $ V.sum (v3::Vector Double)
>            print $ V.sum (v4::Vector CInt)

 
A few things to note:

  • v3 and v4 have different types. The runKernel action always returns Vectors, be they individual as in test1, or tupled as here. The CLUtil library currently supports up to three output vectors, but this is an arbitrary limit that will be raised as the library matures.

  • Once again we operate on four input elements at a time. Given that our source vectors have twelve elements each, we must run three work items.

  • The type annotation on v4 is CInt rather than Int. This is important as the size of Haskell’s Int may be 32 or 64 bits, while OpenCL’s int is always 32 bits.

One last bit of mystery that remains is the Work1D specification. OpenCL supports up to three dimensions of work items, which can ease the handling of multi-dimensional data. If we were adding a matrix, for example, we might have the elements of the matrices stored in Vectors, but the elements are intuitively addressed by a pair of coordinates. A naive OpenCL kernel for adding matrices might go something like this,

__kernel void floaty(__global float *a, 
                     __global float *b,
                     __global float *c) {
  size_t x = get_global_id(0);
  size_t y = get_global_id(1);
  size_t i = y * get_global_size(1) + x;
  c[i] = a[i] + b[i];
}

 
On the OpenCL side, we’re not gaining anything by working with two dimensions of work item indices, but sometimes these coordinates can inform coordinate-dependent logic in the OpenCL program.

The Haskell side is once again very similar, with the exception of the new Work2D specification.

> test3 = do s <- ezInit CL_DEVICE_TYPE_GPU
>            k <- kernelFromFile s "VecEZ2.cl" "floaty"
>            let v1 = V.enumFromN  0 16 :: Vector Float
>                v2 = V.enumFromN 16 16 :: Vector Float
>            v3 <- runKernel s k v1 v2 (Out 16) (Work2D 4 4)
>                  :: IO (Vector Float)
>            mapM_ (\i -> print (V.slice i 4 v3)) [0,4..12]

 
Note that here we explicitly request a GPU to run our program. The GPU in my laptop supports floats, but not doubles, so I put this kernel in a separate file from the others.

Hopefully usage of CLUtil is becoming clear. So let’s crank things up and take advantage of the Haskell ecosystem.

A blog post describing quasicrystal figures serves as an interesting test of OpenCL performance. As a first pass, we can port the pixel computations from that post to OpenCL without thinking about optimization…

float wave(float,float,float,float);
float wrap(float);

__kernel void quasiCrystal(int pixels, float scale, float phase, int numAngles, __constant float *angles, __global uchar *img) { int x = get_global_id(0); int y = get_global_id(1); float denom = (float)pixels - 1; float u = scale * ((2 * (float)x / denom) - 1); float v = scale * ((2 * (float)y / denom) - 1); float sum = 0.0f; for(int i = 0; i < numAngles; ++i) { sum += wave(phase, angles[i], u, v); } uchar r = (uchar)(255.0f * clamp(wrap(sum), 0.0f, 1.0f)); vstore4((uchar4)(255,r,128,r), y*pixels + x, img); }
float wrap(float n) { int k = (int)n; if(n < 0) k -= 1; float v = n - (float)k; if(k%2 == 1) v = 1 - v; return v; }
float wave(float phase, float theta, float x, float y) { return (cos(cos(theta) * x + sin(theta)*y + phase) + 1) / 2; }

 
…then write the control logic in Haskell. We also tie our image generation into the gloss1 library for a real-time display.

> import Control.Parallel.CLUtil
> import qualified Data.Vector.Storable as V
> import Data.Word (Word8)
> import Graphics.Gloss hiding (Vector, scale)
> import Graphics.Gloss.Interface.IO.Animate hiding (Vector, scale)
> 
> pixels :: Int
> pixels = 800
> 
> scale :: Float
> scale = 128
> 
> angles :: Int -> Vector Float
> angles n = V.map (* (pi / fromIntegral n)) $ V.enumFromN 0 n
> 
> mkPicture :: Vector Word8 -> Picture
> mkPicture = flip (bitmapOfForeignPtr pixels pixels) False
>           . (\(x,_) -> x)
>           . V.unsafeToForeignPtr0
> 
> main = do s <- ezInit CL_DEVICE_TYPE_ALL
>           k <- kernelFromFile s "QuasiCrystalRGBA.cl" "quasiCrystal"
>           let numPix = pixels * pixels
>               pixels' = fromIntegral pixels :: CInt
>               numAngles = 7
>               allAngles = angles numAngles
>               frame :: Float -> IO Picture
>               frame phase = mkPicture `fmap`
>                             runKernel s k pixels' scale phase
>                                       numAngles allAngles
>                                       (Out (numPix*4)) (Work2D pixels pixels)
>           animateIO (InWindow "Quasicrystal" (pixels,pixels) (0,0)) black frame

 
On my laptop, running on the CPU consumes 300% of a dual core i5 (with hyperthreading), while running on the GPU consumes 85% CPU. While this example is somewhat more complicated, the ability to switch between CPU and GPU computation driving a visualization, all from an interactive interpreter is highly compelling. The animation runs at a reasonable clip on a dual core CPU, so try it out yourself! It should look something like this,

Quasicrystal animation screenshot

Quasicrystal animation screenshot

These programs may all be found in the examples directory of the CLUtil package.

P.S. When building executables on your system, you may need to supply GHC with an -lOpenCL option. This is not necessary in Mac OS X, where we use the installed OpenCL framework, but is likely needed elsewhere.


  1. Visit the gloss homepage to get the latest version via darcs. You may need to manually relax version constraints on OpenGL and GLUT to install gloss with the latest versions of those packages. Note also that on Mac OS X, installing gloss with the GLFW-b backend (cabal install gloss --flags="-GLUT GLFW") allows you to start and stop the visualization from an interactive GHCi session (see the FAQ on the gloss page for further instructions). ?

Categories: Programming Tags:

Haskell Genetic Algorithm “Hello, world!”

April 15th, 2011 2 comments

A post on reddit linked to several implementations of a cute "Hello, world!" program demonstrating a genetic algorithm that evolves towards a target string. Example programs were written in several languages, and I thought a Haskell version could be worthwhile as it demonstrates the use of random numbers, an issue that frustrates many newcomers to the language.

I use the MonadRandom package to deal with plumbing a random number generator through the code, but otherwise stick to lists and the usual cast of characters from the Prelude.

> {-# LANGUAGE NamedFieldPuns, TupleSections #-}
> import Control.Applicative
> import Control.Arrow (second)
> import Control.Monad (liftM, replicateM)
> import Control.Monad.Random
> import Data.Function (on)
> import Data.List (minimumBy, sortBy, nub, (\\))
> import Data.Ord (comparing)
> import Text.Printf (printf)

 
The example programs associated with the original post all come with tests. Some of these are standard unit tests that verify hand-picked examples, while others verify properties of random processes. To support both kinds of tests, we use HUnit for unit tests, and QuickCheck for property testing.

> import Test.QuickCheck
> import Test.QuickCheck.Monadic
> import Test.HUnit hiding (assert)

 
For clarity, we create a type alias for the raw data we will be working with.

> type Gene = String
>
> target :: Gene
> target = "Hello, world!"

 
Next are the functions for working with individual Genes.

> mate :: RandomGen g => Gene -> Gene -> Rand g Gene
> mate g1 g2 = (++) <$> flip take g1 <*> flip drop g2 <$> pivot
> where pivot = getRandomR (0, length g1 - 1)
>
> mutate :: RandomGen g => Gene -> Rand g Gene
> mutate g = (uncurry (++) .) . second . (:) <$> delta <*> parts
> where delta = getRandomR (' ', 'z')
> idx = getRandomR (0, length g - 1)
> parts = second tail . flip splitAt g <$> idx
>
> fitness :: Gene -> Int
> fitness = sum . map abs . zipWith ((-) `on` fromEnum) target

 
We also want a random Gene generator.

> randomGene :: RandomGen g => Rand g Gene
> randomGene = replicateM (length target) $ getRandomR (' ', 'z')

 
Finally we can start thinking about populations. The example Python program I looked at had several parameters for initializing the population and controlling how it evolves. We capture those parameters in a record.

> data PopInfo = PopInfo { size      :: Int
> , crossover :: Float
> , elitism :: Float
> , mutation :: Float }

 
A Population is a pair of a record describing the population and a collection of Genes.

> type Population = (PopInfo, [Gene])
>
> defaultPop :: PopInfo
> defaultPop = PopInfo 1024 0.8 0.1 0.03

 
We will use a helper function to produce a randomized initial population.

> randomPop :: RandomGen g => PopInfo -> Rand g Population
> randomPop = liftA2 (,) <$> pure <*> flip replicateM randomGene . size

 
A tournament selection method is used to select parent Genes from a Population.

> tournamentSize :: Int
> tournamentSize = 3
>
> tournamentSelection :: RandomGen g => Population -> Rand g Gene
> tournamentSelection (info, genes) =
> minimumBy (comparing fitness) . map (genes !!) <$>
> replicateM tournamentSize (getRandomR (0, size info - 1))
>
> twoM :: Monad m => m a -> m (a, a)
> twoM = liftM (\[x,y] -> (x,y)) . replicateM 2
>
> selectParents :: RandomGen g => Population -> Rand g (Gene, Gene)
> selectParents = twoM . tournamentSelection

 
The meat of the algorithm is the evolve function that, appropriately enough, evolves a Population.

> evolve :: RandomGen g => Population -> Rand g Population
> evolve p@(info@(PopInfo {size, crossover, elitism, mutation}), genes) =
> (info,) . sortBy (comparing fitness) . (take idx genes ++) <$>
> replicateM (size - idx) (twoM getRandom >>= go)
> where idx = round (fromIntegral size * elitism)
> go (r1,r2) | r1 <= crossover =
> selectParents p >>= uncurry mate >>= addChild r2
> | otherwise = addMutation r2
> addChild r c
> | r <= mutation = mutate c
> | otherwise = return c
> addMutation r
> | r <= mutation = mutate . (genes !!) =<< getRandomR (idx, size - 1)
> | otherwise = (genes !!) <$> getRandomR (idx, size - 1)

 
Now we’re ready to kick off a multi-generation quest for the string we already wrote down up top.

> iterateUntil :: Monad m => (a -> Bool) -> (a -> m a) -> a -> m a
> iterateUntil stop f = go
> where go x | stop x = return x
> | otherwise = f x >>= go
>
> maxGenerations :: Int
> maxGenerations = 16384
>
> main = evalRandIO (randomPop defaultPop >>= iterateUntil done step . (, 0))
> >>= result
> where step (p,gen) = (,) <$> evolve p <*> pure (gen+1)
> done ((_, g:_), generation) =
> generation == maxGenerations || fitness g == 0
> result ((_, g:_), generation)
> | generation == maxGenerations =
> putStrLn "Maximum generations reached without success."
> | fitness g == 0 = printf "Reached target (%d): %s\n" generation g
> | otherwise = putStrLn "Evolution is hard. Let's go shopping."

 
In keeping with the spirit of the original programs, we include tests to exercise some of the component functionality we’ve put together.

> testGen = run (evalRandIO randomGene) >>= assert . check
> where check g = and $ map ($ g) [ (>= 0) . fitness
> , (== 13) . length
> , all (between 32 122 . fromEnum) ]
> between l r x = l <= x && x <= r
>
> testMut = run (evalRandIO $ randomGene >>= pairWithMutant) >>= assert . check
> where pairWithMutant = liftA2 (,) <$> pure <*> mutate
> check (g,m) = length g == length m && length (nub g \\ nub m) <= 1
>
> testMate = run (evalRandIO $ twoM randomGene >>= pairWithChild) >>=
> assert . check
> where pairWithChild (mom,dad) = (mom,dad,) <$> mate mom dad
> check (m,d,c) = length c == 13 &&
> (and . map (\(_,y,z) -> y == z) .
> dropWhile (\(x,y,_) -> x == y) $ zip3 m c d)
>
> unitTests = test [ "fitness1" ~: 0 ~=? fitness "Hello, world!"
> , "fitness2" ~: 399 ~=? fitness "H5p&J;!l<X\\7l"
> , "fitness3" ~: 297 ~=? fitness "Vc;fx#QRP8V\\$"
> , "fitness4" ~: 415 ~=? fitness "t\\O`E_Jx$n=NF" ]
>
> runTests = do mapM_ (quickCheck . monadicIO) [testGen, testMut, testMate]
> runTestTT unitTests

 
You can compile this post with GHC, or just run it in GHCi to poke around and ensure that the tests all pass.

Categories: Programming Tags:

Modern OpenGL with Haskell

March 29th, 2011 2 comments

This is a Haskell implementation of the ideas presented in chapter two of Joe Groff’s excellent tutorial on modern OpenGL.

This post is a complete program that relies on the OpenGL and GLUT Haskell packages. It also makes use of some data files:

You may copy and paste this post into a .lhs file, or download it.

We begin by importing the necessary libraries.

> import Graphics.Rendering.OpenGL
> import Graphics.UI.GLUT
> import Foreign.Storable (sizeOf)
> import Control.Concurrent (threadDelay)
> import Control.Applicative

 
We use a very small library for loading TGA images

> import TGA

 
… and a handy utility library for loading data into OpenGL.

> import Graphics.GLUtil

 
Optimism dictates that any exit is a successful exit.

> import System.Exit (exitWith, ExitCode(ExitSuccess))

 
Application state is shared between the rendering and animation functions with an IORef.

> import Data.IORef (IORef, newIORef, readIORef, modifyIORef)

 
We begin our program by defining the data structures used to carry program state between frames.

Shader state is a record of compiled shader programs, the uniform parameters to the shader, and an attribute accessed by the shader.

> data Shaders = Shaders { vertexShader   :: VertexShader
> , fragmentShader :: FragmentShader
> , program :: Program
> , fadeFactorU :: UniformLocation
> , texturesU :: [UniformLocation]
> , positionA :: AttribLocation }

 
Application state is carried in a record. State, in this case, is made up of some vertex data, some primitive data (e.g. polygons), two textures, shader state, and a scalar we use to fade between the two textures.

> data Resources = Resources { vertexBuffer  :: BufferObject
> , elementBuffer :: BufferObject
> , textures :: [TextureObject]
> , shaders :: Shaders
> , fadeFactor :: GLfloat }

 
The data that we actually want to render starts life as a list of 2D vertices,

> vertexBufferData :: [GLfloat]
> vertexBufferData = [-1, -1, 1, -1, -1, 1, 1, 1]

 
and a list of indices into that list,

> elementBufferData :: [GLuint]
> elementBufferData = [0..3]

 
Textures are prepared by loading them from disk, then setting various texture rendering modes.

> makeTexture :: FilePath -> IO TextureObject
> makeTexture filename =
> do (width,height,pixels) <- readTGA filename
> texture <- loadTexture $ texInfo width height TexBGR pixels

 
We set texturing parameters to linear filtering for minification and magnification, while disabling mip mapping. Texture wrapping is set to clamp both horizontally and vertically, S and T, respectively.

>        textureFilter   Texture2D   $= ((Linear', Nothing), Linear')
> textureWrapMode Texture2D S $= (Mirrored, ClampToEdge)
> textureWrapMode Texture2D T $= (Mirrored, ClampToEdge)
> return texture

 
Now we can load the data we want to render into OpenGL, and track it using our state record.

Shaders are prepared by loading and compiling the individual vertex and fragment shaders, then linking them into a program. We then query the program to get addresses for the uniform parameters and attribute that we will use to communicate data to the shader program.

> initShaders = do vs <- loadShader "hello-gl.vert"
> fs <- loadShader "hello-gl.frag"
> p <- linkShaderProgram [vs] [fs]
> Shaders vs fs p
> <$> get (uniformLocation p "fade_factor")
> <*> mapM (get . uniformLocation p)
> ["textures[0]", "textures[1]"]
> <*> get (attribLocation p "position")

 
Our global state record is prepared by creating the buffer objects for our vertex and index data, loading the image files to be used as textures, compiling the shader program, and initializing the fadeFactor field to zero.

> makeResources =  Resources
> <$> makeBuffer ArrayBuffer vertexBufferData
> <*> makeBuffer ElementArrayBuffer elementBufferData
> <*> mapM makeTexture ["hello1.tga", "hello2.tga"]
> <*> initShaders
> <*> pure 0.0

 
The interesting part of our program is the function that puts things on the screen.

One step in rendering is preparing the textures for our shaders. We do this by activating a texture unit, binding a texture object to the active texture unit, then setting the uniform sampler2D value in the fragment shader to refer to the correct texture unit.

> setupTexturing :: Resources -> IO ()
> setupTexturing r = let [t1, t2] = textures r
> [tu1, tu2] = texturesU (shaders r)
> in do activeTexture $= TextureUnit 0
> textureBinding Texture2D $= Just t1
> uniform tu1 $= Index1 (0::GLint)
> activeTexture $= TextureUnit 1
> textureBinding Texture2D $= Just t2
> uniform tu2 $= Index1 (1::GLint)

 
Geometry rendering begins by binding the buffer containing the vertex data and telling OpenGL how this data is formatted. In our case, each vertex has two floating point fields.

> setupGeometry :: Resources -> IO ()
> setupGeometry r = let posn = positionA (shaders r)
> stride = fromIntegral $ sizeOf (undefined::GLfloat) * 2
> vad = VertexArrayDescriptor 2 Float stride offset0
> in do bindBuffer ArrayBuffer $= Just (vertexBuffer r)
> vertexAttribPointer posn $= (ToFloat, vad)
> vertexAttribArray posn $= Enabled

 
Finally, drawing is effected by clearing the screen, setting the fadeFactor uniform parameter of our shader program, then drawing our textured geometry.

> draw :: IORef Resources -> IO ()
> draw r' = do clearColor $= Color4 1 1 1 1
> clear [ColorBuffer]
> r <- readIORef r'
> currentProgram $= Just (program (shaders r))
> uniform (fadeFactorU (shaders r)) $= Index1 (fadeFactor r)
> setupTexturing r
> setupGeometry r
> bindBuffer ElementArrayBuffer $= Just (elementBuffer r)
> drawElements TriangleStrip 4 UnsignedInt offset0
> swapBuffers

 
The only user interaction we support is exiting when the escape key is pressed.

> basicKMHandler :: Key -> KeyState -> Modifiers -> Position -> IO ()
> basicKMHandler (Char '\27') Down _ _ = exitWith ExitSuccess
> basicKMHandler _ _ _ _ = return ()

 
The animation callback limits itself to run at less than 100Hz, then sets the fade parameter carried in our application state based on elapsed time.

> animate :: IORef Resources -> IdleCallback
> animate r = do threadDelay 10000
> milliseconds <- fromIntegral <$> get elapsedTime
> let fade = sin (milliseconds * 0.001) * 0.5 + 0.5
> modifyIORef r (\x -> x { fadeFactor = fade })
> postRedisplay Nothing

 
Finally, kick GLUT off to open our window and start things going.

> main = do initialDisplayMode $= [DoubleBuffered]
> initialWindowSize $= Size 500 500
> (progname,_) <- getArgsAndInitialize
> createWindow "Chapter 2"
> r <- makeResources >>= newIORef
> displayCallback $= draw r
> idleCallback $= Just (animate r)
> keyboardMouseCallback $= Just basicKMHandler
> mainLoop
Categories: Programming Tags: