Haskell Genetic Algorithm "Hello, world!"

March 14, 2016: Reformatting

A post on reddit linked to several implementations of a cute "Hello, world!" program demonstrating a genetic algorithm that evolves towards a target string. Example programs were written in several languages, and I thought a Haskell version could be worthwhile as it demonstrates the use of random numbers, an issue that frustrates many newcomers to the language.

I use the MonadRandom package to deal with plumbing a random number generator through the code, but otherwise stick to lists and the usual cast of characters from the Prelude.

{-# LANGUAGE NamedFieldPuns, TupleSections #-}
import Control.Applicative
import Control.Arrow (second)
import Control.Monad (liftM, replicateM)
import Control.Monad.Random
import Data.Function (on)
import Data.List (minimumBy, sortBy, nub, (\))
import Data.Ord (comparing)
import Text.Printf (printf)

The example programs associated with the original post all come with tests. Some of these are standard unit tests that verify hand-picked examples, while others verify properties of random processes. To support both kinds of tests, we use HUnit for unit tests, and QuickCheck for property testing.

import Test.QuickCheck
import Test.QuickCheck.Monadic
import Test.HUnit hiding (assert)

For clarity, we create a type alias for the raw data we will be working with.

type Gene = String

target :: Gene
target = "Hello, world!"

Next are the functions for working with individual Genes.

mate :: RandomGen g => Gene -> Gene -> Rand g Gene
mate g1 g2 = (++) <$> flip take g1 <*> flip drop g2 <$> pivot
  where pivot = getRandomR (0, length g1 - 1)

mutate :: RandomGen g => Gene -> Rand g Gene
mutate g = (uncurry (++) .) . second . (:) <$> delta <*> parts
  where delta = getRandomR (' ', 'z')
        idx = getRandomR (0, length g - 1)
        parts = second tail . flip splitAt g <$> idx

fitness :: Gene -> Int
fitness = sum . map abs . zipWith ((-) `on` fromEnum) target

We also want a random Gene generator.

randomGene :: RandomGen g => Rand g Gene
randomGene = replicateM (length target) $ getRandomR (' ', 'z')

Finally we can start thinking about populations. The example Python program I looked at had several parameters for initializing the population and controlling how it evolves. We capture those parameters in a record.

data PopInfo = PopInfo { size      :: Int
                       , crossover :: Float
                       , elitism   :: Float
                       , mutation  :: Float }

A Population is a pair of a record describing the population and a collection of Genes.

type Population = (PopInfo, [Gene])

defaultPop :: PopInfo
defaultPop = PopInfo 1024 0.8 0.1 0.03

We will use a helper function to produce a randomized initial population.

randomPop :: RandomGen g => PopInfo -> Rand g Population
randomPop = liftA2 (,) <$> pure <*> flip replicateM randomGene . size

A tournament selection method is used to select parent Genes from a Population.

tournamentSize :: Int
tournamentSize = 3

tournamentSelection :: RandomGen g => Population -> Rand g Gene
tournamentSelection (info, genes) =  
  minimumBy (comparing fitness) .  map (genes !!) <$>
  replicateM tournamentSize (getRandomR (0, size info - 1))

twoM :: Monad m => m a -> m (a, a)
twoM = liftM (\[x,y] -> (x,y)) . replicateM 2

selectParents :: RandomGen g => Population -> Rand g (Gene, Gene)
selectParents = twoM . tournamentSelection

The meat of the algorithm is the evolve function that, appropriately enough, evolves a Population.

evolve :: RandomGen g => Population -> Rand g Population
evolve p@(info@(PopInfo {size, crossover, elitism, mutation}), genes) = 
  (info,) . sortBy (comparing fitness) . (take idx genes ++) <$>
  replicateM (size - idx) (twoM getRandom >>= go)
  where idx = round (fromIntegral size * elitism)
        go (r1,r2) | r1 <= crossover = 
                     selectParents p >>= uncurry mate >>= addChild r2
                   | otherwise = addMutation r2
        addChild r c
          | r <= mutation = mutate c
          | otherwise = return c
        addMutation r
          | r <= mutation = mutate . (genes !!) =<< getRandomR (idx, size - 1)
          | otherwise = (genes !!) <$> getRandomR (idx, size - 1)

Now we're ready to kick off a multi-generation quest for the string we already wrote down up top.

iterateUntil :: Monad m => (a -> Bool) -> (a -> m a) -> a -> m a
iterateUntil stop f = go
  where go x | stop x = return x
             | otherwise = f x >>= go

maxGenerations :: Int
maxGenerations = 16384

main = evalRandIO (randomPop defaultPop >>= iterateUntil done step . (, 0))
       >>= result
  where step (p,gen) = (,) <$> evolve p <*> pure (gen+1)
        done ((_, g:_), generation) = 
          generation == maxGenerations || fitness g == 0
        result ((_, g:_), generation)
          | generation == maxGenerations = 
            putStrLn "Maximum generations reached without success."
          | fitness g == 0 = printf "Reached target (%d): %s\n" generation g
          | otherwise = putStrLn "Evolution is hard. Let's go shopping."

In keeping with the spirit of the original programs, we include tests to exercise some of the component functionality we've put together.

testGen = run (evalRandIO randomGene) >>= assert . check
  where check g = and $ map ($ g) [ (>= 0) . fitness
                                  , (== 13) . length
                                  , all (between 32 122 . fromEnum) ]
        between l r x = l <= x && x <= r

testMut = run (evalRandIO $ randomGene >>= pairWithMutant) >>= assert . check
  where pairWithMutant = liftA2 (,) <$> pure <*> mutate
        check (g,m) = length g == length m && length (nub g \ nub m) <= 1

testMate = run (evalRandIO $ twoM randomGene >>= pairWithChild) >>= 
           assert . check
  where pairWithChild (mom,dad) = (mom,dad,) <$> mate mom dad
        check (m,d,c) = length c == 13 && 
                        (and . map (\(_,y,z) -> y == z) . 
                         dropWhile (\(x,y,_) -> x == y) $ zip3 m c d)

unitTests = test [ "fitness1" ~: 0 ~=? fitness "Hello, world!" 
                 , "fitness2" ~: 399 ~=? fitness "H5p&J;!l<X\7l"
                 , "fitness3" ~: 297 ~=? fitness "Vc;fx#QRP8V\$"
                 , "fitness4" ~: 415 ~=? fitness "t\O`E_Jx$n=NF" ]

runTests = do mapM_ (quickCheck . monadicIO) [testGen, testMut, testMate]
              runTestTT unitTests

You can compile this post with GHC, or just run it in GHCi to poke around and ensure that the tests all pass.