Haskell Genetic Algorithm "Hello, world!"
March 14, 2016: Reformatting
A post on reddit linked to several implementations of a cute "Hello, world!" program demonstrating a genetic algorithm that evolves towards a target string. Example programs were written in several languages, and I thought a Haskell version could be worthwhile as it demonstrates the use of random numbers, an issue that frustrates many newcomers to the language.
I use the MonadRandom package to deal with plumbing a random number generator through the code, but otherwise stick to lists and the usual cast of characters from the Prelude.
{-# LANGUAGE NamedFieldPuns, TupleSections #-} import Control.Applicative import Control.Arrow (second) import Control.Monad (liftM, replicateM) import Control.Monad.Random import Data.Function (on) import Data.List (minimumBy, sortBy, nub, (\)) import Data.Ord (comparing) import Text.Printf (printf)
The example programs associated with the original post all come with tests. Some of these are standard unit tests that verify hand-picked examples, while others verify properties of random processes. To support both kinds of tests, we use HUnit for unit tests, and QuickCheck for property testing.
import Test.QuickCheck import Test.QuickCheck.Monadic import Test.HUnit hiding (assert)
For clarity, we create a type alias for the raw data we will be working with.
type Gene = String target :: Gene target = "Hello, world!"
Next are the functions for working with individual Gene
s.
mate :: RandomGen g => Gene -> Gene -> Rand g Gene mate g1 g2 = (++) <$> flip take g1 <*> flip drop g2 <$> pivot where pivot = getRandomR (0, length g1 - 1) mutate :: RandomGen g => Gene -> Rand g Gene mutate g = (uncurry (++) .) . second . (:) <$> delta <*> parts where delta = getRandomR (' ', 'z') idx = getRandomR (0, length g - 1) parts = second tail . flip splitAt g <$> idx fitness :: Gene -> Int fitness = sum . map abs . zipWith ((-) `on` fromEnum) target
We also want a random Gene
generator.
randomGene :: RandomGen g => Rand g Gene randomGene = replicateM (length target) $ getRandomR (' ', 'z')
Finally we can start thinking about populations. The example Python program I looked at had several parameters for initializing the population and controlling how it evolves. We capture those parameters in a record.
data PopInfo = PopInfo { size :: Int , crossover :: Float , elitism :: Float , mutation :: Float }
A Population
is a pair of a record describing the
population and a collection of Gene
s.
type Population = (PopInfo, [Gene]) defaultPop :: PopInfo defaultPop = PopInfo 1024 0.8 0.1 0.03
We will use a helper function to produce a randomized initial population.
randomPop :: RandomGen g => PopInfo -> Rand g Population randomPop = liftA2 (,) <$> pure <*> flip replicateM randomGene . size
A tournament selection method is used to select parent Gene
s from a Population
.
tournamentSize :: Int tournamentSize = 3 tournamentSelection :: RandomGen g => Population -> Rand g Gene tournamentSelection (info, genes) = minimumBy (comparing fitness) . map (genes !!) <$> replicateM tournamentSize (getRandomR (0, size info - 1)) twoM :: Monad m => m a -> m (a, a) twoM = liftM (\[x,y] -> (x,y)) . replicateM 2 selectParents :: RandomGen g => Population -> Rand g (Gene, Gene) selectParents = twoM . tournamentSelection
The meat of the algorithm is the evolve
function that,
appropriately enough, evolves a Population
.
evolve :: RandomGen g => Population -> Rand g Population evolve p@(info@(PopInfo {size, crossover, elitism, mutation}), genes) = (info,) . sortBy (comparing fitness) . (take idx genes ++) <$> replicateM (size - idx) (twoM getRandom >>= go) where idx = round (fromIntegral size * elitism) go (r1,r2) | r1 <= crossover = selectParents p >>= uncurry mate >>= addChild r2 | otherwise = addMutation r2 addChild r c | r <= mutation = mutate c | otherwise = return c addMutation r | r <= mutation = mutate . (genes !!) =<< getRandomR (idx, size - 1) | otherwise = (genes !!) <$> getRandomR (idx, size - 1)
Now we're ready to kick off a multi-generation quest for the string we already wrote down up top.
iterateUntil :: Monad m => (a -> Bool) -> (a -> m a) -> a -> m a iterateUntil stop f = go where go x | stop x = return x | otherwise = f x >>= go maxGenerations :: Int maxGenerations = 16384 main = evalRandIO (randomPop defaultPop >>= iterateUntil done step . (, 0)) >>= result where step (p,gen) = (,) <$> evolve p <*> pure (gen+1) done ((_, g:_), generation) = generation == maxGenerations || fitness g == 0 result ((_, g:_), generation) | generation == maxGenerations = putStrLn "Maximum generations reached without success." | fitness g == 0 = printf "Reached target (%d): %s\n" generation g | otherwise = putStrLn "Evolution is hard. Let's go shopping."
In keeping with the spirit of the original programs, we include tests to exercise some of the component functionality we've put together.
testGen = run (evalRandIO randomGene) >>= assert . check where check g = and $ map ($ g) [ (>= 0) . fitness , (== 13) . length , all (between 32 122 . fromEnum) ] between l r x = l <= x && x <= r testMut = run (evalRandIO $ randomGene >>= pairWithMutant) >>= assert . check where pairWithMutant = liftA2 (,) <$> pure <*> mutate check (g,m) = length g == length m && length (nub g \ nub m) <= 1 testMate = run (evalRandIO $ twoM randomGene >>= pairWithChild) >>= assert . check where pairWithChild (mom,dad) = (mom,dad,) <$> mate mom dad check (m,d,c) = length c == 13 && (and . map (\(_,y,z) -> y == z) . dropWhile (\(x,y,_) -> x == y) $ zip3 m c d) unitTests = test [ "fitness1" ~: 0 ~=? fitness "Hello, world!" , "fitness2" ~: 399 ~=? fitness "H5p&J;!l<X\7l" , "fitness3" ~: 297 ~=? fitness "Vc;fx#QRP8V\$" , "fitness4" ~: 415 ~=? fitness "t\O`E_Jx$n=NF" ] runTests = do mapM_ (quickCheck . monadicIO) [testGen, testMut, testMate] runTestTT unitTests
You can compile this post with GHC, or just run it in GHCi to poke around and ensure that the tests all pass.