Preprocessing
What's the Least I Can Do?

2017-12-05

Running source files through a C preprocessor (CPP) is a common step for many languages. In part it leverages widespread familiarity with C, in part it can ease eventual interoperation with C, and (the part I find interesting) it occasionally strikes a reasonable trade-off between power and complexity. So what use is hpp, a Haskell preprocessor?

Macro Reluctant

In Lisp and Scheme dialects, macros are a central pillar. Setting aside just how popular they are, or whether that popularity compensates for deficiencies in their host languages, they seem to owe some of their success to the relative ease with which one can traverse the syntax of these languages. For boilerplate reduction or complex compile-time computation, facilities for operating on syntax trees may be central. But in other scenarios, such as choosing between multiple implementations, a critical design consideration is how the macro is ultimately parameterized. In other words, what mechanism sets the value our macro checks?

If we are going to invoke our compiler from a command line, or from a build script written in something other than the language of our program (e.g. a Makefile), the simpler the coupling, the better. Now, aiming for simplicity often turns into aiming at one's foot when the complexity of the actual needs greatly exceeds that of the anticipated needs, but (today) I think needs sometimes really are simple. In these cases, passing your compiler -D_WIN32, say, when targeting Windows does a credible job of minimizing the interface between however you invoke the compiler (your shell) and the compiler itself.
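
To make that "simple coupling" concrete, here is a minimal sketch of how a tool might turn such flags into definitions. parseDefines is a hypothetical helper, not part of hpp or any compiler. It follows GCC's documented convention that a bare -DNAME defines NAME as 1, and it deliberately ignores the separate-argument form -D NAME.

```haskell
import Data.List (isPrefixOf)

-- | Hypothetical helper: collect command-line arguments of the form
-- -DNAME or -DNAME=VALUE into (name, value) pairs, ignoring all others.
-- A bare -DNAME defines NAME as "1", following GCC's convention.
-- The separate-argument form "-D NAME" is not handled in this sketch.
parseDefines :: [String] -> [(String, String)]
parseDefines args = [ splitDef (drop 2 a) | a <- args, "-D" `isPrefixOf` a ]
  where
    -- Split "NAME=VALUE" at the first '='; no '=' means the value is "1".
    splitDef s = case break (== '=') s of
      (name, '=' : value) -> (name, value)
      (name, _)           -> (name, "1")
```

For example, parseDefines ["-D_WIN32", "-O2", "-DFOO=42"] yields [("_WIN32","1"),("FOO","42")]: the entire interface between the shell and the compiler is a list of name/string pairs.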

Anything more structured would require your shell to serialize values that your compiler could then turn into values a sophisticated program could scrutinize. With CPP, we just say that we're working with textual strings that can sometimes be parsed into numeric values. It's awful, but it seems to suffice often enough.
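
As a sketch of what "textual strings that can sometimes be parsed into numeric values" means in practice, here is a hypothetical, heavily simplified evaluation of #if NAME. The names Defs and ifValue are illustrative only and are not hpp's API. As in CPP, an undefined name counts as 0; unlike CPP, a defined name whose text is not an integer literal is reported as Nothing instead of being macro-expanded further.

```haskell
import Text.Read (readMaybe)

-- | Hypothetical: definitions are plain name/string pairs, as with -D flags.
type Defs = [(String, String)]

-- | Hypothetical, simplified reading of "#if NAME".
-- An undefined name evaluates to 0, as in CPP. A defined name's replacement
-- text is parsed as an integer; if that fails we return Nothing rather than
-- performing CPP's further macro expansion.
ifValue :: Defs -> String -> Maybe Integer
ifValue defs name = case lookup name defs of
  Nothing   -> Just 0
  Just text -> readMaybe text
```

With defs = [("FOO","1"),("OS","windows")], ifValue defs "FOO" is Just 1, ifValue defs "BAR" is Just 0, and ifValue defs "OS" is Nothing. That last case is the awful part: the same channel carries both numbers and arbitrary text.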

Is CPP Right For My Pet Language?

In the long run, you may want to develop an elegant phased-compilation story for your language. For today, you may just want to open a line of communication between your compiler and the user running it (as opposed to the programmer writing a program in your language). To this end, you might find room in your syntax for CPP-style directives (lines whose first non-space character is #), in which case you can use hpp as in a few basic tests included in the package:

{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans.Except
import Data.ByteString.Char8 (ByteString)
import Data.Maybe (fromMaybe)
import Hpp

sourceIfdef :: [ByteString]
sourceIfdef = [ "#ifdef FOO"
              , "x = 42"
              , "#else"
              , "x = 99"
              , "#endif" ]

hppHelper :: HppState -> [ByteString] -> [ByteString] -> IO Bool
hppHelper st src expected =
  case runExcept (expand st (preprocess src)) of
    Left e -> putStrLn ("Error running hpp: " ++ show e) >> return False
    Right (res, _) -> if hppOutput res == expected
                      then return True
                      else do putStr ("Expected "++show expected++", got ")
                              print (hppOutput res)
                              return False

testIf :: IO Bool
testIf = hppHelper (fromMaybe (error "Preprocessor definition did not parse")
                              (addDefinition "FOO" "1" emptyHppState))
                   sourceIfdef
                   ["x = 42\n","\n"]

In testIf, we set FOO to 1, so preprocessing the sourceIfdef input reduces to the x = 42 branch. The sample source code is not in any particular language; it could be yours! This little example does not demonstrate #include-ing other files, which would instead use the runHpp function, since that performs actual IO to access the file system.
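
To show the kind of code elision testIf exercises, here is a toy, self-contained sketch of #ifdef/#ifndef/#else/#endif handling. It is not how hpp is implemented, and hpp handles far more (#if expressions, #define, macro expansion, #include); the names directive and elide are hypothetical. It uses the same directive rule as above: a line whose first non-space character is #.

```haskell
import Data.Char (isSpace)

-- | A directive is a line whose first non-space character is '#'.
-- Returns the words following the '#', e.g. ["ifdef", "FOO"].
directive :: String -> Maybe [String]
directive line = case dropWhile isSpace line of
  '#' : rest -> Just (words rest)
  _          -> Nothing

-- | Keep or drop lines according to #ifdef / #ifndef / #else / #endif,
-- given the set of defined names. Each stack frame records whether the
-- enclosing context is active and whether the current branch is taken.
-- Conditional directives are removed from the output; other directives
-- (e.g. #define) are treated like ordinary lines, and malformed input
-- is not diagnosed.
elide :: [String] -> [String] -> [String]
elide defined = go []
  where
    active :: [(Bool, Bool)] -> Bool
    active []           = True
    active ((p, b) : _) = p && b

    go :: [(Bool, Bool)] -> [String] -> [String]
    go _ [] = []
    go stack (l : ls) = case directive l of
      Just ["ifdef", n]  -> go ((active stack, n `elem` defined) : stack) ls
      Just ["ifndef", n] -> go ((active stack, n `notElem` defined) : stack) ls
      Just ["else"]      -> case stack of
        (p, b) : rest -> go ((p, not b) : rest) ls
        []            -> go stack ls  -- stray #else: ignored in this sketch
      Just ["endif"]     -> go (drop 1 stack) ls
      _ | active stack   -> l : go stack ls
        | otherwise      -> go stack ls
```

Applied to the lines of sourceIfdef, elide ["FOO"] keeps only "x = 42" and elide [] keeps only "x = 99". Because this sketch drops directive lines entirely, its output need not match hpp's line for line.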

Don't I Already Have a CPP?

You already have at least one CPP-capable program: gcc or clang. However, they sometimes change their CPP behavior to better serve their primary mission of compiling code, even if that breaks your tangential use of that phase of their pipeline. Using CPP for simple code elision is nowhere near the complexity of compiling C++, so pulling in a dependency that complicated is a disappointingly large concession to not reinventing wheels: you don't want to tell your users not to update their system-wide clang installation because the clang developers changed something in CPP that causes you grief. Luckily, the wheel has already been reinvented many times over, and hpp's take on it offers an executable and a Haskell library that I hope are simple enough to use without spending undue time thinking about C and C++ compilers.