Skip to content

Introduction to Macros

Joshua Grosso edited this page Sep 28, 2019 · 23 revisions

Metaprogramming can be very useful in reducing or removing boilerplate code. However, some languages have less-than-satisfactory solutions while others have none at all. Template Haskell, for example, has several major problems (e.g. it is notoriously unstable between GHC releases, according to the 2017 State of Haskell).

However, by following in Lisp's footsteps and adopting a homoiconic syntax (yay, parentheses!), Axel is equipped with a simple yet powerful metaprogramming system. Unlike in most languages, Axel source code is effectively a textual representation of the corresponding abstract syntax tree (AST) that the transpiler uses to represent the program internally. Macros, then, are just functions that take lists of Axel expressions and return lists of other Axel expressions. The Axel transpiler runs these macros at compile-time, allowing for code that creates code.

For an example of the power that macros provide us with, let's try to implement the LambdaCase GHC language extension as a macro ourselves.

(Before starting this tutorial, it's worth checking out the Syntax Reference to become familiar with Axel's syntax.)

Installing Axel

First, we should install Axel. If you're following along from home and want to run the macro we're developing, run axel file run <file path>.

Quoting

First, for a brief introduction to Axel's metaprogramming abilities. Let's use the following Axel expression as our example:

(+ 1 2)

This applies the + function to two arguments, 1 and 2. Internally, Axel represents this not as a function application per se, but rather as an s-expression (the Lisp name for a parenthesized list) containing the symbols +, 1, and 2. The cool part about this, which Axel inherits from Lisp, is that by passing the expression to the quote "special form," we can get that internal representation and manipulate it. For example:

(quote (+ 1 2)) -- => (AST.SExpression <metadata> [(AST.Symbol <metadata> "+") (AST.LiteralInt <metadata> 1) (AST.LiteralInt <metadata> 2)])

The metadata included in the AST structure assists the Axel compiler with telling users where errors originally happened.

You can quote any expression you'd like:

(quote 1) -- => (AST.LiteralInt <metadata> 1)
(quote "test") -- => (AST.LiteralString <metadata>"test")

You can even quote another application of quote:

(quote (quote 1)) -- (AST.SExpression <metadata> [(AST.Symbol <metadata> "quote") (AST.LiteralInt <metadata> 1)])

Once we have the quoted form of an expression, we can manipulate it like any other data structure, like so:

(case (quote (+ 1 2))
  ((AST.SExpression metadata sexp) (AST.SExpression metadata (reverse sexp))))
-- => (AST.SExpression <metadata> [(AST.LiteralInt <metadata> 2) (AST.LiteralInt <metadata> 1) (AST.Symbol <metadata> "+")])

Note that we preserved the metadata when constructing our new AST.SExpression. It is very important to do this wherever possible – otherwise, users can receive error messages that can't be mapped to a location in the original Axel file (which is very bad for usability)!

(However, sometimes there really is no metadata to use for an expression. For example, maybe you generated it as part of the macro rather than receiving it as an argument. In this case, you can use Nothing where you would usually pass through metadata.)

Quoting actually turns out to be pretty common when making a Lisp program, so Axel supports some special syntax for it. The apostrophe character (') before an Axel expression will cause it to be quoted automatically.

'(1 2 3)
-- is equivalent to
(quote (1 2 3))

Macros

Now, let's get some practice with manipulating Axel expressions.

Let's define a function, backwards, to reverse whatever Axel expression we pass in. We use def to define a function in Axel. For example:

(def backwards (() (-> AST.Expression AST.Expression))) -- It will take an Axel expression and return an Axel expression

Now, an Axel Expression can be one of several cases: LiteralChar <metadata> Char, LiteralFloat <metadata> Float, LiteralInt <metadata> Int, LiteralString <metadata> String, SExpression <metadata> [Expression], or Symbol <metadata> String. The only thing we can reverse here is an SExpression, so let's return the input unchanged for all other cases.

 ((AST.LiteralChar _ _) expression)
 ((AST.LiteralInt _ _) expression)
 ((AST.LiteralString _ _) expression)
 ((AST.Symbol _ _) expression)

For the case that we're passed an SExpression, however, let's extract the list of Expressions inside it and reverse it. We'll have to wrap the new list in an SExpression, since our function expects an Expression to be returned.

 ((AST.SExpression metadata expressions) (AST.SExpression metadata (reverse expressions))))))

Now, we can run our backwards function. We have to quote (+ 2 1) when we pass it in, because otherwise it would be evaluated and backwards would be passed 3 (which is not what we want). Rather, we want the internal representation of (+ 2 1) to be passed to backwards, which requires a value of type Expression.

(backwards '(+ 2 1)) -- => (AST.SExpression <metadata> [(AST.Symbol <metadata> "+") (AST.LiteralInt <metadata> 1) (AST.LiteralInt <metadata> 2)])

When we evaluate backwards, we get back a new Expression. So, we now have the data representing our new code, but how can we actually execute it to get 3?

Macros to the rescue! Currently, backwards is just a plain old Axel function. But, Axel provides an alternative to normal function definition (def) called defmacro, and it lets us define a function that will actually be run at compile-time. Macros take a list of Expressions and return a list of Expressions. Macro calls in an Axel program are replaced by the results they return.

To see this in action, let's turn backwards into a macro. defmacro doesn't require a type signature to be passed, since it automatically uses the type [Expression] -> IO [Expression]. You'll notice that backwards is a partial function, since it wouldn't be able to handle being passed more than one argument. However, this isn't the worry it would usually be if this were a normal function, since errors will be thrown at compile-time and fixed before the program is actually run.

(defmacro backwards
  ((expression)

Because backwards will not have any side-effects, we will return whatever our result will be (since we need to return a value of type IO [Expression] rather than just [Expression]).

   (return

Also, we'll wrap the whole thing in a list, because (again) we're looking to return a list of Expressions rather than just a single Expression. From here, our definition is the same as before:

    [(case expression
      ((AST.LiteralChar _ _) expression)
      ((AST.LiteralInt _ _) expression)
      ((AST.LiteralString _ _) expression)
      ((AST.SExpression metadata expressions) (AST.SExpression metadata (reverse expressions)))
      ((AST.Symbol _ _) expression))])))

Now, if we actually call backwards in our code, the macro call will be "expanded" during compilation. Also, if you remember from when backwards was a function, we had to explicitly quote our input for it not to be evaluated. However, when passing a form to a macro, it will be automatically quoted. This helps macros remain unobtrusive and makes them feel more like first-class syntactic constructs.

To see how our macro will actually be evaluated, let's walk through it step-by-step. If the program we provide to Axel looks like this:

(= main ()
  (backwards ("Hello, world!" putStrLn)))

backwards will be executed as the program compiles, after which the program will look like this:

(= main ()
  () (putStrLn "Hello, world!"))

And, when the program is executed at the end of this process, the output will be, as expected:

"Hello, world!"

Congratulations, you've created your very first macro!

Quasiquoting

Now, let's try to create a more complicated macro and implement GHC's LambdaCase extension from scratch. We're going to write our macro using what we know now, and then we'll take a look at "quasiquoting" and see how it helps us make our macros more concise. Finally, we'll see it in action by using it to clean up our example.

For some context, take the following Haskell code:

\x -> case x of
        True -> 1
        False -> 2

Notice that x is immediately passed to the case expression and never used again. This happens commonly enough that the LambdaCase extension was designed to help remove the duplication. It allows us to write the above lambda more concisely, like so:

\case
  True -> 1
  False -> 2

So, it's a pretty simple (yet useful) syntactic transformation. If we want Axel to have the same feature, it could look like the following:

-- Before
(\ (x) (case x
         ((True) 1)
         ((False) 2)))

-- After
(\case
  ((True) 1)
  ((False) 2))

The rule to implement this feature is pretty simple: If you see (\case <case 1, case 2, ...>), replace that with (\ (<identifier>) (case <identifier> <case 1, case 2, ...>)), where <identifier> is some as-of-yet unused identifier.

Let's start our macro definition like usual:

(defmacro \case
  (cases undefined))

Now, we'll fill in the basic details, based on the discussion above. Since we want to return a list, we'll use return to lift our result into IO and then wrap our result with the SExpression constructor.

(defmacro \case
  (cases (return [(AST.SExpression Nothing undefined)])))

We want to fill in undefined with an Axel form that looks like \ (<some identifier>) (case <some identifier> <cases>). Since we want to use \ and case as symbols (instead of using them to refer to what they mean at runtime), we'll quote them with the ' operator.

(defmacro \case
  (cases (return [(AST.SExpression Nothing ['\
                                            (AST.SExpression Nothing undefined)
                                            (AST.SExpression Nothing ['case undefined undefined])])]))

Now, we need to somehow get an unused identifier that we can use as the variable in our lambda. Luckily, Axel provides the gensym helper function, which is designed just for this! Because gensym runs in the IO monad (gensym is of type IO Expression because the Symbol it creates must be unique), our code will as well. gensym is automatically imported for us by Axel.

Here, we see the do macro in action, which replicates the main use-cases of Haskell's do-notation.

(defmacro \case
  (cases
    (do
     (<- identifier gensym)
     (return [(AST.SExpression Nothing ['\
                                        (AST.SExpression Nothing [identifier])
                                        (AST.SExpression Nothing ['case identifier undefined])])])))

And now, for the final undefined: the cases themselves. The cases parameter of our macro is a list of Expressions that we want to splice directly into our output (that is, we're looking for (case <identifier> <case 1> <case 2> <...>) rather than (case <identifier> (<case 1> <case 2> <...>))). So, we can finish our macro's definition like so:

(defmacro \case
  (cases
    (do
     (<- identifier gensym)
     (return [(AST.SExpression Nothing ['\
                                        (AST.SExpression Nothing [identifier])
                                        (AST.SExpression Nothing (: 'case (: identifier cases)))])]))))

And, there we have it!

To be honest, that was a bit tedious to write, and it's not the easiest to understand. Manually constructing our SExpressions adds a lot of extra noise to our macro definition. Fortunately, though, there's a better way!

Meet quasiquote, quote's big sibling. It behaves just like quote on normal expressions:

(quasiquote 1) -- => (AST.LiteralInt <metadata> 1)
(quasiquote (1 2 3)) -- => (AST.SExpression <metadata> [(AST.LiteralInt <metadata> 1) (AST.LiteralInt <metadatA> 2) (AST.LiteralInt <metadata> 3)])

However, quasiquote has some sidekicks that give it superpowers. If you wrap quasiquote's argument with unquote, it will undo the quasiquotation for just that section. For example:

(quasiquote 1) -- => (AST.LiteralInt <metadata> 1)
(quasiquote (unquote 1)) -- => 1

(quasiquote (1 (AST.LiteralInt Nothing 2) 3)) -- => (AST.SExpression <metadata> [(AST.LiteralInt <metadata> 1) (AST.SExpression <metadata> [(AST.Symbol <metadata> "AST.LiteralInt") (AST.LiteralInt Nothing 2)]) (AST.LiteralInt <metadata> 3)])
(quasiquote (1 (unquote (AST.LiteralInt Nothing 2)) 3)) -- => (AST.SExpression <metadata> [(AST.LiteralInt <metadata> 1) (AST.LiteralInt Nothing 2) (AST.LiteralInt <metadata> 3)])

There's also unquoteSplicing, which not only undoes quasiquotation but also "splices" the result directly into its enclosing s-expression. For example:

(quasiquote (1 (unquote (AST.SExpression Nothing [(AST.LiteralInt Nothing 2) (AST.LiteralInt Nothing 3)])))) -- => (AST.SExpression <metadata> [(AST.LiteralInt <metadata> 1) (AST.SExpression Nothing [(AST.LiteralInt Nothing 2) (AST.LiteralInt Nothing 3)])])
(quasiquote (1 (unquoteSplicing (AST.SExpression Nothing [(AST.LiteralInt Nothing 2) (AST.LiteralInt Nothing 3)]))) -- => (AST.SExpression <metadata> [(AST.LiteralInt <metadata> 1) (AST.LiteralInt Nothing 2) (AST.LiteralInt Nothing 3)])

In effect, quasiquote acts as a sort of templating system for creating Axel expressions, where you selectively choose what parts should be evaluated at compile time.

One last note: quasiquoting, like quoting, is common enough that it gets special syntax. In the Axel compiler, just like how ' automatically becomes quote, quasiquote becomes \`` (e.g. (quasiquote (1 2 3))is usually written`(1 2 3)instead).unquotecan be written with a tilde (e.g.(unquote 1)is usually written1), and unquoteSplicingcan be written with a tilde and then an at-sign (e.g.(unquoteSplicing (1 2 3))is usually written@(1 2 3)`).

Now, we have all the building blocks we need to clean up our macro definition from earlier. Here's the original definition, for reference:

(defmacro \case
  (cases
    (do
     (<- identifier gensym)
     (return [(AST.SExpression Nothing ['\
                                        (AST.SExpression Nothing [identifier])
                                        (AST.SExpression Nothing (: 'case (: identifier cases)))])]))))

And here's a version which uses quasiquote, unquote, and unquoteSplicing:

(defmacro \case
    (cases
     (do (<- identifier gensym)
     (return [`(\ (~identifier) (case ~identifier ~@cases))])))

Notice how the second representation closely mirrors the final output (much more so than our first try does). Also, try to figure out how we derived the second representation from the first one. While it might be difficult at first, getting the hang of quoting and quasiquoting just takes practice.

And, that's it for now! Welcome to Axel!