Skip to content

Latest commit

 

History

History
129 lines (95 loc) · 5.33 KB

2017-08-22-sum-types.md

File metadata and controls

129 lines (95 loc) · 5.33 KB

Sum Types

8/22/17

Today I worked on implementing sum types in the language and this required a bit more learning and experimenting on my part. Initially I had some confusion over sum and product types, but a little guidance from @robrix straightened me out. The two concepts are really mirror images of each other. The pairs I implemented yesterday use , to build up a new pair and use fst or snd to tear down a pair. Pairs are essentially product types. Sums on the other hand are constructed or built up with left and right and torn down with case (in this toy language at least). You can see that they reflect each other.

A small detour for the mathematical notion behind sums and products. In Haskell, sum types look like this (basically Either):

data Sum a b = L a | R b

which means the space of available types increases arithmetically:

data Zero
data One = One
type Two = Sum One One
type Three = Sum One (Sum One One)

Product types on the other hand look like this:

data Product a b = P a b

which means the space of available types multiplies:

type TwoByTwo = Product Two Two
-- Possible types look like this:
-- P (L One) (L One)
-- P (L One) (R One)
-- P (R One) (R One)
-- P (R One) (L One)

You can of course define Zero:

data Zero

And wonderfully, functions are exponential types:

type Exponential a b = a -> b

Moving on.....

So for this toy language I set out to create a few new constructs to model sums, namely:

  • left and right constructors
  • case .. of statements to pattern match on the two branches.

The tricky part for me was modeling the various places where the types need to line up in the type checker. I ended up with case holding an expression and two lambdas (left and right) and that the constructors left/right would hold a lambda and their return type. Here's what that looks like with the rest of the language expressions omitted for clarity:

data ExprF a
  = InL a Type
  | InR a Type
  | Case { expression :: a, left :: a, right :: a}
  ...

data Type
  = TInt
  | TBool
  | TSum Type Type
  ...

(My first attempt put the Type on the Case as instead, but then I didn't have enough information when type checking.)

When it comes to checking the types for a case statement, there are a bunch of things you have to validate:

  • The case expression has to return a sum type.
  • Both the left and right branches of the case have to return the same type.
  • The left branch has to accept an argument of the same type as the left of the case expression's sum type.
  • The right branch has to accept an argument of the same type as the right of the case expression's sum type.

My language also requires that both left and right branches are lambdas and that left and right be constructed with the full sum type annotation. For example, even if you want to have just left 1, you have to fully specify the overall sum type b/c that type is then used to check consistency. So instead, you need to write left 1 : (Int + Bool) where Bool is the type of the right part of the sum (could be anything as long as it's consistent).

Once done with the type checking, evaluation looks a lot like function application, but you need some new values to keep track of whether the left or the right branch needs to be evaluated. I got a little turned around thinking through the evaluation here, but it helps to write how you'd do it in Haskell with Either and functions.

case' :: Either l r -> (l -> a) -> (r -> a) -> a
case' e fL fR = case e of
  Left l -> fL l
  Right r -> fR r

The biggest difference in writing the evaluation for case is that you have to eval the left or right branch and then apply the arg by evaluating the closure body in the new environment. That looks a bit like this (rest of eval omitted, see src/Interpreter.hs)

...
InL a _ -> VLeft (eval' env a)
InR a _ -> VRight (eval' env a)
Case e ifL ifR -> case eval' env e of
  VLeft arg  -> let Closure n body env' = eval' env ifL
                in eval' ((n, arg) : env') body
  VRight arg -> let Closure n body env' = eval' env ifR
                in eval' ((n, arg) : env') body
...

And that's it. Here's some sum types in action:

λ eval <$> check (parseExpr "(\\x : (Int + Bool) . case x of (\\x : Int . true) (\\x : Bool . false)) (left 1 : (Int + Bool))")
Right (Literal (LBool True))
λ eval <$> check (parseExpr "(\\x : (Int + Bool) . case x of (\\x : Int . true) (\\x : Bool . false)) (right true : (Int + Bool))")
Right (Literal (LBool False))

# We expect this to fail type checking.
λ eval <$> check (parseExpr "(\\x : (Int + Bool) . case x of (\\x : Int . true) (\\x : Bool . false)) 1")
Left "expected type TSum TInt TBool but got TInt"

# We expect this to fail type checking because x isn't a sum type.
λ eval <$> check (parseExpr "(\\x : Int . case x of (\\x : Int . true) (\\x : Bool . false)) 1")
Left "don't know how to handle case with non-sum types"

# We expect this to fail type checking because the left branch takes a pair (Int, Int), but the case is for the sum type (Int + Bool).
λ eval <$> check (parseExpr "(\\x : (Int + Bool) . case x of (\\x : (Int, Int) . true) (\\x : Bool . false)) (right true : (Int + Bool))")
Left "nope TSum TInt TBool. left is TPair TInt TInt -> TBool. right is TBool -> TBool."

Tada!