Row Polymorphism

Irken supports record type compatibility based on Row Polymorphism, a form of compile-time Structural Typing which solves the loss of information problem.

Background

The most common form of static typing is Nominal Typing. This is the type found in Java, C#, and C++. In a nominal type system, a type Foo is only type compatible with type Bar if an explicit declaration makes them compatible during class definition, such as via Java's keywords implements and extends.

Structural Typing means a record is type-compatible with a record type if it contains a matching set of fields, no explicit declaration is required. C++ templates, Google Go interfaces, and Irken records offer static Structural Typing, though they use different methods to achieve it.

Row Polymorphism is the method Irken uses to implement Structural Typing. In Row Polymorphism, records with a superset of fields are made type-compatible via a variable which holds the extra fields, possibly passing them through as a return result. This is an alternative to Structural Subtyping, which constrains a superset type down to the subset of fields, at which point the extra fields are inaccessible. We'll discuss this more when we talk about the loss of information problem below.

Records in Irken

Consider the following example. Types are supplied in comments for clarity. They may be supplied explicitly, but the compiler does not require them.

(include "lib/basis.scm")

(define (do_action a) (printf a "\n"))

(define (fa x) (do_action x.a))    
(define (fb x) (do_action x.b))    
(define (fc x) (do_action x.c))    
(define (fea x) (do_action x.e.a)) 

(define data { a="1" 
               b="2" 
               e=  { a="5" 
                     b="6" }
             })

(fa data)     (fb data)
(fa data.e)   (fb data.e)
(fea data)
; (fc data) ;; type error
; (fea data.e)  ;; type error
#u

While our data creation resembles a hashtable initialization; and our code mentions no types, like you might find in a dynamic language like Python, Ruby, or Clojure; our compiler is able to produce compile-time type errors when we access fields that are not present in our record type. Further, we're able to pass the root data and the sub-record data.e into both fa and fb, because they both contain .a and .b fields.

What is going on here? First our record is not a hashable. It's more like a C-struct. Each record is statically defined packed data which directly contains immediate values and pointers to boxed types such as other records. Type inference allows us to omit our types, but their compiler infers types and type parametric for each function. For example:

(define (fa x) (do_action x.a))     ;; type of 'fa' is: ({ a=string ...} -> #u )

This means the function will accept any record type which contains an a string field, and it returns nothing (#u means undefined). The ... indicates an open record type, which is why records may have additional fields.

Pattern Matching on Records

Let's look at an example of pattern matching on records. When a literal appears in the pattern match clause, such as the 2 in b=2, it becomes a match constraint which must be satisfied for that clause to match. When a symbol name appears in a pattern match clause, such as the x in a=x it becomes a destructuring variable, and x will contain the value of the a field in that match expression. (Note that it's currently only possible to match against literals. In order to pattern match against variables or expressions, one needs match guards which Irken does not have. To do this you can use if and cond constructs instead.

(include "lib/core.scm")

(define thing
  {a=x b=2} -> x
  {a=3 b=y} -> y
  {a=m b=n} -> (+ m n)
  )

(printn (thing {a=3 b=1}))
(printn (thing {a=3 b=2}))
(printn (thing {a=4 b=5}))

What Row Polymorphism also allows us to do, is define functions which produce the original types that were supplied, rather than a stripped down type. Here the type {a=string b=string c=string} flows through function calls which each operate on a simpler type.

(include "lib/basis.scm")

(define (fa x) (printf x.a) x) ;; ({a=string ...} -> {a=string ...})
(define (fb x) (printf x.b) x) ;; ({b=string ...} -> {b=string ...})
(define (fc x) (printf x.c) x) ;; ({c=string ...} -> {c=string ...})

(fc (fb (fa {a="1" b="2" c="3"})))

What is the Loss-of-Information-Problem?

The loss-of-information problem is something that happens when operations on Structural types constrain them to a subset of their original form, losing any extra information which was present in their original form. Consider the following function, which is written to emulate what happens in Structural Subtyping. The type specification is optional and provided for clarity.

(define (rename pet new_name) : ({name=string ...} -> {name=string})
   { name=new_name })
(rename {name="Rover" age=20} "Yeller") ;; -> {name="Yeller"}

This is written in functional style, where we don't modify the original record but instead produce a new one, however the returned record only contains name; we threw-away the age information because we didn't know about it.

Row Polymorphism offers a solution to this problem by allowing us to propagate the polymorphic type of the original record through to our return result. In Irken, these extra fields are denoted using ellipsis (...). If we have multiple record arguments, we can use ... on all of them, Irken infers what extra record fields belong in the return type. Let's look at the proper way to do this in Irken.

(define (rename pet new_name) : ({name=string ...} -> {name=string ...})
   ;; Sam needs to implement %rcopy to implement this example
  )

Loss of information over sets

A related loss of information problem occurs in functional transformation interfaces, such as in the following C# interface:

interface Functor<A> {
   Functor<B> fmap<B>( Func<A,B> f);
}

As you can see, no matter what type of object you call fmap() on, you get back a Functor<B>, which means you lost access to the original objects real type and capabilities. To preserve the original object's type in C# substantially complicates the code (see A Look at Functor). Haskell solves this loss-of-information problem with Type-Classes. When you fmap() on a list in Haskell, you get back a list; when you fmap() on a tuple you get back a Tuple.

TODO:

show example Functor / fmap() in Irken
add section on Hetrogeneous Lists
- Compile-Time Parametrics vs Runtime Row Polymorphism
- Existentially Quantified Types in Haskell