candid

Dependently typed programming language with timeless referential transparency.

We functional programming language aficionados enjoy our referential transparency. We point to our functions and say "here, I know that whenever I see this expression it will always return the same value". Then we want to use some package that requires a newer version of some other package, and suddenly our code doesn't even type check, much less return the same value. Or a new version of the programming language comes out, and old code no longer compiles on it.

This is an attempt to fix all of that, by reworking a number of assumptions that we make about the mechanics of programming.

Inspirations

Morte is a minimalist implementation of the calculus of constructions. Every type checked program is total, and thus will terminate in finite time. Morte uses this property to optimise its input by evaluating it to the extent possible. Morte does no type inference, only type checking, and that checking is complete, so any complete set of code that compiles now will always compile.

Unfortunately, Morte cannot be implemented in itself, as an implementation of it would not be total. Morte is also based on the traditional text file full of names, in the write-parse-type check-compile loop.

Unison is much more of a traditional functional programming language in some respects, with unbounded recursion and type inference. However, its purpose is to ease distributed computation, so it needs to be able to send functions to other nodes to evaluate, and to know that they have all the same pieces.

To this end, Unison identifies functions by a hash of their content. If you have a x = x b and you make a new version of b, the system will also need to make a new version of a. However, if you have the hash of a, there's no changing it: the hash will always point to the same function.

Unison also strives to do away with the notion of textual source code. Instead of editing a text file, the goal is to edit abstract syntax trees directly. This is quite a shift from the traditional programming process: there is no more parsing of source code.

Candid

Both of these systems have a level of timeless referential transparency: Morte with its check-only type system, and Unison with its immutable hash-tree codebase. Yet if you have a Morte URL reference, you can't be sure it points to the same data without keeping extra information. And how strong is the guarantee that Unison's type inference won't ever change?

Candid strives to do both: given a hash, it will always expand to the same code; if that code type checked before, it will type check to at least the same degree later; and it will evaluate to the same result (given the same inputs, of course). There are no circumstances where the compiler guesses at what the programmer means, so there is no leeway to change the assumptions the language makes.

The Calculus

One might represent Candid's (and Morte's) inner calculus something like this:

data Expr
  = Lam (inType : Expr) (body : Expr)    -- inType ⇒ body
  | Ref (n : Nat)                        -- n
  | App (func : Expr) (arg : Expr)       -- func arg
  | Pi (inType : Expr) (outType : Expr)  -- inType → outType
  | Star                                 -- *

  • Lam encloses a function: apply an argument (of type inType) and it returns body with references to it replaced by that argument (a sketch of this substitution follows the list).

  • Ref refers back n arrows, to a Lam or a Pi, to take the place of an argument on function application.

  • App applies the argument arg to the function func.

  • Pi is the type of Lam, where inTypes match and outType is the derived type of the body.

  • Star represents a type; informally this includes the type of Star and of a Pi with an output of Star, though reality is more complicated.
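
To make Ref and App concrete, here is a rough sketch in Haskell of shifting and substitution over this data type. It is only an illustration of the conventions described above, not Candid's actual implementation.

-- De Bruijn shifting and substitution for the Expr above (sketch).
data Expr
  = Lam Expr Expr   -- inType ⇒ body
  | Ref Int         -- n
  | App Expr Expr   -- func arg
  | Pi  Expr Expr   -- inType → outType
  | Star            -- *
  deriving (Eq, Show)

-- Add d to every reference of index c or above; crossing a binder raises c.
shift :: Int -> Int -> Expr -> Expr
shift d c (Ref n)   = Ref (if n >= c then n + d else n)
shift d c (Lam t b) = Lam (shift d c t) (shift d (c + 1) b)
shift d c (Pi t o)  = Pi  (shift d c t) (shift d (c + 1) o)
shift d c (App f a) = App (shift d c f) (shift d c a)
shift _ _ Star      = Star

-- Replace reference k with s, dropping the binder that bound k.
subst :: Int -> Expr -> Expr -> Expr
subst k s (Ref n)
  | n == k    = s
  | n > k     = Ref (n - 1)       -- a binder below us disappeared
  | otherwise = Ref n
subst k s (Lam t b) = Lam (subst k s t) (subst (k + 1) (shift 1 0 s) b)
subst k s (Pi t o)  = Pi  (subst k s t) (subst (k + 1) (shift 1 0 s) o)
subst k s (App f a) = App (subst k s f) (subst k s a)
subst _ _ Star      = Star

-- One β-step: applying a Lam substitutes the argument into its body.
beta :: Expr -> Expr
beta (App (Lam _ body) arg) = subst 0 arg body
beta e                      = e

For instance, beta (App (Lam Star (Lam (Ref 0) (Ref 0))) Star), which is id applied to the type *, reduces to Lam Star (Ref 0).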

With these one may define functions:

id      = * ⇒ 0 ⇒ 0

id is the simplest function: it simply returns its argument. However, as every Lam requires an input type, it takes an additional argument first that specifies the type; Candid replaces polymorphism with specialization. Notice also how the two 0s refer to different arguments: the first refers to the first argument (which is a type) and the second to the second (which is a value of that type).

Unit    = * → 0 → 1

Unit is the type of id, so named because id is the only function of that type. Here both 0 and 1 refer to the first argument, the type of the value given to and returned by id.

Boolean = * → 0 → 1 → 2
True    = * ⇒ 0 ⇒ 1 ⇒ 1
False   = * ⇒ 0 ⇒ 1 ⇒ 0

Boolean has exactly two inhabitants, True and False. Which one is which is purely arbitrary, but for our purposes True returns the first value and False returns the second, and thus, by function application:

or     = Boolean ⇒ Boolean ⇒ 1 Boolean True 0
and    = Boolean ⇒ Boolean ⇒ 1 Boolean 0 False

or takes two Booleans and uses the first to choose between True and the second. Sugaring this into something more familiar:

or : Bool -> Bool -> Bool
or a b = case a of
              True -> True
              False -> b

and could be sugared in exactly the same way.

Both are of type Boolean → Boolean → Boolean. Note that the data structure up above makes no allowance for referring to things by name; Boolean here is just a placeholder for * → 0 → 1 → 2.
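
For comparison, the same encoding can be sketched in GHC Haskell, with a rank-2 polymorphic type standing in for the explicit type argument. The newtype wrapper and the primed names are artefacts of the sketch, not of Candid:

{-# LANGUAGE RankNTypes #-}

-- Boolean = * → 0 → 1 → 2, wrapped in a newtype so it can be
-- instantiated at itself without ImpredicativeTypes.
newtype Boolean = Boolean (forall a. a -> a -> a)

true, false :: Boolean
true  = Boolean (\x _ -> x)   -- * ⇒ 0 ⇒ 1 ⇒ 1
false = Boolean (\_ y -> y)   -- * ⇒ 0 ⇒ 1 ⇒ 0

-- or a b = a Boolean True b ; and a b = a Boolean b False
or', and' :: Boolean -> Boolean -> Boolean
or'  (Boolean a) b = a true b
and' (Boolean a) b = a b false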

For more examples of what you can do with this sort of minimal calculus of constructions, albeit with names instead of numbers, see the Morte Tutorial.

Hashes

  | Hash (hash : Vect 8 Bits32)

This is the first thing (other than using natural number indices rather than names) that differentiates Candid from Morte. Boolean can also be represented by the 256-bit hash 941ad446211f88e385d2b95be6336328b1708e77a11dca3e783a240d9dfadca5. Hashes replace well-typed closed sub-expressions and introduce no semantic differences whatsoever; they are like a provable form of Morte's URL references. Hashes are the basis of Candid "source code" as a pure immutable data structure, a directed acyclic graph (albeit representing a cyclic one).
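
As a rough illustration only (the serialization below is made up for the sketch; Candid's real scheme would hash a canonical binary encoding), a closed expression could be hashed like this:

{-# LANGUAGE OverloadedStrings #-}

import Crypto.Hash (Digest, SHA256, hashlazy)            -- from cryptonite
import qualified Data.ByteString.Lazy.Char8 as BL

data Expr = Lam Expr Expr | Ref Int | App Expr Expr | Pi Expr Expr | Star

-- A made-up canonical serialization: structural equality implies
-- byte-for-byte equality, so equal (nameless) terms hash identically.
encode :: Expr -> BL.ByteString
encode (Lam t b) = BL.concat ["L(", encode t, ",", encode b, ")"]
encode (Ref n)   = BL.pack ('R' : show n)
encode (App f a) = BL.concat ["A(", encode f, ",", encode a, ")"]
encode (Pi t o)  = BL.concat ["P(", encode t, ",", encode o, ")"]
encode Star      = BL.pack "*"

exprHash :: Expr -> Digest SHA256
exprHash = hashlazy . encode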

With just the above Expr type we can represent total functions, and only total functions. This is by design, and is very helpful in writing correct code. However, it also means we can't implement Candid in Candid.

Recursion

  | Rec (n : Nat)                        -- @n
  | Type (outType : Expr) (body : Expr)  -- outType | body

So we get primitive recursion. Rec is a lot like Ref, but instead of being replaced with the argument, it is replaced with the function. Type is here to assist the type checker, so it can calculate the output types of a Rec.

Nat = * → 0 → (@1 → 2) → 2
Zero = * ⇒ 0 ⇒ (Nat → 2) ⇒ 1
Succ = Nat ⇒ * ⇒ 0 ⇒ (Nat → 2) ⇒ 0 3
add = Nat ⇒ Nat ⇒ Nat | 0 Nat 1 (@1 (Succ 1))

This is the recursive version of Nat, which sugars to:

data Nat = Zero | Succ Nat
add : Nat -> Nat -> Nat
add n m = case m of
               Zero -> n
               Succ p -> add (Succ n) p
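
For contrast, the fold-style encoding discussed in the next paragraph can be sketched in GHC Haskell, where add needs no recursion at all because a number is its own fold (the primed names are just for the sketch):

{-# LANGUAGE RankNTypes #-}

-- Nat = * → 0 → (1 → 2) → 2: a number is a fold over its Zero and Succ cases.
newtype Nat' = Nat' (forall r. r -> (r -> r) -> r)

zero' :: Nat'
zero' = Nat' (\z _ -> z)

succ' :: Nat' -> Nat'
succ' (Nat' n) = Nat' (\z s -> s (n z s))

-- add n m applies Succ to m, n times: no recursion needed.
add' :: Nat' -> Nat' -> Nat'
add' (Nat' n) m = n m succ'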

Now all (or nearly all) the interesting things to do with Nats can be done without this sort of recursion, using the fold-style Nat = * → 0 → (1 → 2) → 2, on which operations are provably total by dint of type checking. So consider the encoding of Expr itself:

Expr = * →
       0 →
       1 →
       (Nat → 3) →
       (Nat → 4) →
       (@4 → @5 → 6) →
       (@5 → @6 → 7) →
       (@6 → @7 → 8) →
       (@7 → @8 → 9) →
       8

Editor

Notice that there are a lot of handy features missing here. No variable names or function names, just numbers and hashes! No type inference, type classes, or implicit arguments! All that bookkeeping overhead passing all these types around! Are you sure this isn't one of those esoteric languages? Well, that's where the editor comes in.

Presented above is the data structure: how code is stored inside the compiler, the interpreter, the code store, and the editor. It is at once quite verbose and yet terse and inscrutable. Therefore editors also store names for functions and arguments, comments, presentation data - everything they need to transform that data structure into something humans can work with.

Editors can hide any type annotation that would normally be implicit and inferred; indeed, they can infer that type information to start with. They can present numbers as digits instead of as functions, give names to your functions and arguments, expand and collapse expressions, pretty print the math, and automatically derive sufficiently constrained functions from their types.

By editing directed acyclic graphs rather than blobs of text, there are no syntax errors and no reformatting; type checking may be done at edit time; compilation can be done incrementally at edit time; and all valid, type checked expressions are efficiently saved in the code store until garbage collection - long term undo history at the granularity of functional changes.

And we can still have all the magic that comes of parsing and syntactic sugar, of type inference and implicit arguments, but it'll be magic you can look at, magic you can change by tweaking your editor, not the compiler, transparent magic, magic revealed, candid magic.

Structurally Equivalent Types

One of the boons of using de Bruijn indices is that we get α-equivalence for free. That is, checking whether two expressions have the same structure, disregarding names. This also means that:

data Ordering = LT | EQ | GT

is exactly the same as

data Axis = X | Y | Z

and any other enumerated type with three values:

Ternary = * → 0 → 1 → 2 → 3
Zero = * ⇒ 0 ⇒ 1 ⇒ 2 ⇒ 2
One = * ⇒ 0 ⇒ 1 ⇒ 2 ⇒ 1
Two = * ⇒ 0 ⇒ 1 ⇒ 2 ⇒ 0

This means that our type checker loses the power to distinguish between Axis and Ordering. There are a number of ways to regain this power, and one requires nothing extra from the language: introducing a singleton into the type.

There are infinitely many singleton types. * → 0 → 1 is the simplest and most obvious, but * → * → 1 → 2 is much the same. While one could go on forever just adding intermediate type arguments, consider the type:

* → * → * → * → (4 → 3 → 2 → 4 → 6) → 1 → 2

So long as the arguments to the function and the value argument are not all of the same type, there is obviously only one possible function with that type, in this case:

* ⇒ * ⇒ * ⇒ * ⇒ (4 → 3 → 2 → 4 → 6) ⇒ 1 ⇒ 0

With $n$ type arguments and $m$ arguments to the function, this yields $n^{m+2}-n$ unique singleton types, 4092 for the model above.

So to make a type unique, one simply prepends a unique singleton type.

Axis = (* → * → * → * → (4 → 3 → 2 → 4 → 6) → 1 → 2) → * → 0 → 1 → 2 → 3
X = (* → * → * → * → (4 → 3 → 2 → 4 → 6) → 1 → 2) ⇒ * ⇒ 0 ⇒ 1 ⇒ 2 ⇒ 2

As there is only one possible value for the singleton type, it can be filled in automatically (and hidden) by a sufficiently smart editor. During type checking this will ensure that X isn't used in place of LT or vice versa, and it will be optimized away with partial evaluation during compilation, producing the exact same code as Axis = * → 0 → 1 → 2 → 3.

Note this also allows one to specify a function:

(* → 0 → 1 → 2 → 3) ⇒ (* → * → * → * → (4 → 3 → 2 → 4 → 6) → 1 → 2) ⇒ 1

which turns an Ordering into an Axis, much like the lone constructor in Haskell's newtype.
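
The closest analogy in everyday Haskell is a plain newtype wrapper, where the nominal constructor plays the role Candid gives to the singleton argument. This is a rough sketch with hypothetical primed names, not an exact correspondence:

-- Primed names avoid clashing with the Prelude's Ordering.
data Ordering' = LT' | EQ' | GT'

-- The newtype tag distinguishes Axis from Ordering' nominally,
-- much as the singleton argument does structurally in Candid.
newtype Axis = Axis Ordering'

orderingToAxis :: Ordering' -> Axis
orderingToAxis = Axis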

Type Inference

If there is a hole in the expression, it is helpful to be able to infer the type of the expressions that fit in the hole. Though we call this type inference, it takes place in the editor rather than the compiler, and changes or improvements to it do not change any already written programs.

We start with an expression, a path to the sub-expression whose type we want to infer, and an expected output type. In general we follow the path, changing the expected type based on the root of the current expression.

  • Pi expects a type for both arguments.
  • Lam expects a type for its type and anything for the body.
  • Type expects a type for its type and that type for its body.
  • App is a little more complicated. The type of the function must be a Pi from the type of the argument to the expected type, so to infer the function's type we need to figure out the type of the argument first. Conversely, to infer the type of the argument we need to determine the type of the function, and we then expect that function's input type.

When we reach the end of the path we simply return the expected type. Thus our result can be anything: any type, a value of some type, or a Pi from one of those to another. With this type pattern we can match or generate expressions to fit in the hole.
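
A rough sketch of this walk in Haskell follows. It is not Candid's implementation: the type checker for sub-expressions is passed in as an assumed helper, and Nothing as the expectation stands for "anything".

-- Expr as in the earlier sketch, extended with Type annotations.
data Expr
  = Lam Expr Expr | Ref Int | App Expr Expr
  | Pi Expr Expr | Type Expr Expr | Star

-- Which child of the current node the path descends into.
data Step = LamT | LamB | PiIn | PiOut | TyT | TyB | AppF | AppA

expectAt :: (Expr -> Maybe Expr)   -- assumed: the type of a sub-expression
         -> Expr -> [Step] -> Maybe Expr -> Maybe Expr
expectAt _  _          []          want = want                 -- reached the hole
expectAt ty (Pi i _)   (PiIn  : p) _    = expectAt ty i p (Just Star)
expectAt ty (Pi _ o)   (PiOut : p) _    = expectAt ty o p (Just Star)
expectAt ty (Lam t _)  (LamT  : p) _    = expectAt ty t p (Just Star)
expectAt ty (Lam _ b)  (LamB  : p) _    = expectAt ty b p Nothing
expectAt ty (Type t _) (TyT   : p) _    = expectAt ty t p (Just Star)
expectAt ty (Type t b) (TyB   : p) _    = expectAt ty b p (Just t)
expectAt ty (App f a)  (AppF  : p) want =
  -- the function must be a Pi from the argument's type to what we expect
  expectAt ty f p (Pi <$> ty a <*> want)
expectAt ty (App f a)  (AppA  : p) _    =
  -- the argument must have the function's input type
  let want' = case ty f of
                Just (Pi i _) -> Just i
                _             -> Nothing
  in  expectAt ty a p want'
expectAt _ _ _ _ = Nothing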

What to do about Universes

Informally, ★ is the type of ★ and of ★ → ★ and the like. In reality, stars are parameterized over natural numbers, such that the type of ★ₙ is ★ₙ₊₁. This is to prevent inconsistency due to Girard's Paradox.

In the current implementations, ★ either has no type at all or (paradoxically) has type ★. Eventually there will be universe inference, which will result in some ultimately invalid programs that currently type check failing to type check. This is the sole exception to Candid's forward compatibility guarantee - if there is a bug in Candid that results in an inconsistent logic, programs that depend on that bug will not work in future versions.

The problem with not having ★ of type ★ is that one cannot use the conventional definition of Bool as a dependent type, and cannot even convert a value-level Bool to a type-level Bool:

Bool = t:★ → true:t → false:t → t
True = t:★ ⇒ true:t ⇒ false:t ⇒ true
False = t:★ ⇒ true:t ⇒ false:t ⇒ false

<s>x:Bool → x ★ Bool Nat</s>

Bool★ = true:★ → false:★ → ★
True★ = true:★ ⇒ false:★ ⇒ true
False★ = true:★ ⇒ false:★ ⇒ false

<s>Bool★fromBool = bool:Bool ⇒ bool Bool★ True★ False★</s>

With universe inference it just looks like this:

Boolₙ = t:★ₙ → true:t → false:t → t
Trueₙ = t:★ₙ ⇒ true:t ⇒ false:t ⇒ true
Falseₙ = t:★ₙ ⇒ true:t ⇒ false:t ⇒ false

x:Boolₙ₊₊ → x ★ₙ Boolₙ Natₙ

Programs for Free

Consider the type Map, which is the type of a map function for the type m:

Map = m:(★ → ★) ⇒ t:★ → s:★ → fun:(t → s) → val:(m t) → m s

For each type m there will be a few possible programs that satisfy that type. For instance, given the type Maybe and its constructors Nothing and Just:

Maybe = t:★ ⇒ ★
	s:★ → nothing:s → just:(t → s) → s
Nothing = t:★ ⇒ Maybe t
	s:★ ⇒ nothing:s ⇒ just:(t → s) ⇒ nothing
Just = t:★ ⇒ x:t ⇒ Maybe t
	s:★ ⇒ nothing:s ⇒ just:(t → s) ⇒ just x

Our map function will have the type Map Maybe:

Map Maybe = ★
	t:★ → s:★ → fun:(t → s) → val:(Maybe t) → Maybe s

We need a Maybe s, which we can get via Nothing s and Just s y, where y is of type s. So one obvious (and wrong) option is:

map' = t:★ ⇒ s:★ ⇒ fun:(t → s) ⇒ val:(Maybe t) ⇒ Maybe s
	Nothing s

However, we also have fun:(t → s) and val:(Maybe t). Either val is a Nothing t, in which case our only option is to use Nothing s, or it is a Just t x, in which case we have two options, Nothing s and (x:t ⇒ Just s (fun x)). This points to the actual map function between Maybes:

map = t:★ ⇒ s:★ ⇒ fun:(t → s) ⇒ val:(Maybe t) ⇒ Maybe s
	val (Maybe s) (Nothing s) (x:t ⇒ Just s (fun x))
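
These definitions transcribe almost directly into GHC Haskell; a sketch, with primed names only to avoid clashing with the Prelude:

{-# LANGUAGE RankNTypes #-}

-- Maybe t = s:★ → nothing:s → just:(t → s) → s
newtype Maybe' t = Maybe' (forall s. s -> (t -> s) -> s)

nothing' :: Maybe' t
nothing' = Maybe' (\n _ -> n)

just' :: t -> Maybe' t
just' x = Maybe' (\_ j -> j x)

-- map fun val = val (Maybe s) (Nothing s) (x ⇒ Just s (fun x))
map' :: (t -> s) -> Maybe' t -> Maybe' s
map' fun (Maybe' val) = val nothing' (\x -> just' (fun x))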

The heuristic here is to prefer solutions that use more arguments. If the type specifies that we take an argument, we probably want to use it in our program. Note that, as all of this takes place in the editor, it is just as easy to present the user with ten suggestions as one. The heuristic can just be used to order the suggestions. On the other hand, if there is only one possible program with a particular type, the editor could just fill it in.

Sometimes, though, there are more than just a few options.

List = t:★ ⇒ ★
	r:★ → nil:r → cons:(r → t → r) → r
Nil = t:★ ⇒ List t
	r:★ ⇒ nil:r ⇒ cons:(r → t → r) ⇒ nil
Cons = t:★ ⇒ a:t ⇒ list:(List t) ⇒ List t
	r:★ ⇒ nil:r ⇒ cons:(r → t → r) ⇒ cons (list r nil cons) a

The type t:★ → xs:(List t) → List t has countably infinitely many different inhabitants. At the simplest there are Nil t and xs; with a little more effort, any duplication and/or permutation of xs works too. Thus we cannot present every solution - rather we must take the first few results of a breadth first search for solutions.

Note that if there are infinitely many inhabitants of List t → List t then there are likewise infinitely many inhabitants of List t → List s. Note also that there are infinitely many programs equivalent to any other program, simply by encapsulating the program in an id. A properly defined breadth first search will relegate those solutions to later on, where they will usually not even be enumerated.
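
A sketch of this List encoding in GHC Haskell, together with two of the infinitely many inhabitants of xs:(List t) → List t (the primed names are hypothetical):

{-# LANGUAGE RankNTypes #-}

-- List t = r:★ → nil:r → cons:(r → t → r) → r
newtype List' t = List' (forall r. r -> (r -> t -> r) -> r)

nil' :: List' t
nil' = List' (\n _ -> n)

cons' :: t -> List' t -> List' t
cons' a (List' l) = List' (\n c -> c (l n c) a)

-- Two of the countably many solutions; duplications and
-- permutations of xs give the rest.
alwaysNil, keepAll :: List' t -> List' t
alwaysNil _  = nil'   -- the "Nil t" solution
keepAll   xs = xs     -- the "xs" solution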

Compiler

  • partial evaluation / specialization
  • automatic source fetching
  • extendible core data types
  • automatic core data types

Runtime

  • type system enforced capabilities
  • forward (and limited backward) compatibility via translation layer specialization

Unlicensed

This is free and unencumbered software released into the public domain.