David Jeske edited this page May 22, 2017 · 18 revisions

by Sam Rushing

Why have I written this compiler?

Through my experience with Stackless Python and scalable concurrency, I've learned that a green-threaded VM and a dynamic language are not enough to build world-class systems. There are often performance-critical sections that dictate dropping down into a fast, natively compiled language to escape the dynamic overhead of Python. Writing C extensions for Stackless Python in continuation-passing style is not a job for mortals, and writing CPS code in C without introducing bugs that impact stability and security is even harder... a job for the Gods themselves.

With this in mind, Irken's original purpose was to be the language for writing a high-performance VM for a Python-like language, with the option of dropping down into Irken for extra performance when needed. Irken's compiler automatically threads code into continuation-passing style, which is nearly identical to what someone would write by hand when implementing a threaded virtual machine. Look here to get an idea what an Irken-coded VM would look like.
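For readers unfamiliar with continuation-passing style, here is a minimal sketch in Python (not Irken, and not the output of Irken's compiler). It contrasts a direct-style function with a CPS version driven by a small trampoline loop. The names `fact_cps`, `trampoline`, and `fact` are made up for this illustration.

```python
def fact_direct(n):
    # Direct style: the pending multiplication lives on the host stack.
    return 1 if n == 0 else n * fact_direct(n - 1)

def fact_cps(n, k):
    # CPS: "what to do next" is the explicit continuation k.
    # Every step returns a thunk instead of calling onward,
    # so the host stack never grows with n.
    if n == 0:
        return lambda: k(1)
    return lambda: fact_cps(n - 1, lambda v: lambda: k(n * v))

def trampoline(thunk):
    # Keep running thunks until a non-callable final value appears.
    while callable(thunk):
        thunk = thunk()
    return thunk

def fact(n):
    # The initial continuation simply returns the final value.
    return trampoline(fact_cps(n, lambda v: v))
```

With the default recursion limit, `fact_direct(3000)` raises `RecursionError`, while `fact(3000)` completes because the pending work is held in heap-allocated closures rather than stack frames. Writing every function this way by hand is the tedious part that a compiler can automate.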

Along the way, Irken has become a more and more interesting language in its own right.

Irken supports full-blown continuations. This enables the design of massively scalable systems built on user/green/non-preemptive threads. The engineering goal is to support 100,000+ threads. This goal cannot be efficiently reached with a runtime that uses pre-allocated C stacks for each thread, so we avoid using the C stack completely.
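As a rough illustration of why avoiding per-thread stacks matters, the sketch below runs 100,000 cooperative threads from a single scheduler loop. It is written in Python, and Python generators stand in for continuations here; this is an assumption made for the example, not a description of how Irken's runtime works. The names `worker`, `run_all`, and `demo` are hypothetical.

```python
from collections import deque

def worker(ident, steps, log):
    # A cooperative thread: each `yield` hands control back to the scheduler.
    for step in range(steps):
        log.append((ident, step))
        yield

def run_all(threads):
    # Round-robin scheduler. Each suspended thread is a small heap object;
    # no thread owns a pre-allocated stack.
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)
        except StopIteration:
            continue            # thread finished; drop it
        ready.append(thread)    # still alive; requeue at the back

def demo(n_threads=100_000, steps=2):
    log = []
    run_all(worker(i, steps, log) for i in range(n_threads))
    return log
```

Calling `demo()` interleaves all 100,000 threads fairly: every thread takes its first step before any thread takes its second. The cost per suspended thread is one small object, which is what makes thread counts of this size practical.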

Irken's type-safe garbage collection, type inference, and row polymorphism enable a Pythonic coding style, while also providing compile-time static type checking, optional type declarations, and close-to-C compiled performance. This makes it possible to write the entire system efficiently in Irken.

I'm busy working on fleshing out the pieces to make that happen. FFI, socket libraries, better compiler errors, an OO programming model, and debugging support are all on the very long list of things to do. With luck, we'll get there.

The First Compiler

Irken began life as a straightforward implementation of Scheme written in Python. One of my favorite Scheme implementations is scheme48, which uses a similar implementation technique - a restricted version of Scheme called 'PreScheme' that uses type inference to cut down on runtime type checking as much as possible. I decided to try to learn about type inference. After about 18 months, I finally had something working well enough for my purposes.

This first compiler was a whole-program Irken compiler that generated a single C file (in fact, a single C function) to represent the Irken program. The compiler translated Irken source to a core lambda-calculus language. After typing and optimization, the program was transformed into a CPS register-machine language before the final output to C.
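To make "register-machine language" a little more concrete, here is a toy sketch in Python. It shows only one idea from the pipeline above: flattening a nested expression into a linear sequence of instructions in which every intermediate value gets a named register and the evaluation order is explicit. The instruction format and the names `compile_expr` and `run` are invented for this example and do not reflect Irken's actual intermediate language.

```python
import itertools

def compile_expr(expr):
    """Flatten a nested (op, left, right) tuple into register instructions."""
    code = []
    regs = itertools.count()   # fresh register numbers: 0, 1, 2, ...

    def walk(node):
        target = next(regs)
        if isinstance(node, int):
            code.append(("lit", target, node))
        else:
            op, left, right = node
            a = walk(left)     # operands are computed first...
            b = walk(right)
            code.append((op, target, a, b))  # ...then combined into target
        return target

    result = walk(expr)
    return code, result

def run(code, result):
    """Interpret the flat instruction list with a dict as the register file."""
    regs = {}
    for insn in code:
        if insn[0] == "lit":
            _, target, value = insn
            regs[target] = value
        elif insn[0] == "+":
            _, target, a, b = insn
            regs[target] = regs[a] + regs[b]
        elif insn[0] == "*":
            _, target, a, b = insn
            regs[target] = regs[a] * regs[b]
        else:
            raise ValueError(f"unknown instruction: {insn[0]!r}")
    return regs[result]
```

For example, `compile_expr(("+", ("*", 2, 3), 4))` produces five instructions ending in the addition, and `run` on that output evaluates to 10. A real compiler would also handle functions, closures, and register reuse, none of which this sketch attempts.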

The C output relied critically on two gcc extensions: the ability to take the address of a label, and lexical (nested) functions. Other C compilers support the address-of-label extension, but fewer support lexical functions. The garbage collector was implemented as a lexical function to give it access to local heap-related variables, while still allowing the compiler to put them in registers.

The Current Compiler

The current compiler is written in Irken, and self-hosts with three backends. In 2013, the C backend was redesigned to use SSA-style output. Rather than relying on the gcc 'address-of-label' extension, it now relies on proper tail call optimization. This made an LLVM backend possible, which was finished in late 2016. Early in 2017 a VM and bytecode backend were written. All three backends share the same runtime.

The LLVM backend is able to self-compile, but there is much work left to be done to take advantage of its features like intrinsics, overflow detection, JIT, etc. At this time the performance of LLVM-compiled output is nearly identical to that of the C backend.

The VM/bytecode backend runs about 3-4 times slower than the others, but compile times are much faster. The current plan is to implement features like profiling, heap profiling, and a debugger in the VM rather than trying to support them in all three backends.

The compiler now ships with a bytecode bootstrap image rather than a pre-compiled C bootstrap.

Since you made it this far, you might be interested in reading some of the entries on my blog, http://alien.nightmare.com/.


Next: Footnotes