Sébastien Doeraene edited this page Mar 15, 2012 · 1 revision

Build process: how it works

The build process for the Mozart VM and compiler is rather complex. One of the trickiest parts is dealing with the bootstrap compiler: since the complete Oz compiler is written in Oz itself, we need a bootstrapping compiler written in another language.

For some insight on this issue, and some background on tombstone diagrams, read this.

TODO Refactor this (extracted from mozart-hackers)

Regarding compilation of the VM, I made two tombstone diagrams to document the way it will be built. Of course, you will just have to type mkdir build && cd build && cmake .. && make to build it, but behind the scenes a lot of steps happen.

The first diagram is more detailed and shows all the intermediate results, while the second is more synthetic; the content is the same.

The diagrams also make clear the exact extent of the Clang dependency. We can compile the VM with clang as the only C++ compiler, but we can also compile it using GCC. The only steps that require Clang are the ones that generate the boilerplate C++ source files.

In this VM, code generation is used extensively, sometimes using C++ constructs as a DSL for the code to generate. By using the Clang parser to read the C++, we benefit from a parser that understands all of C++ and not just a subset, as most tools do (SWIG, for example, suffers from this). The price is that the sources must be correct C++ even before code generation (no use before declaration, etc.).
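As a rough illustration of what "C++ constructs as a DSL" can look like, the sketch below declares a class that opts in to generated code purely by inheriting from empty marker templates. All names here (`DataType`, `StoredInline`, `SmallInt`, `usesInlineStorage`) are hypothetical and not taken from the Mozart sources. The compile-time query only stands in for what a generator would learn by walking the AST.

```cpp
#include <type_traits>

// Hypothetical marker templates (not the VM's real names). They carry no
// runtime behavior; a tool walking the AST can spot them among the bases.
template <class T> struct DataType {};
template <class T> struct StoredInline {};

// A hypothetical data type that requests boilerplate by inheriting markers.
// Everything is declared before use, because a real C++ parser reads this.
class SmallInt : public DataType<SmallInt>, public StoredInline<int> {
public:
  explicit SmallInt(int value) : value_(value) {}
  int value() const { return value_; }

private:
  int value_;
};

// Stand-in for the generator's AST query: does T carry the storage marker?
template <class T>
constexpr bool usesInlineStorage() {
  return std::is_base_of<StoredInline<int>, T>::value;
}

static_assert(usesInlineStorage<SmallInt>(), "marker base should be visible");
```

Because the markers are ordinary C++, the annotated header compiles on its own, which is exactly the constraint mentioned above: the input to the generator has to be valid C++ first.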

Documentation for the generation process is still mostly in the form of source-code :-( but now that the design is settling, we will document it more thoroughly.

Detailed explanation of the full diagram

llvm & clang

At the top, and just below on the left, we have the compilation of the clang libraries and the clang++ compiler. This is actually a simplification of a much bigger process that builds dozens of libraries and executables, generating parts of its own code along the way. All in all, this takes more than an hour on my laptop when done from scratch, but the only parts we really care about are the clang++ compiler and the headers and libraries used by our generator.

Generator and code generation

The generator (sources) is a rather simple program. It uses the clang API to process the AST of the headers of the VM, looking for certain class names that we use to signal the behavior we want from the generated boilerplate code implementing the [VM object model](Object Model).

Once the generator is built, we use it to generate the "vm built sources". This is a two-step process. We first use the clang++ compiler to produce the AST: it just dumps its AST in a serialized format. Our generator then reads that AST back and emits the boilerplate sources. Of course, the AST contains a lot of information that we don't care about (such as all the declarations in the system headers that we include), but that is the price we pay for having a completely correct interpretation of C++.
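The second step can be pictured as a filter over a very large set of declarations. The following is only a toy model under stated assumptions: `ClassDecl` and `selectAnnotated` are hypothetical names, the marker is passed in as a plain string, and the real generator works on Clang's AST rather than on a flat list like this.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one class declaration found in the
// serialized AST. The real AST is far richer than this.
struct ClassDecl {
  std::string name;
  std::vector<std::string> bases;
  bool fromSystemHeader;
};

// Keep only the declarations that inherit from the given marker class.
// Declarations coming from system headers are skipped as noise.
std::vector<ClassDecl> selectAnnotated(const std::vector<ClassDecl>& decls,
                                       const std::string& marker) {
  assert(!marker.empty());  // a marker name is required
  std::vector<ClassDecl> selected;
  for (const ClassDecl& decl : decls) {
    if (decl.fromSystemHeader) continue;
    for (const std::string& base : decl.bases) {
      if (base == marker) {
        selected.push_back(decl);
        break;
      }
    }
  }
  return selected;
}
```

Most of the input is discarded, which mirrors the trade-off described above: the serialized AST is much larger than what the generator needs, but nothing in it is misparsed.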

VM library

Now that we have all the sources for the VM, we build it as a library and we use this library to build a VM. The VM is a simple wrapper around the library, providing it with all the system-specific ways to fit in the environment (files, sockets, environment variables, command line, GUI, etc.)

We hope to have a limited VM based on POSIX for testing and a more full-featured one that would use JNI to delegate all this to a JVM (as a symbiotic VM).
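One way to picture the split between the library and its wrapper is an abstract interface that the library calls and that each wrapper implements. This is a minimal sketch, assuming such an interface exists; `Environment`, `VirtualMachine`, and `FakeEnvironment` are hypothetical names and do not describe the actual Mozart API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical interface: what the VM library asks of its host.
class Environment {
public:
  virtual ~Environment() {}
  virtual std::string getEnv(const std::string& name) = 0;
  virtual void print(const std::string& text) = 0;
};

// Stand-in for the library: it only reaches the outside world through the
// Environment it was given.
class VirtualMachine {
public:
  explicit VirtualMachine(Environment& env) : env_(env) {}
  void greet() { env_.print("home=" + env_.getEnv("HOME")); }

private:
  Environment& env_;
};

// A tiny in-memory environment, the kind a test wrapper could provide.
class FakeEnvironment : public Environment {
public:
  std::map<std::string, std::string> vars;
  std::string output;

  std::string getEnv(const std::string& name) override {
    assert(!name.empty());  // variable names are never empty
    auto it = vars.find(name);
    return it == vars.end() ? std::string() : it->second;
  }
  void print(const std::string& text) override { output += text; }
};
```

With this shape, a POSIX wrapper and a JNI-backed wrapper would be two more implementations of the same interface, and the library itself would not change.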

Bootstrap compiler and compiler

In the bottom-left part of the diagram, we start by compiling the bootstrap compiler. This compiler transforms Oz code into a C++ program (with a main() function) that creates a VM executing the Oz code that was given as argument to the compiler.

The result is a kind of VM that, rather than loading the program to run from a .ozf file, has a hard-coded program. Eventually, when given the source of the current Oz compiler (modified for the new VM, of course), it should produce a program (ozc1 in the diagram) that is a hard-coded Oz-to-.ozf compiler. That program produces good-quality code (as it is generated from the good-quality current compiler) but is itself very slow (as it is generated by the 'naive' bootstrap compiler). It is therefore not suitable for end users, but it can produce a suitable compiler by compiling the Oz compiler one more time.
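To make "a VM with a hard-coded program" concrete, here is a toy sketch of the general shape such generated C++ could take. The instruction set, `execute`, and `generatedEntryPoint` are all hypothetical; the real VM's bytecode and the real output of the bootstrap compiler are unrelated to this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction set.
enum class Op { Push, Add, Halt };
struct Instr { Op op; int arg; };

// Stand-in for the VM library: a minimal stack machine that returns the
// top of the stack when it halts (or 0 if the stack is empty).
int execute(const std::vector<Instr>& code) {
  std::vector<int> stack;
  for (std::size_t pc = 0; pc < code.size(); ++pc) {
    const Instr& instr = code[pc];
    switch (instr.op) {
    case Op::Push:
      stack.push_back(instr.arg);
      break;
    case Op::Add: {
      assert(stack.size() >= 2);  // Add needs two operands
      int rhs = stack.back();
      stack.pop_back();
      stack.back() += rhs;
      break;
    }
    case Op::Halt:
      return stack.empty() ? 0 : stack.back();
    }
  }
  return stack.empty() ? 0 : stack.back();
}

// What a bootstrap-style compiler might emit: the program is baked into the
// C++ source instead of being loaded from a file. In generated output this
// function would be main().
int generatedEntryPoint() {
  const std::vector<Instr> program = {
      {Op::Push, 2}, {Op::Push, 3}, {Op::Add, 0}, {Op::Halt, 0}};
  return execute(program);
}
```

The point of the sketch is the second function: the executable carries its program inside it, so no .ozf loader is needed at this stage of the bootstrap.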

This means we will have had three different executable Oz compilers. The first (bootstrap) is a .class that generates inefficient bytecode encapsulated in C++ code. The second (ozc1) is a native executable that is inefficient but generates efficient bytecode in .ozf format. The last is an .ozf file that is efficient and generates efficient programs; it is available as part of the Mozart library and is used to compile the rest of the world.