
Raw stuff about the object model

Sébastien Doeraene edited this page Nov 14, 2012 · 12 revisions


The contents of this page are still raw material gathered from the mozart-hackers mailing list about the object model.

TODO: Refactor the following (extracted from the mozart-hackers mailing list).

Memory layouts for data types

Currently, the code of the new VM envisions three ways of laying out a type implementation in memory.

  • The first one (exemplified by booleans and small integers) stores its value directly in the MemWord.
  • The second one (exemplified by conses and cells) stores its value in an object pointed to by the MemWord, which is nevertheless conceptually part of the VM object.
  • The third one (exemplified by procedures and tuples) consists of an object followed by an array. The pointer in the MemWord points to the header object but can also be used to access the elements of the array. This is better than storing a pointer to the array in the object (and thus using the second case): it uses less memory (no extra pointer to the array), less time (no second indirection) and has better caching properties (related data is kept consecutive in memory).

A fourth case could be used for code areas, which need two arrays (one for the bytecode, the other for the K registers), but for now we use the third case with a pointer to the other array in the header.
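To make the three layouts concrete, here is a minimal C++ sketch. MemWord, ConsImpl, TupleHeader, newTuple and freeTuple are hypothetical stand-ins chosen for illustration, not the VM's actual definitions. The point is only to show how the third layout reaches its elements through the header pointer itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for the VM's MemWord: a single machine word that
// holds either a small value inline or a pointer to heap-allocated data.
union MemWord {
  std::intptr_t smallInt;  // case 1: value stored inline (e.g. small integers)
  bool boolean;            // case 1: value stored inline (e.g. booleans)
  void* ptr;               // cases 2 and 3: pointer to a heap object
};

// Case 2: the value lives in a separate heap object (e.g. a cons).
struct ConsImpl {
  MemWord head;
  MemWord tail;
};

// Case 3: a header immediately followed by its elements, in one allocation.
struct TupleHeader {
  std::size_t width;

  // The elements start right after the header, so no extra pointer is stored.
  MemWord* elements() { return reinterpret_cast<MemWord*>(this + 1); }

  MemWord& at(std::size_t i) {
    assert(i < width);  // bounds check on the trailing array
    return elements()[i];
  }
};

// The trailing array must start at a correctly aligned address.
static_assert(sizeof(TupleHeader) % alignof(MemWord) == 0,
              "header size must preserve element alignment");

// Allocates the header and its elements as one contiguous block.
TupleHeader* newTuple(std::size_t width) {
  void* block = ::operator new(sizeof(TupleHeader) + width * sizeof(MemWord));
  TupleHeader* header = new (block) TupleHeader{width};
  for (std::size_t i = 0; i < width; ++i)
    new (header->elements() + i) MemWord{};  // zero-initialize each slot
  return header;
}

// Releases the single block allocated by newTuple().
void freeTuple(TupleHeader* header) {
  header->~TupleHeader();  // trivial, kept for symmetry with newTuple()
  ::operator delete(header);
}
```

For instance, after `TupleHeader* t = newTuple(3);`, the slot `t->at(1)` lives right after the `width` field of the same allocation, so no second pointer has to be followed.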

Design decisions

Up to now I have described how VM objects are laid out in memory, but not really how to use them. The design guidelines for this were simple: the most common cases, i.e., the code written by the people with the least knowledge of the VM, had to be the easiest and most natural to write. From the most common/easiest to the least common/hardest, we have:

  • Writing new builtins. This typically has to be done by an Oz user who wants to interface her/his program with some C/C++ library. That user knows Oz (so ideas such as exceptions or waiting are not alien) and some C++ (enough to use the library, at least) but nothing specific to the Oz VM.
  • Writing new types and new interfaces. This needs more knowledge of the VM, typically the kind of information you are reading right now. It shouldn't be too difficult but requires knowing about the VM object model and some of its peculiarities.
  • Writing the VM object model itself, that is, convincing C++ to behave in a way that isn't natural at all so that the two previous points are easy. This is quite difficult but is mostly done now. Normally you wouldn't have to dig into this until you become an Oz VM guru.

Writing builtins

To illustrate, let's say I want to write a builtin that takes two values and returns a boolean indicating whether their sum is 8. (Sorry for the insipid example, but we don't have much in store for now.) I'll call that builtin bar and put it in module Foo. The code should look like this:

class ModFoo: public Module {
public:
  ModFoo(): Module("Foo") {}

  class Bar: public Builtin<Bar> {
  public:
    Bar(): Builtin("bar") {}

    void operator()(VM vm, In left, In right, Out result) {
      // Add left and right and store the result in sum
      UnstableNode sum = Numeric(left).add(vm, right);

      // Test if the sum equals 8
      bool res = IntegerValue(sum).equalsInteger(vm, 8);

      // Make the result and return
      result = build(vm, res); // or, equivalently, Boolean::build(vm, res)
    }
  };
};

We believe this is very natural. And yes, it does take care of type errors, waiting on variables and making them needed, etc.
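Bar derives from Builtin<Bar>, i.e., it follows the curiously recurring template pattern (CRTP). The sketch below is only an assumption about how such a base class might be organized. It shows the base forwarding a call to the derived operator() without any virtual dispatch. The call() method and the int-based Bar are hypothetical simplifications. The real In/Out parameters, type errors and waiting on variables are left out.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, much simplified CRTP base: it stores the builtin's name and
// forwards calls to the derived class's operator() without a virtual call.
template <class T>
class Builtin {
public:
  explicit Builtin(std::string name) : _name(std::move(name)) {}

  const std::string& getName() const { return _name; }

  // A real VM would unwrap Oz arguments, wait on unbound variables and
  // report type errors here; this sketch only forwards the arguments.
  template <class... Args>
  void call(Args&&... args) {
    static_cast<T&>(*this)(std::forward<Args>(args)...);
  }

private:
  std::string _name;
};

// A toy counterpart of Bar working on plain integers instead of VM nodes.
class Bar : public Builtin<Bar> {
public:
  Bar() : Builtin("bar") {}  // injected name of the base Builtin<Bar>

  void operator()(long left, long right, bool& result) {
    result = (left + right == 8);
  }
};
```

With this sketch, `bar.call(3, 5, res)` sets `res` to true, while `bar.call(2, 2, res)` sets it to false.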

Writing interfaces

Making new types isn't very difficult either. As can be seen in the example above, all operations (except creation) are handled through interfaces. It therefore makes sense to start by designing one or several interfaces for the new type. An interface (like Numeric) isn't written directly by the developer. It would be very repetitive to write, since it is the place that handles, for example, dispatching calls according to the dynamic type, waiting on transients, and even performing reflective calls if the entity is reflective. Instead, one writes a class Interface<Numeric>, which is formally a specialization of the Interface<T> template. This is enough for the code generator to know that it has to generate the interface Numeric. The behavior of the code generator is also influenced by "parent" classes of Interface<Numeric>, which are described below.

Inside the Interface<Numeric> class, one can define methods. They correspond to the methods that will be available in Numeric, but they receive an extra first argument of type RichNode. Unlike interfaces in most object models, interfaces in the VM object model do contain code for the methods they define. This code is typically executed when the actual type of the VM object on which the interface is used doesn't specifically implement this interface. The extra first parameter corresponds to that object itself. It can be used to call another interface the object might implement, to put the object in an error message, etc.
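The split between default code in Interface<Numeric> and the generated Numeric class can be approximated by hand as follows. Everything here is a hypothetical simplification. RichNode carries only a type name and a number, and the string comparison stands in for whatever dispatch on the dynamic type the generator actually emits.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a node: its dynamic type name and a value.
struct RichNode {
  std::string typeName;
  long value;
};

struct Numeric;  // the interface, also used as a template tag

// Primary template; each interface is described by a specialization.
template <class T>
struct Interface;

// Default code for Numeric: runs when the node's type does not implement
// Numeric. The extra first parameter (self) is the node itself.
template <>
struct Interface<Numeric> {
  long add(RichNode self, long /*right*/) {
    throw std::invalid_argument("type error: " + self.typeName +
                                " is not Numeric");
  }
};

// A data type that implements Numeric itself.
struct SmallInt {
  static long add(RichNode self, long right) { return self.value + right; }
};

// Hand-written stand-in for what the generator would emit: dispatch on the
// dynamic type, falling back to the default code in Interface<Numeric>.
struct Numeric {
  explicit Numeric(RichNode self) : _self(self) {}

  long add(long right) {
    if (_self.typeName == "SmallInt")
      return SmallInt::add(_self, right);
    return Interface<Numeric>().add(_self, right);
  }

private:
  RichNode _self;
};
```

Here `Numeric(RichNode{"SmallInt", 5}).add(3)` reaches SmallInt::add and yields 8, whereas a node of any other type ends up in the default code and raises a type error.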

The pseudo-parents that are of use to the generator are the following:

  • Specializations of the ImplementedBy<T...> template class. This declares the data types that implement this interface. This is another peculiarity of this object model, but it allows for faster code. A mechanism for data types to declare the interfaces they implement will be available later but won't be as efficient. For the basic data types, this is thus the way to go.
  • NoAutoWait. This deactivates some magic in the generator. More precisely, when a method in the interface returns a recognized C++ type (currently only OpResult is supported) and the actual type of the object doesn't implement this interface but is marked as transient (an unbound variable, for example), the generator makes the method wait on that object rather than execute the default code. This is most often the desired behavior. If an interface wants to deal with transients in its default code, it just has to add this pseudo-parent.
  • NoAutoReflectiveCalls. This deactivates some magic in the generator, preventing it from generating reflective calls.
  • Other classes may be added if needed in the future.
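The pseudo-parents carry no data. They only record facts for the generator. The real generator reads them through Clang. As a hypothetical illustration, the same facts can also be queried with ordinary C++17 traits. The helper names contains(), implementedBy() and autoWait() are invented for this sketch.

```cpp
#include <type_traits>

// Hypothetical pseudo-parent tags: empty classes that only carry information.
template <class... Ts>
struct ImplementedBy {
  // True if T is one of the listed data types (C++17 fold expression).
  template <class T>
  static constexpr bool contains() {
    return (std::is_same<T, Ts>::value || ...);
  }
};
struct NoAutoWait {};

template <class T>
struct Interface;

// Interface and data type names, used only as tags here.
struct Numeric;
struct Comparable;
struct SmallInt;
struct Float;
struct Atom;

// Numeric: implemented by SmallInt and Float, handles transients itself.
template <>
struct Interface<Numeric> : ImplementedBy<SmallInt, Float>, NoAutoWait {};

// Comparable: implemented by SmallInt only, keeps the automatic waiting.
template <>
struct Interface<Comparable> : ImplementedBy<SmallInt> {};

// Does data type T declare a direct implementation of interface I?
template <class I, class T>
constexpr bool implementedBy() {
  return Interface<I>::template contains<T>();
}

// Should a call on an unimplementing transient wait rather than run defaults?
template <class I>
constexpr bool autoWait() {
  return !std::is_base_of<NoAutoWait, Interface<I>>::value;
}
```

Because everything is resolved at compile time, a dispatcher built on such traits needs no runtime lookup for the listed data types. This is in the spirit of the efficiency argument made for ImplementedBy<T...>.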

Writing data types

Data types are also defined in a DSL. To define the data type Foo, one writes a class Foo inheriting from DataType<Foo>. Fields and methods are defined like in any other C++ class. Methods can have a special first argument named self of type RichNode (it can be omitted if not needed). This argument gives access to the node containing the value of this data type. It can be put inside error messages or be used to apply the become operation on the entity.

Similarly to interfaces, implementations can use pseudo-parents. I'll get to all of them later, but two are important enough to discuss now. They specify how the implementation type is stored. We saw earlier the three different ways: as a value directly in the node, as a pointer to an implementing C++ object, or as the latter accompanied by an array.

An implementation can be stored directly in the node by deriving from StoredAs<Bar>, where Bar is a type that can be stored directly in a MemWord, such as double or bool. An implementation can be stored as a (pointer to a) header and array by deriving from StoredWithArrayOf<Bar>. In this case, the header will be a Foo and there will be an array of Bars after it. Finally, by inheriting from neither StoredAs<U> nor StoredWithArrayOf<U>, a data type is simply stored as a pointer to an instance of Foo itself.

The Foo::build(VM, ...) static method, which is defined in the superclass DataType<Foo>, will create a VM object by calling either a static method Foo::create(U&, VM, ...) (for types based on StoredAs<U>) or a constructor Foo(VM, ...) (for the others) that will receive the same parameters as the call to build().
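Here is a minimal sketch of how DataType<Foo>::build() could choose between the two creation paths, assuming C++17. VM, IsStoredAs, the StoredType alias and the return values (the stored value itself, or a heap pointer) are assumptions made for illustration. StoredWithArrayOf<U> is left out for brevity.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct VM {};  // hypothetical stand-in for the VM handle

// Pseudo-parent: the value is stored directly in the node as a U.
template <class U>
struct StoredAs {
  using StoredType = U;
};

// Detects whether T derives from some StoredAs<U> (via its StoredType alias).
template <class T, class = void>
struct IsStoredAs : std::false_type {};
template <class T>
struct IsStoredAs<T, std::void_t<typename T::StoredType>> : std::true_type {};

template <class T>
struct DataType {
  // Creates a VM object; the parameters after vm are forwarded unchanged.
  template <class... Args>
  static auto build(VM vm, Args&&... args) {
    if constexpr (IsStoredAs<T>::value) {
      typename T::StoredType stored{};
      T::create(stored, vm, std::forward<Args>(args)...);
      return stored;  // would be written directly into the MemWord
    } else {
      return new T(vm, std::forward<Args>(args)...);  // node points to it
    }
  }
};

// Stored directly in the node: created through a static create().
class Float : public DataType<Float>, public StoredAs<double> {
public:
  static void create(double& self, VM, double value) { self = value; }
};

// Stored as a pointer to an instance: created through a constructor.
class Cell : public DataType<Cell> {
public:
  Cell(VM, long initial) : _content(initial) {}
  long content() const { return _content; }

private:
  long _content;
};
```

Here `Float::build(vm, 2.5)` yields the double that would be written into the node, while `Cell::build(vm, 42L)` allocates a Cell and returns a pointer to it.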

The other pseudo-parent classes are:

  • Transient. This marks this data type as a transient, subject to the magic described above in interfaces.
  • WithStructuralBehavior. For values that have structural equality and unification (e.g., Tuple).
  • WithValueBehavior. For values that are a degenerate case of the above, with no aggregated nodes. In simpler words: plain values (e.g., SmallInt).
  • WithVariableBehavior<prio>, where prio is a 1-byte unsigned integer. For transients that can be bound. I will not talk about prio here.
  • BasedOn<U>. This makes the RTTI class for T (i.e., TypeInfoOf<T>) derive from U rather than TypeInfo. This is very low-level, as it requires knowing more about how things work internally, but it is currently required for using the next pseudo-parent.
  • NoAutoGCollect. Normally the garbage collector clones values while keeping their types, using the GR constructor. This means that one only has to write a GR constructor (or GR pseudo-constructor create()) taking a GR as the only extra parameter (after the VM and possibly the size of the array). If one wants lower-level access to the GC process, this pseudo-parent prevents the generator from defining the low-level gCollect() method in the type, so that it can be supplied instead by a parent class given to the BasedOn<U> pseudo-parent.
  • NoAutoSClone. Similar to NoAutoGCollect, but for space cloning. It prevents the generator from writing the sClone() methods in the type.
  • Again, more are to come when needed.
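The interplay between BasedOn<U> and NoAutoGCollect can be modeled with ordinary templates. In this hypothetical sketch, TypeInfoOf<T> derives from U when T inherits BasedOn<U>. It only defines gCollect() itself when NoAutoGCollect is absent. The string-returning gCollect() is a toy stand-in for the real garbage-collection routine, and the trait names are invented.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical default RTTI base class.
struct TypeInfo {
  virtual ~TypeInfo() = default;
  virtual std::string gCollect() const = 0;  // toy stand-in for real GC work
};

// Pseudo-parent tags.
template <class U>
struct BasedOn {
  using TypeInfoBase = U;
};
struct NoAutoGCollect {};

// Selects the RTTI base: U if T derives from BasedOn<U>, TypeInfo otherwise.
template <class T, class = void>
struct TypeInfoBaseOf {
  using type = TypeInfo;
};
template <class T>
struct TypeInfoBaseOf<T, std::void_t<typename T::TypeInfoBase>> {
  using type = typename T::TypeInfoBase;
};

// By default the "generator" defines gCollect() in the RTTI class...
template <class T,
          bool generateGCollect = !std::is_base_of<NoAutoGCollect, T>::value>
struct TypeInfoOf : TypeInfoBaseOf<T>::type {
  std::string gCollect() const override { return "generated gCollect"; }
};

// ...but with NoAutoGCollect it is inherited from the BasedOn<U> parent.
template <class T>
struct TypeInfoOf<T, false> : TypeInfoBaseOf<T>::type {};

// A custom RTTI base that supplies its own low-level GC routine.
struct CustomGCTypeInfo : TypeInfo {
  std::string gCollect() const override { return "custom gCollect"; }
};

struct SmallInt {};  // no pseudo-parents: TypeInfo base, generated gCollect
struct Special : BasedOn<CustomGCTypeInfo>, NoAutoGCollect {};
```

Making TypeInfo::gCollect() pure virtual here means that forgetting the BasedOn<U> parent when using NoAutoGCollect leaves TypeInfoOf<T> abstract. This mirrors the text's remark that BasedOn<U> is currently required for the latter.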

Final advice

The generator, being based on Clang's parser, needs correct C++ input. In particular, this means that the classes that are to be generated cannot be used in the code that the generator has to parse. Because of this, we split declarations and definitions into separate files. But because we want to maximize inlining, they are not a <foo>.hh and a <foo>.cc but a <foo>-decl.hh and a <foo>.hh. There is an implicit rule stating that if you include a -decl.hh file, you have to include the .hh file as well. The contents of <foo>.hh must be hidden from the generator using the preprocessor condition #ifndef MOZART_GENERATOR.
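The file split can be sketched as a single translation unit, with comments marking where foo-decl.hh ends and foo.hh begins. Foo, its members and the include guard names are hypothetical. Only the #ifndef MOZART_GENERATOR condition comes from the text above.

```cpp
// ---- foo-decl.hh: declarations only, safe for the generator to parse ----
#ifndef FOO_DECL_HH
#define FOO_DECL_HH

class Foo {
public:
  explicit Foo(int value);
  int doubled() const;  // defined in foo.hh so that it can still be inlined

private:
  int _value;
};

#endif  // FOO_DECL_HH

// ---- foo.hh: inline definitions, hidden from the generator ----
#ifndef FOO_HH
#define FOO_HH

#include <cassert>
// In a real tree, foo.hh would likely #include "foo-decl.hh" here (assumption).

#ifndef MOZART_GENERATOR

// Definitions may use generated classes, which do not exist yet while the
// generator runs; hence they are compiled only when MOZART_GENERATOR is unset.
inline Foo::Foo(int value) : _value(value) {
  assert(value >= 0);  // example precondition
}

inline int Foo::doubled() const { return 2 * _value; }

#endif  // MOZART_GENERATOR

#endif  // FOO_HH
```

When the generator parses this input with MOZART_GENERATOR defined, it sees only the declarations of Foo. A normal build sees the inline definitions as well.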