Rationale for the object model

Sébastien Doeraene edited this page Mar 15, 2012 · 2 revisions

This page describes the rationale behind the Object Model: its history, and how we came to design it the way we did.

An idealized object-oriented design?

In an idealized object-oriented world, if one starts to think about how the Oz VM should be implemented (without looking at the actual code), it all seems very clear. One can see a big object-oriented hierarchy with Variable on top and children UnboundVariable and Value down to Cons inheriting from Tuple inheriting from Record, etc.

However, despite all the lectures about object-orientation and why it's nice and everything, it is not the solution for designing the VM object model.

First, inheritance is inappropriate. The only advantage of a Cons class is that it doesn't need all the internals of a Record or Tuple (no need to store an arity and a label as they are implied). This, in turn, means that the big hierarchy should be one of interfaces with classes implementing them and without much code reuse.

Second, dynamic cast errors are inappropriate. Calling methods cannot be done by simply casting to the required interface and calling. Oz-level errors (like doing an @ operation on a record) are not errors at the emulator level, but regular operations that trigger an Oz-level exception. Casting would therefore require guarding every method call so that it triggers an Oz exception when the dynamic type of the object is incorrect. Even that is not enough, as different objects that apparently do not implement the interface (such as an unbound variable and a record for the @ operation) require different behaviors.

Finally, OO has no support for type mutations. Consider how variable binding and unification have to work. Variable binding requires, e.g., that an object which is an UnboundVariable become, say, a Tuple. Java (or C++ for that matter) doesn't provide an easy way for an object to change type. Unification of unbound variables is problematic too. Short of some magic to update all the references/pointers to one of the variables (which would also work as a solution to variable binding), one of them has to become a "forwarder" to the other (with similar problems to binding), making method invocation even more painful, as it would have to deal with these forwarders all the time.

At this point, it should be clear that the VM object model is quite different from the Java or C++ object models. We need all methods to be callable on any object, with a more complex default behavior than a host language exception. We also need support for dynamic (runtime) type changes. And we need support for transparent forwarding.
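
The first requirement can be pictured as a per-type operation table in which every operation has a default. The following is a minimal C++ sketch under stated assumptions, not the VM's actual code: `OpStatus`, `Type::cellAccess`, and the three concrete types are hypothetical names chosen for illustration. It shows a record and an unbound variable both "failing" the @ operation, but in different ways, without any host-language exception.

```cpp
#include <cassert>
#include <cstdint>

struct Node;

// What the emulator should do after attempting an operation (hypothetical).
enum class OpStatus {
    Proceed,         // the operation succeeded
    RaiseTypeError,  // raise an Oz-level exception
    WaitOnVariable   // suspend until the variable is bound
};

// Every type answers every operation; the base class supplies the default.
struct Type {
    virtual ~Type() = default;
    virtual OpStatus cellAccess(const Node& self, std::intptr_t& result) const;
};

struct Node {
    const Type* type;     // which behavior table applies
    std::intptr_t value;  // simplified contents for this sketch
};

// Default: the node is not a cell, so report an Oz-level type error.
OpStatus Type::cellAccess(const Node&, std::intptr_t&) const {
    return OpStatus::RaiseTypeError;
}

// A record keeps the default behavior.
struct RecordType : Type {};

// An unbound variable is not an error: the caller has to wait.
struct UnboundType : Type {
    OpStatus cellAccess(const Node&, std::intptr_t&) const override {
        return OpStatus::WaitOnVariable;
    }
};

// A cell actually implements the operation.
struct CellType : Type {
    OpStatus cellAccess(const Node& self, std::intptr_t& result) const override {
        result = self.value;
        return OpStatus::Proceed;
    }
};

const RecordType recordType{};
const UnboundType unboundType{};
const CellType cellType{};

// Entry point usable on any node, whatever its current type.
OpStatus cellAccess(const Node& node, std::intptr_t& result) {
    assert(node.type != nullptr);
    return node.type->cellAccess(node, result);
}
```

The caller inspects the returned status instead of catching a cast failure, which is what lets each type choose its own fallback.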

Why OO does not support type mutations

There are at least two reasons why traditional object models do not support dynamic type mutations. First, it would easily break assumptions by the programmer and compiler. A variable declared of a certain class could be of a different class at some point so that no devirtualization of the call would be possible. Second, if the new type (after mutation) requires more space than the old one, the mutation would require moving the object (magical pointer update) or corrupting objects stored after it in memory.

The first reason doesn't really apply to us, as all methods are callable on any object, and indeed we will always have to check the current type of the object. The second reason is worked around by forcing all objects to be of the same size. This is implemented very similarly to the PIMPL idiom (which is also there to keep the size constant, but in that case for API and ABI compatibility between library versions).
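
To illustrate why a uniform size makes mutation safe, here is a minimal C++ sketch. The names `Node`, `UnboundType`, `SmallIntType`, and `bind` are assumptions made for this example and are not taken from the VM sources. Because every object occupies the same two words, changing its type is a plain overwrite at the same address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type descriptors; only their addresses matter here.
struct Type { const char* name; };
const Type UnboundType{"Unbound"};
const Type SmallIntType{"SmallInt"};

// Every object has the same shape, whatever its current type.
struct Node {
    const Type* type;
    std::intptr_t value;
};

static_assert(sizeof(Node) == 2 * sizeof(void*),
              "all objects share one fixed size");

// Type mutation: overwrite both words in place. Nothing moves in memory,
// so every existing pointer to `variable` observes the new type and value.
void bind(Node& variable, const Node& value) {
    assert(variable.type == &UnboundType);
    variable = value;
}
```

A caller holding `Node* p` to an unbound variable sees `p->type == &SmallIntType` after `bind(*p, ...)`, with no pointer update needed.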

Our model

A fixed size for objects. In our VM object model, each VM object is made of two memory words, hence its size is 64 or 128 bits depending on the bit-width of the platform. The first memory word specifies the type of the object, and the second word specifies its contents. Usually a single word is not enough to hold the contents, so the second word is a pointer to them. This is not always the case: small integers, for example, can be stored directly in the second word.
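
The two-word layout can be sketched in C++ as follows. This is an illustration under assumptions, not the VM's definition: `Contents`, `BigIntData`, `makeSmallInt`, and `makeBigInt` are hypothetical names, and `BigIntData` merely stands in for any payload too large for one word.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type descriptors.
struct Type { const char* name; };
const Type SmallIntType{"SmallInt"};
const Type BigIntType{"BigInt"};

// Stand-in for contents that do not fit in a single word.
struct BigIntData {
    std::uint64_t high;
    std::uint64_t low;
};

// The second word: either an immediate value or a pointer to the contents.
union Contents {
    std::intptr_t smallInt;
    void* ptr;
};

struct Node {
    const Type* type;  // first word: the type of the object
    Contents value;    // second word: its contents
};

static_assert(sizeof(Node) == 2 * sizeof(void*),
              "a VM object is exactly two memory words");

// Small integers live directly in the second word.
Node makeSmallInt(std::intptr_t n) {
    Node node;
    node.type = &SmallIntType;
    node.value.smallInt = n;
    return node;
}

// Larger contents are stored elsewhere; the second word points at them.
Node makeBigInt(BigIntData* data) {
    assert(data != nullptr);
    Node node;
    node.type = &BigIntType;
    node.value.ptr = data;
    return node;
}
```

The `static_assert` checks the size claim at compile time on platforms where an integer word and a pointer have the same width.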

Storing objects inside bigger objects. As VM objects are so small (two words), it does not make much sense to store pointers to them in most cases. If we take the example of a Cons, it is a two-word object. The first indicates it's actually a Cons, the other is a pointer (as no matter how hard we try, we cannot put both the head and tail in a single memory word) to some structure containing the head and tail. This structure could contain two pointers to VM objects but very often these VM objects wouldn't be referred to by anything else and this design would therefore be less efficient (in speed and memory) than directly storing the two VM objects in the structure.
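
As a rough C++ sketch of this layout (all names here, including `ConsCell`, `makeCons`, and `consOf`, are hypothetical and chosen for the example), the structure behind a Cons holds the head and tail objects themselves rather than pointers to them:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type descriptors.
struct Type { const char* name; };
const Type SmallIntType{"SmallInt"};
const Type NilType{"Nil"};
const Type ConsType{"Cons"};

union Contents {
    std::intptr_t smallInt;
    void* ptr;
};

struct Node {
    const Type* type;
    Contents value;
};

// The structure a Cons points to: two VM objects stored inline
// (four words), not two pointers to separately allocated objects.
struct ConsCell {
    Node head;
    Node tail;
};

static_assert(sizeof(ConsCell) == 2 * sizeof(Node),
              "head and tail are embedded, not pointed to");

Node makeSmallInt(std::intptr_t n) {
    Node node;
    node.type = &SmallIntType;
    node.value.smallInt = n;
    return node;
}

Node makeNil() {
    Node node;
    node.type = &NilType;
    node.value.smallInt = 0;
    return node;
}

// Fill caller-provided storage and return the two-word Cons object.
Node makeCons(ConsCell* storage, const Node& head, const Node& tail) {
    assert(storage != nullptr);
    storage->head = head;
    storage->tail = tail;
    Node node;
    node.type = &ConsType;
    node.value.ptr = storage;
    return node;
}

// Access the embedded head and tail of a Cons object.
ConsCell& consOf(const Node& node) {
    assert(node.type == &ConsType);
    return *static_cast<ConsCell*>(node.value.ptr);
}
```

Reading the head of a list then costs one pointer dereference (from the Cons object to its `ConsCell`) instead of two.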

In a similar way, a VM object for a Cell will contain a pointer to some structure with the details of the Cell, such as the computation space it belongs to or its global name, and a VM object with the contents of the cell.

Forwarding. The need for forwarding was established earlier: sometimes we need an object to be an alias of another object. The model supports this easily. When an object A needs to be an alias of an object B, we set A's type to a special type Reference, and its second word to a pointer to B.
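
Forwarding can be sketched in C++ as below. Only the Reference type comes from the text; `deref` and `makeAlias` are hypothetical helper names, and following chains of references in a loop is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type descriptors.
struct Type { const char* name; };
const Type ReferenceType{"Reference"};
const Type UnboundType{"Unbound"};
const Type SmallIntType{"SmallInt"};

struct Node;

union Contents {
    std::intptr_t smallInt;
    Node* target;  // used when the type is Reference
};

struct Node {
    const Type* type;
    Contents value;
};

// Follow references until reaching an object that is not a Reference.
Node& deref(Node& node) {
    Node* current = &node;
    while (current->type == &ReferenceType) {
        assert(current->value.target != nullptr);
        current = current->value.target;
    }
    return *current;
}

// Make `a` an alias of `b`: set its type to Reference and its second
// word to a pointer to (the dereferenced) `b`.
void makeAlias(Node& a, Node& b) {
    Node& target = deref(b);
    if (&target == &deref(a)) {
        return;  // already the same object; avoid a self-reference
    }
    a.type = &ReferenceType;
    a.value.target = &target;
}
```

Operations that call `deref` first never see the forwarder itself, which is what makes the aliasing transparent.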

Stable and unstable objects. As similar as they look, Cell and Cons are very different with regard to forwarding. Of course, pointing to the VM object inside a Cell is wrong, because each time the cell changes content, the object made into a reference would change value. However, it is valid to point to an object inside a Cons. It is even desirable, as this reduces memory usage.

We make this inherent difference explicit by naming the two kinds of objects: an object like the one inside a Cell is an unstable object, and an object like the ones inside a Cons is a stable object. References are only allowed to point to stable objects.
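
One way to enforce this rule is to give the two kinds of objects distinct host-language types, so that an illegal reference does not compile. The C++ sketch below is an assumption-laden illustration: the class names `StableNode` and `UnstableNode`, and the helpers `makeReference` and `readSmallInt`, are hypothetical and not quoted from the VM sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type descriptors.
struct Type { const char* name; };
const Type ReferenceType{"Reference"};
const Type SmallIntType{"SmallInt"};

struct StableNode;

union Contents {
    std::intptr_t smallInt;
    StableNode* target;  // a Reference can only hold a stable target
};

// Same two-word layout, but different C++ types for the two kinds.
struct StableNode   { const Type* type; Contents value; };
struct UnstableNode { const Type* type; Contents value; };

// Head and tail of a Cons are stable: pointing at them is valid.
struct ConsCell { StableNode head; StableNode tail; };

// The contents of a Cell are unstable: they are replaced on assignment.
struct CellData { UnstableNode contents; };

// Turn `from` into a Reference to a stable object.
void makeReference(UnstableNode& from, StableNode& to) {
    from.type = &ReferenceType;
    from.value.target = &to;
}

// Referencing an unstable object is rejected at compile time.
void makeReference(UnstableNode& from, UnstableNode& to) = delete;

// Read a small integer, following references to stable objects.
std::intptr_t readSmallInt(const UnstableNode& node) {
    const Type* type = node.type;
    Contents value = node.value;
    while (type == &ReferenceType) {
        const StableNode* target = value.target;
        type = target->type;
        value = target->value;
    }
    assert(type == &SmallIntType);
    return value.smallInt;
}
```

With these definitions, `makeReference(x, cons.head)` compiles, while `makeReference(x, cell.contents)` is a compile-time error.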