
Implementation of data types

Sébastien Doeraene edited this page Nov 14, 2012 · 12 revisions


This section explains how to write new data types in the Object Model.

A data type in the Object Model is what we usually call a class in a classical object-oriented language. It is a collection of fields and methods acting on these fields.

The skeleton

To get started with a new data type, copy and paste the skeleton below. As a running example, we will explain how to write the Cell data type. This is a typical data type: it is determined (IsDet), it has a token identity (C1 == C2 if and only if C1 and C2 refer to the same cell), and it has a fixed size (compare with an Array). Its only unusual property is related to spaces, which we will leave out of the discussion for now.
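Token identity can be illustrated with a plain C++ analogy that is independent of the Object Model and is not part of the skeleton. CellSketch, sameToken and sameContents are hypothetical names used only for this sketch: two cells holding equal contents are still distinct tokens, and only two references to the same cell compare equal.

```cpp
#include <cassert>

// Hypothetical stand-in for a mutable cell (not Mozart's Cell class).
struct CellSketch {
  int value;
};

// Token identity: c1 and c2 are "equal" if and only if they are the same cell.
inline bool sameToken(const CellSketch* c1, const CellSketch* c2) {
  return c1 == c2;
}

// For contrast, structural equality compares contents instead.
inline bool sameContents(const CellSketch* c1, const CellSketch* c2) {
  assert(c1 != nullptr && c2 != nullptr);
  return c1->value == c2->value;
}
```

With token identity, assigning a new value to a cell never changes which cells it is equal to.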

You can put the following, e.g., in the file datatypes-decl.hh of the experiment application.

#include <mozart.hh>

namespace mozart {

//////////
// Cell //
//////////

// Stuff generated by the generator (don't worry about it now)
#ifndef MOZART_GENERATOR
#include "Cell-implem-decl.hh"
#endif

class Cell: public DataType<Cell> {
public:
  // The standard constructor (invoked by Cell::build(...))
  inline
  Cell(VM vm, RichNode initial);

  /* The GR constructor (GR stands for Graph Replicator)
   * It is used e.g. for garbage-collection. It takes a VM and a GR, as well
   * as a reference to the Cell to be copied.
   */
  inline
  Cell(VM vm, GR gr, Cell& from);

public:
  // Any method you want, just as in a regular class
  // For example:
  
  inline
  UnstableNode access(VM vm);
  
  inline
  void assign(VM vm, RichNode newValue);
private:
  // Any field you want, just as in a regular class
  // For Cell we'll have:
  UnstableNode _value;
};

// More stuff generated by the generator
#ifndef MOZART_GENERATOR
#include "Cell-implem-decl-after.hh"
#endif

}

So what does this declaration say? At the C++ level, as you have probably guessed, it is merely a class Cell inheriting from DataType<Cell> (an instance of the CRTP, the Curiously Recurring Template Pattern). At the Object Model level, however, it is much more meaningful!

This piece of code declares a new data type, called Cell. This data type is non-transient and has a token identity (the defaults). Its type identity can be accessed with Cell::type(), which is declared in the superclass DataType<Cell>. Moreover, the declaration links this data type to a memory representation (the Cell class itself), to a means of garbage-collecting an entity of this type, etc.
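To see how a CRTP superclass can give every data type its own type identity, here is a standalone sketch. It does not reproduce Mozart's DataType; DataTypeSketch, TypeDescriptor, CellLike and ArrayLike are hypothetical names. Because DataTypeSketch<T> is a distinct class for each T, its function-local static descriptor is distinct too, so each derived class gets a unique type() without writing it by hand.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified runtime type descriptor (not Mozart's TypeInfo).
struct TypeDescriptor {
  std::string name;
};

// CRTP base: each instantiation DataTypeSketch<T> owns its own static
// descriptor, so every derived class gets a distinct type() for free.
template <class T>
class DataTypeSketch {
public:
  static const TypeDescriptor* type() {
    // One descriptor per T, built on first use; T is complete by then.
    static const TypeDescriptor descriptor{T::name()};
    return &descriptor;
  }
};

// Two toy data types deriving publicly from the CRTP base.
class CellLike : public DataTypeSketch<CellLike> {
public:
  static std::string name() { return "CellLike"; }
};

class ArrayLike : public DataTypeSketch<ArrayLike> {
public:
  static std::string name() { return "ArrayLike"; }
};
```

Public inheritance matters here: it is what lets client code write CellLike::type() from outside the class.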

Part of this magic is implemented by a rather clever type system over the C++ type system, written as a collection of (variadic) template classes in the core object model headers (memword.hh, storage-decl.hh, store-decl.hh, type-decl.hh, typeinfo-decl.hh and datatype-decl.hh). The rest of the magic is just generated automatically by a clang-based generator.

Most of the generated code is true boilerplate that you need not care about. Essentially, it provides a data structure containing the RTTI (RunTime Type Information) of the data type. The RTTI of each data type is stored in a subclass of TypeInfo.
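The idea of storing RTTI in subclasses of a common base class can be sketched in a few lines of plain C++. This is only an illustration of the pattern: TypeInfoSketch, CellTypeInfoSketch and describe are hypothetical, and Mozart's actual TypeInfo has a different and much richer interface. Generic code holds a pointer to the base class and queries the concrete type through virtual calls.

```cpp
#include <cassert>
#include <string>

// Hypothetical RTTI base class (not Mozart's TypeInfo interface).
class TypeInfoSketch {
public:
  virtual ~TypeInfoSketch() = default;
  virtual std::string getName() const = 0;
  // Whether values of this type are transient (e.g. unbound variables).
  virtual bool isTransient() const = 0;
};

// One RTTI subclass per data type, with a single shared instance.
class CellTypeInfoSketch : public TypeInfoSketch {
public:
  std::string getName() const override { return "Cell"; }
  bool isTransient() const override { return false; }

  static const CellTypeInfoSketch* instance() {
    static const CellTypeInfoSketch info{};
    return &info;
  }
};

// Generic code queries a value's type without knowing the concrete class.
inline std::string describe(const TypeInfoSketch* type) {
  return type->getName() +
         (type->isTransient() ? " (transient)" : " (non-transient)");
}
```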

Note that, as is the case for most routines and methods using the VM Object Model, all the methods of Cell take a first argument which is a VM vm. Similarly, input Oz values are typed as RichNode, whereas output Oz values are typed as UnstableNode.

This class entirely defines the behavior of your data type: both its memory layout and the operations you can call on it.

The implementation of the constructors, as well as of any non-trivial method, should be put in a file named datatypes.hh. The entire contents of this file must be hidden from the generator, because it cannot be compiled without the sources produced by the generator. For the minimal skeleton shown above, it should contain the following:

#include <mozart.hh>

#include "datatypes-decl.hh"

#ifndef MOZART_GENERATOR

namespace mozart {

//////////
// Cell //
//////////

// Even more stuff generated by the generator
#include "Cell-implem.hh"

Cell::Cell(VM vm, RichNode initial) {
  _value.init(vm, initial);
}

Cell::Cell(VM vm, GR gr, Cell& from) {
  gr->copyUnstableNode(_value, from._value);
}

UnstableNode Cell::access(VM vm) {
  return { vm, _value };
}

void Cell::assign(VM vm, RichNode newValue) {
  _value.copy(vm, newValue);
}

}

#endif // MOZART_GENERATOR

Again, you can implement the methods as in any regular class. In the regular constructor, we initialize the cell's value with the initial value provided as parameter.

The GR constructor instructs the graph replicator to replicate from._value into _value. You need not know how it does so at this point. Just make sure that, in your GR constructor, you:

  • Use gr->copyStableNode and/or gr->copyUnstableNode to copy nodes,
  • Use gr->copySpace to copy SpaceRef's,
  • Use gr->copyThread to copy Runnable*'s,
  • Use gr->copyStableRef to copy StableNode*'s,
  • Use the regular assignment operator of C++ to copy any other data (int, bool, etc.).
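The rules above follow a general pattern: references into the graph go through the replicator, while plain data is copied with ordinary assignment. The following standalone sketch shows that pattern outside Mozart. ReplicatorSketch, NodeSketch and PairSketch are hypothetical; the replicator memoizes copies so that a node shared by several owners is replicated only once, which is what distinguishes graph replication from a naive deep copy.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical graph node: a value plus an optional pointer to another node.
struct NodeSketch {
  int value;
  NodeSketch* next;
};

// Hypothetical replicator: copies each original node at most once.
class ReplicatorSketch {
public:
  NodeSketch* copyNode(NodeSketch* from) {
    if (from == nullptr) return nullptr;
    auto it = _copies.find(from);
    if (it != _copies.end()) return it->second;  // already replicated
    _storage.push_back(std::make_unique<NodeSketch>());
    NodeSketch* to = _storage.back().get();
    _copies[from] = to;               // register before recursing (cycles)
    to->value = from->value;          // plain data: ordinary assignment
    to->next = copyNode(from->next);  // graph reference: via the replicator
    return to;
  }

private:
  std::unordered_map<NodeSketch*, NodeSketch*> _copies;
  std::vector<std::unique_ptr<NodeSketch>> _storage;  // owns the copies
};

// A data type with a "GR-style" constructor, mirroring Cell(VM, GR, Cell&).
struct PairSketch {
  PairSketch(int count, NodeSketch* target) : _count(count), _target(target) {}

  PairSketch(ReplicatorSketch& gr, PairSketch& from)
    : _count(from._count),                  // plain data
      _target(gr.copyNode(from._target)) {}  // node reference

  int _count;
  NodeSketch* _target;
};
```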

The generator

You now have everything you need to write by hand. Some parts of the code are still missing, though: the generator will write them for you, but you need to instruct it to do so.

I will not expand on this now. In the experiment application, we have set up CMakeLists.txt so that the generator is run automatically on customlib.hh, which includes datatypes-decl.hh. Hence, you need not worry about it.

When working on the core of Mozart, you should simply modify coredatatypes-decl.hh (resp. coredatatypes.hh) so that it includes your files, e.g., cell-decl.hh (resp. cell.hh).

How to use your new data type

Now that you have defined your brand new data type, you'll want to use it. You must never instantiate a Cell directly; you must always go through the Cell::build() method, which returns a node rather than a Cell object. Using the as<Cell>() method of RichNode, you can then call any public method of Cell through a rich node.

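#include <iostream>

#include "customlib.hh"

using namespace mozart;

// Assumes a VM named vm is available in the current scope
UnstableNode initial = SmallInt::build(vm, 5);
UnstableNode cell = Cell::build(vm, initial);

// Cell::build() returns a node, not a Cell: call methods through as<Cell>()
UnstableNode contents = RichNode(cell).as<Cell>().access(vm);
std::cout << repr(vm, contents) << std::endl; // displays 5

UnstableNode newValue = SmallInt::build(vm, 42);
RichNode(cell).as<Cell>().assign(vm, newValue);

contents = RichNode(cell).as<Cell>().access(vm);
std::cout << repr(vm, contents) << std::endl; // displays 42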
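The rule "never instantiate directly, always call build()" corresponds to a familiar C++ idiom: a private constructor plus a static factory. The sketch below is a plain C++ analogy, not Mozart code; BuiltCell is a hypothetical name, and std::shared_ptr merely stands in for the node that the factory hands back, so that client code never owns the object itself.

```cpp
#include <cassert>
#include <memory>

// Hypothetical illustration: the constructor is private, so client code
// cannot write BuiltCell c(5); it has to go through build().
class BuiltCell {
public:
  static std::shared_ptr<BuiltCell> build(int initial) {
    // std::make_shared cannot reach the private constructor, hence new.
    return std::shared_ptr<BuiltCell>(new BuiltCell(initial));
  }

  int access() const { return _value; }
  void assign(int newValue) { _value = newValue; }

private:
  explicit BuiltCell(int initial) : _value(initial) {}
  int _value;
};
```

Keeping construction behind the factory leaves it free to decide where and how the object is allocated.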

Memory layout

The machinery of the Object Model takes care of all the details of memory management. But if you care about exactly how a node of type Cell is laid out in memory, it is simple: the first word of the node is Cell::type(), and the second word is a Cell*. This pointer refers to an actual instance of Cell in memory, that is, to an area holding a single UnstableNode, which takes 2 memory words.
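The two-word layout described above can be modeled with plain structs and checked at compile time. This is only a model under stated assumptions: TypeTag, TwoWordNode and CellPayload are hypothetical, and the UnstableNode field of the real Cell is represented by another two-word node. The static_asserts hold on platforms where all object pointers have the size of void*, which covers mainstream targets.

```cpp
#include <cassert>

// Hypothetical type identity, standing in for what Cell::type() returns.
struct TypeTag {
  const char* name;
};

// Model of a node: word 1 identifies the type, word 2 points to the payload.
struct TwoWordNode {
  const TypeTag* type;  // first word: the data type's identity
  void* payload;        // second word: pointer to the actual instance
};

// Model of a Cell instance: a single field taking two words, like the
// UnstableNode _value of the real Cell.
struct CellPayload {
  TwoWordNode value;
};

static_assert(sizeof(TwoWordNode) == 2 * sizeof(void*),
              "a node takes two memory words");
static_assert(sizeof(CellPayload) == 2 * sizeof(void*),
              "a cell instance takes two memory words");
```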