Engine data structure restrictions

Ivan Mogilko edited this page Oct 6, 2016 · 2 revisions

Summary

In the AGS engine, certain data structures originally could not be changed at all because of three constraints:

  • Serialization process,
  • Compiled scripts and their interpreter,
  • Plugin interface.

I will give an overview of each of these and explain it, along with potential solutions for each case.

Serialization

Originally, in the old engine code, as well as in the Editor's game compilation code, many structures were written to file as raw memory, e.g.:

struct GameEntity
{
    // some member variables...
};

GameEntity object;

fwrite(&object, sizeof(GameEntity), 1, file);

This kind of serialization had the following issues:

  1. All data was saved in the byte order (endianness) of memory; the original target platform was strictly little-endian.
  2. Data structure alignment. If the data struct had internal padding in memory, that padding got written to the file as well.
  3. Useless data. Some data structs held pointers to other objects. Their values were written to the file even though they carried no meaning, except to indicate that a particular pointer was addressing something at the time of serialization.
  4. Virtual table pointer. Some of those structs were actually part of a class hierarchy with virtual methods. In these cases the pointer to the virtual method table was saved to the file as well (the first 32 bits of the object, according to the compiler originally used).
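To make issue 2 concrete, here is a minimal sketch of how much of a raw memory dump is not actual data. The struct PaddedEntity and the constant names are hypothetical and chosen only for illustration; the exact amount of padding depends on the compiler and target platform.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical struct, not an engine type.
struct PaddedEntity
{
    int8_t  kind;   // 1 byte of data
    int32_t value;  // 4 bytes of data, usually placed on a 4-byte boundary
};

// Sum of the members' own sizes: the only bytes that carry data.
constexpr size_t kPayloadSize = sizeof(int8_t) + sizeof(int32_t);

// Extra bytes that fwrite(&obj, sizeof(obj), 1, file) would emit
// on top of the payload. The value is compiler- and platform-dependent.
constexpr size_t kPaddingBytes = sizeof(PaddedEntity) - kPayloadSize;
```

On a typical x86 compiler kPaddingBytes comes out as 3, so a file written this way only reads back correctly if the reader reproduces the same padding rules.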

In the latest version of the engine all four of these issues are solved, but they must be kept in mind for as long as reading old game data formats is supported (new format versions should not reintroduce them). All (de)serialization is now done by writing and reading each variable individually.

  1. The DataStream class converts the data it reads from little-endian to big-endian if necessary. Of course, it is important to specify which type of variable you expect to be loaded/saved (byte, int16, int32, array of int32, and so on) for this conversion to be effective.
  2. The AlignedStream class deals with reading padded structs from a file. It is configured to match the padding rules of the original compiler used to make AGS releases on Windows (C++, x86); it deduces the required padding as it reads data and skips it to avoid loading redundant bytes. When using AlignedStream, as with DataStream, you should specify the exact types of the variables you load/save, otherwise it won't be able to skip padding correctly.
  3. Pointer values are either skipped completely, or converted into temporary boolean variables that indicate the presence of another serialized object further in the stream, which should be referenced from the current data struct.
  4. The 32-bit virtual table pointer is simply skipped where necessary, with a comment added for clarity.
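The first two points can be sketched roughly as follows. This is not the engine's DataStream or AlignedStream: AlignedReader and its methods are hypothetical names, it reads from an in-memory buffer rather than a file, and it assumes the simple rule that each value is aligned to a multiple of its own size. Padding at the end of a struct is not handled here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reader over an in-memory buffer (an assumption for this sketch).
class AlignedReader
{
public:
    explicit AlignedReader(const std::vector<uint8_t> &data) : _data(data) {}

    // The caller states the exact type; that is what makes both the
    // byte-order handling and the padding deduction possible.
    int8_t  ReadInt8()  { return static_cast<int8_t>(ReadLE(1)); }
    int16_t ReadInt16() { return static_cast<int16_t>(ReadLE(2)); }
    int32_t ReadInt32() { return static_cast<int32_t>(ReadLE(4)); }

    size_t Position() const { return _pos; }

private:
    uint32_t ReadLE(size_t size)
    {
        // Skip padding so the value starts at a multiple of its size.
        _pos += (size - _pos % size) % size;
        // Assemble the value byte by byte, least significant byte first,
        // so the result does not depend on the host's byte order.
        uint32_t value = 0;
        for (size_t i = 0; i < size; ++i)
            value |= static_cast<uint32_t>(_data.at(_pos + i)) << (8 * i);
        _pos += size;
        return value;
    }

    const std::vector<uint8_t> &_data;
    size_t _pos = 0;
};
```

Reading an int8 followed by an int32 with this class skips the three padding bytes between them, which is why the declared type of every read matters.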

Engine memory exposed to scripts

Originally the AGS engine exported its memory to scripts without any protection. This worked in the following way:

  • An AGS script is compiled into something like x86 assembly code, which has simulated operations such as reading and writing memory at a given address with a certain offset.
  • The AGS built-in script header declared the engine API, which included certain global objects and arrays of structs (such as the character array), and contained enough information to calculate their sizes.
  • When a script was compiled, the compiler constructed read/write operations referring to the sizes of the declared structs and the offsets of their variables. Those offsets were "hard-coded" into the compiled script. For example, it could instruct to "read 4 bytes of object's data at offset 48".
  • At runtime, the actual pointers to engine functions and global variables were registered in the exports table under certain IDs ("script names").
  • When the script interpreter ran a script and received a request to read, write or call an exported object, it did no checks to find out whether such an operation was safe. It simply got the requested pointer from the exports table and performed the required operation on it.
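The mechanism described in the list above can be sketched like this. All names here (CharacterData, g_exports, UncheckedReadInt32) are hypothetical and only imitate the idea; the real interpreter and exports table look different.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>

// Hypothetical engine struct exported to scripts.
struct CharacterData
{
    int32_t x;     // offset 0
    int32_t y;     // offset 4
    int32_t room;  // offset 8
};

// Exports table: script name -> raw engine address.
std::map<std::string, void *> g_exports;

// What an unchecked "read 4 bytes at offset N" instruction amounts to.
// Nothing verifies that the offset lies inside the object
// or that it corresponds to an actual member.
int32_t UncheckedReadInt32(const std::string &name, size_t offset)
{
    const char *base = static_cast<const char *>(g_exports.at(name));
    int32_t value;
    std::memcpy(&value, base + offset, sizeof(value));
    return value;
}
```

After registering a CharacterData object under the name "player", a compiled script that wants the room would effectively call UncheckedReadInt32("player", 8), with the number 8 baked into the script at compile time.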

What this means is that:

  • If the contents of an exported struct are changed, all the previously compiled scripts that referenced that struct will break;
  • If there is an array of structs exported to script, and the size of that struct changes, then the previously compiled scripts that worked with elements of that array will break;
  • If the argument list of an exported function is changed, the previously compiled scripts that call this function will break.

By "break" I mean anything from the game not working correctly to engine memory getting corrupted.
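The first two cases can be illustrated with a small sketch. EntityV1 and EntityV2 are hypothetical layouts of the same struct before and after an engine change, and the constants stand for values that an old script would have had compiled into it.

```cpp
#include <cstddef>
#include <cstdint>

// Layout the old scripts were compiled against (hypothetical).
struct EntityV1 { int32_t x; int32_t y; int32_t room; };

// Same struct after a new member was inserted in the engine (hypothetical).
struct EntityV2 { int32_t x; int32_t y; int32_t z; int32_t room; };

// Values baked into an old compiled script.
constexpr size_t kCompiledRoomOffset = offsetof(EntityV1, room);
constexpr size_t kCompiledStride     = sizeof(EntityV1);

// Byte offset an old script computes for element `index` of an exported array.
constexpr size_t CompiledElementOffset(size_t index)
{
    return index * kCompiledStride;
}
```

With the new layout, kCompiledRoomOffset lands on z instead of room, and CompiledElementOffset(1) points into the middle of element 0 rather than to the start of element 1.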

When we started to improve the AGS engine, we rewrote the script interpreter to make it safer and to give it better control over variable and function access. For every exported static global object there is an ICCStaticObject implementation that controls access to the object's memory. For global arrays that is the StaticArray class. For types which may have pointers created to them at runtime in script, there are ICCDynamicObject implementations. Finally, there is the CCDynamicArray class that deals with dynamic script arrays of basic types or managed pointers.

Instead of directly reading or writing memory on the script's command, the interpreter now calls the corresponding methods of those manager classes and lets them decide what to do in each particular case. These are methods such as ReadInt8, ReadInt16, ReadInt32 and their Write* counterparts. What those methods are supposed to do is take a backwards-compatible memory offset (given in bytes) and deduce which of the object's variables should be read or assigned.

In the simplest case that looks like this (sample code):

void MyScriptClass::WriteInt32(const char *address, intptr_t offset, int32_t val)
{
    switch (offset)
    {
    case 0: managedEntity.firstVariable = val; break;
    case 4: managedEntity.secondVariable = val; break;
    case 8: managedEntity.thirdVariable = val; break;
    // and so on...
    }
}

But of course this also makes it possible to perform additional processing. You do not have to read or assign a variable directly; you may, for example, call a function instead.
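As a rough sketch of that idea, the manager below keeps serving the legacy byte offsets while the underlying struct has been reordered and one write is routed through a function. ModernEntity, ModernEntityAccess and SetHealth are hypothetical names, and the ReadInt32 signature is assumed by analogy with the WriteInt32 sample above; it is not taken from the engine.

```cpp
#include <cstdint>

// Hypothetical entity whose layout no longer matches what old scripts expect.
struct ModernEntity
{
    int64_t score  = 0;     // used to be a 32-bit value at legacy offset 4
    int32_t health = 0;     // used to be at legacy offset 0
    bool    dirty  = false;

    void SetHealth(int32_t value)
    {
        health = value < 0 ? 0 : value; // extra processing on assignment
        dirty = true;
    }
};

// Hypothetical access manager keyed by legacy byte offsets.
class ModernEntityAccess
{
public:
    explicit ModernEntityAccess(ModernEntity &entity) : _entity(entity) {}

    int32_t ReadInt32(const char * /*address*/, intptr_t offset) const
    {
        switch (offset)
        {
        case 0: return _entity.health;
        case 4: return static_cast<int32_t>(_entity.score);
        default: return 0; // unknown offset: refuse instead of touching memory
        }
    }

    void WriteInt32(const char * /*address*/, intptr_t offset, int32_t val)
    {
        switch (offset)
        {
        case 0: _entity.SetHealth(val); break; // routed through a function
        case 4: _entity.score = val; break;
        default: break; // unknown offset: ignore
        }
    }

private:
    ModernEntity &_entity;
};
```

An old script that writes a negative value at offset 0 now ends up in SetHealth, which clamps it, and the engine is free to change ModernEntity further as long as the offset mapping is kept up to date.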

That said, at the time of writing we have the control system in place, but most of these control methods are stubs that still use unsafe direct memory access. It is of utmost importance that the proper solution is implemented, so that developers can modify engine data types and variables without restriction. Completing this task should take a moderate amount of time and effort.

All this is only important for previously existing types and variables that were exposed in the script API. New script types do not have to support those methods. It is highly advised not to export raw variables and arrays to script directly, but to declare properties and functions to work with them instead.

Engine memory exposed to plugins

TBD