
Serializer Deserializer Code generation

Amnon Heiman edited this page Nov 8, 2017 · 43 revisions

Introduction

The serializer code generator uses a schema to create serialize and deserialize functions. It supports schema upgrades, with the condition that fields can only be added.

The code generator supports concrete classes and lazy serialization of classes (see the section on lazy serialization and deserialization). When a field is added, it can be given a default value to use when the field is absent because the message came from an older version (in this case the field is also marked with a version attribute).
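The default-value rule can be illustrated with a minimal C++ sketch. Everything here is an assumption made for illustration, not the generated code: the wire format (a 4-byte size prefix that covers itself and the fields), the helper names put_i32 and get_i32, and the serialize_old and serialize_new functions are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical wire format for this sketch: a 4-byte size prefix (covering
// the prefix itself plus all fields), followed by the fields in IDL order.
struct heart_beat_state {
    int32_t generation;
    int32_t heart_beat_version;  // added in a later version, default = 1
};

// Append a 32-bit value in host byte order (real code would fix endianness).
inline void put_i32(std::vector<uint8_t>& out, int32_t v) {
    uint8_t raw[sizeof v];
    std::memcpy(raw, &v, sizeof v);
    out.insert(out.end(), raw, raw + sizeof v);
}

// Read a 32-bit value at the given offset.
inline int32_t get_i32(const std::vector<uint8_t>& in, std::size_t pos) {
    assert(pos + sizeof(int32_t) <= in.size());
    int32_t v;
    std::memcpy(&v, in.data() + pos, sizeof v);
    return v;
}

// What an older node would send: only the field it knows about.
inline std::vector<uint8_t> serialize_old(int32_t generation) {
    std::vector<uint8_t> out;
    put_i32(out, 8);  // size: prefix (4) + one field (4)
    put_i32(out, generation);
    return out;
}

// What a newer node sends: both fields.
inline std::vector<uint8_t> serialize_new(const heart_beat_state& s) {
    std::vector<uint8_t> out;
    put_i32(out, 12);  // size: prefix (4) + two fields (8)
    put_i32(out, s.generation);
    put_i32(out, s.heart_beat_version);
    return out;
}

// Newer deserializer: falls back to the default when the size prefix shows
// that the sender did not include the newer field.
inline heart_beat_state deserialize(const std::vector<uint8_t>& in) {
    int32_t size = get_i32(in, 0);
    heart_beat_state s{get_i32(in, 4), 1};
    if (size >= 12) {
        s.heart_beat_version = get_i32(in, 8);
    }
    return s;
}
```

In this sketch, deserializing a message produced by serialize_old yields heart_beat_version equal to the default, while a message from serialize_new carries the sender's value.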

Adding a serializer/deserializer to an existing class

The code generator creates serializer and deserializer functions for an existing object. To add a serializer, do the following:

  1. Make your objects ready for serialization. To serialize an object, you need the following:
  • A getter method for each field that needs serializing, or the field must be public.
  • A constructor that takes all the fields in the order specified by the idl. When adding it, make sure it moves its arguments where applicable.
  • Nested classes are currently not supported. Take the declaration out of the class and replace it with a using declaration.
  2. Add an idl file under the idl directory. It should contain the classes, structs, and enums your object needs. If your object needs some general object, like ip_address, check before adding it; it might already be there.
  3. Add the idl file to the idls list in configure.py.
  4. Add a specific implementation to messaging_service.
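As a rough illustration of step 1, the following C++ sketch shows a class shaped the way the list above asks for. The class token_info, its fields, and the IDL shown in the comment are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Assumed IDL for this sketch (hypothetical, not taken from the repository):
//
//   class token_info {
//       std::string owner();
//       std::vector<int64_t> tokens();
//   }

class token_info {
    std::string _owner;
    std::vector<int64_t> _tokens;
public:
    // The constructor takes every field, in the same order as the IDL, and
    // moves the arguments into place instead of copying them.
    token_info(std::string owner, std::vector<int64_t> tokens)
        : _owner(std::move(owner)), _tokens(std::move(tokens)) {}

    // One const getter per serialized field; the names match the IDL.
    const std::string& owner() const { return _owner; }
    const std::vector<int64_t>& tokens() const { return _tokens; }
};
```

Taking the constructor arguments by value and moving them lets a deserializer hand over freshly built fields without an extra copy.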

Schema

  • The schema syntax is similar to C++ declarations.
  • Use a class or struct similar to the object you need the serializer for.
  • Use namespaces when applicable.
  • Since the idl compiler does not have C++ type deduction capability, always specify full type names.

keywords

  • class/struct - a class or a struct, like a C++ class/struct. It can have a final or stub marker.
  • namespace - has the same meaning as in C++.
  • enum class - has the same meaning as in C++.
  • final modifier for class - when a class is marked as final, it will not contain a size parameter. Note that a final class cannot be extended by a future version, so use with care.
  • stub class - when a class is marked as stub, no code will be generated for it; it is there only as documentation.
  • version attribute - marks a field as available starting from a specific version.
  • template - a template class definition, like in C++.
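To show why the size parameter matters for extensibility, here is a minimal C++ sketch of an older reader skipping fields it does not know. The encoding (a 4-byte size that covers itself and all fields) and the names put_u32, get_u32, and read_non_final are assumptions for illustration only, not the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Append a 32-bit value in host byte order (hypothetical helper).
inline void put_u32(std::vector<uint8_t>& out, uint32_t v) {
    uint8_t raw[sizeof v];
    std::memcpy(raw, &v, sizeof v);
    out.insert(out.end(), raw, raw + sizeof v);
}

// Read a 32-bit value at the given offset (hypothetical helper).
inline uint32_t get_u32(const std::vector<uint8_t>& in, std::size_t pos) {
    assert(pos + sizeof(uint32_t) <= in.size());
    uint32_t v;
    std::memcpy(&v, in.data() + pos, sizeof v);
    return v;
}

struct old_reader_result {
    uint32_t first_field;     // the only field the older reader knows
    std::size_t next_offset;  // where the data after this object starts
};

// An older reader of a non-final class: it decodes the field it knows and
// uses the size prefix to jump over fields added by newer versions.
// A final class has no size prefix, so a reader like this would have no way
// to find the end of an object that grew; hence final classes cannot grow.
inline old_reader_result read_non_final(const std::vector<uint8_t>& in,
                                        std::size_t pos) {
    uint32_t size = get_u32(in, pos);
    uint32_t first = get_u32(in, pos + sizeof(uint32_t));
    return {first, pos + size};
}
```

With a buffer holding one object of size 12 (prefix, known field, newer field) followed by another value, read_non_final returns the known field and an offset that points directly at the following value.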

Syntax

Namespace

namespace ns_name { namespace-body }
  • ns_name: either a previously unused identifier, in which case this is an original-namespace-definition, or the name of an existing namespace, in which case this is an extension-namespace-definition
  • namespace-body: possibly empty sequence of declarations of any kind (including class and struct definitions as well as nested namespaces)

class/struct

class-key class-name final(optional) stub(optional) [[writable]](optional){ member-specification } ;(optional)

  • class-key: one of class or struct.
  • class-name: the name of the class that is being defined, optionally followed by the keyword final, optionally followed by the keyword stub.
  • final: when a class is marked as final, it cannot be extended and there is no need to serialize its size. Use with care.
  • stub: when a class is marked as stub, no code will be generated for it; it is added for documentation only.
  • writable: an optional attribute that states whether writers and views will be created for the class.
  • member-specification: a list of access specifiers and public member accessors; see class member below.
  • To be compatible with C++, a class definition can be followed by a semicolon.

enum

enum-key identifier enum-base { enumerator-list(optional) }

  • enum-key: only enum class is supported.
  • identifier: the name of the enumeration that is being declared.
  • enum-base: a colon (:), followed by a type-specifier-seq that names an integral type (see the C++ standard for the full list of all possible integral types).
  • enumerator-list: a comma-separated list of enumerator definitions, each of which is either simply an identifier, which becomes the name of the enumerator, or an identifier with an initializer: identifier = integral value. Note that although C++ allows a constant expression as an initializer, this makes the documentation less readable and is therefore not permitted.
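As a small illustration of what enum-base means for the wire value, the sketch below converts an enum class to and from the integral type it names. The helper names to_wire and from_wire are hypothetical; the generated code may handle enums differently.

```cpp
#include <cassert>
#include <type_traits>

// Same shape as the enum in the IDL example: enum-base is int, and
// enumerators without an initializer take the previous value plus one.
enum class application_state : int { STATUS = 0, LOAD, SCHEMA, DC };

// Convert to the integral type named by enum-base; in this sketch, that is
// the value a serializer would write.
template <typename Enum>
constexpr std::underlying_type_t<Enum> to_wire(Enum e) {
    return static_cast<std::underlying_type_t<Enum>>(e);
}

// Convert a received integral value back to the enum type.
template <typename Enum>
constexpr Enum from_wire(std::underlying_type_t<Enum> v) {
    return static_cast<Enum>(v);
}

static_assert(std::is_same<std::underlying_type_t<application_state>, int>::value,
              "enum-base selects the underlying integral type");
```

Because the numeric values are what travels between versions, keeping initializers as plain integral values in the idl makes the mapping easy to read.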

class member

type member-access attributes(optional) default-value(optional);

  • type: any valid C++ type, following the C++ notation. Note that there must be a serializer for the type, but declaration order does not matter.
  • member-access: the way the member can be accessed. If the member is public, it can be the name itself. If not, it can be a getter function, which must be followed by parentheses. Note that getters can (and probably should) be const methods.
  • attributes: attributes are defined by double square brackets. Currently they are used to mark the version in which a specific member was added: [ [version version-number] ] marks that the member was added in the given version number.
  • default-value: an optional = value, used when the member is absent from the message.

template

template < parameter-list > class-declaration

  • parameter-list - a non-empty comma-separated list of the template parameters.
  • class-declaration - (see the class section) the declared class name becomes a template name.

Lazy serialization and deserialization (writable attribute)

The de/serialization functions operate on objects. There are situations when we want to create an object in the message without first holding it in memory.

For example, rows in mutations.

In a mutation, rows are a collection of columns, which are a collection of values. We don't have such a data structure; those values are gathered when creating the mutation.

Without lazy serialization, the sender would need to create an artificial object just for serialization, and on the receiver side some of the values may not be needed and should not be deserialized at all.

For these cases, we use writers and views.

To mark that an object should have writers and views, we use the writable attribute. If the object is just a logical concept for the message and does not have a concrete class, it should also be marked as stub.

class collection_element {
    bytes value;
};

class column stub [[writable]] {
     int id;
     std::vector<collection_element> elements;
};

class row stub [[writable]] {
    std::vector<column> columns;
};

In the simplified example above, collection_element is a class we use in the code, but column and row are not.

The idl compiler will generate writers for row and column: writer_of_row and writer_of_column. It will also generate views: row_view and column_view.

views

Views are facade classes. The actual class can, but does not have to, exist.

In the mutation example, row is a stub class. The idl compiler will generate a view for it, row_view, with a getter function for the columns: std::vector<column_view> columns() const;. Views are lazy evaluators, so the deserialization of columns is done only when the method is used.
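The lazy behaviour of a view can be sketched in C++ as follows. This is not the generated code: the encoding (a row is a 4-byte column count followed by the columns, and a column is reduced to a 4-byte id) and the helper encode_row are simplifying assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// View over one serialized column; it does not own the bytes.
class column_view {
    const uint8_t* _data;
public:
    explicit column_view(const uint8_t* data) : _data(data) {}
    // Decoded on demand from the underlying buffer.
    int32_t id() const {
        int32_t v;
        std::memcpy(&v, _data, sizeof v);
        return v;
    }
};

// View over one serialized row; constructing it parses nothing.
class row_view {
    const uint8_t* _data;
public:
    explicit row_view(const uint8_t* data) : _data(data) {}
    // The column count is read only when this is called, and the returned
    // views still point into the original buffer rather than owning copies.
    std::vector<column_view> columns() const {
        uint32_t count;
        std::memcpy(&count, _data, sizeof count);
        std::vector<column_view> out;
        out.reserve(count);
        for (uint32_t i = 0; i < count; ++i) {
            out.emplace_back(_data + sizeof count + i * sizeof(int32_t));
        }
        return out;
    }
};

// Hypothetical helper that builds a buffer in the assumed encoding.
inline std::vector<uint8_t> encode_row(const std::vector<int32_t>& ids) {
    std::vector<uint8_t> buf(sizeof(uint32_t) + ids.size() * sizeof(int32_t));
    uint32_t count = static_cast<uint32_t>(ids.size());
    std::memcpy(buf.data(), &count, sizeof count);
    if (!ids.empty()) {
        std::memcpy(buf.data() + sizeof count, ids.data(),
                    ids.size() * sizeof(int32_t));
    }
    return buf;
}
```

A receiver that never calls columns() pays no decoding cost for them, and one that only needs a single column id decodes just that value.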

writers

Writers are helper classes that serialize objects incrementally. The concept is based on a state machine, and validity is checked at compile time.

In the mutation example, to write a row we start with a writer_of_row object. row has only one member, a vector, and a vector acts like an optional variable: we can either add elements to it by calling start_columns(), or skip it by calling skip_columns(). Calling start_columns() returns an object that lets you add elements to the vector. If ww is a writer_of_row object, ww.start_columns().add() will return a writer_of_column object.
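The state-machine idea can be sketched in C++ by giving each state its own type, so that an invalid call order simply does not compile. The names writer_of_row, writer_of_column, start_columns(), skip_columns(), and add() follow the text above; everything else (row_columns_writer, after_row_columns, end_columns(), write_id(), the column reduced to a single id, and the byte layout) is an assumption for illustration and not the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using buffer = std::vector<uint8_t>;

// Append a 32-bit value in host byte order (hypothetical helper).
inline void put_u32(buffer& out, uint32_t v) {
    uint8_t raw[sizeof v];
    std::memcpy(raw, &v, sizeof v);
    out.insert(out.end(), raw, raw + sizeof v);
}

// Final state: the row is complete, so this type offers no write methods.
struct after_row_columns {
    buffer& out;
};

struct row_columns_writer;  // forward declaration

// State after add(): the column's id must be written next.
struct writer_of_column {
    buffer& out;
    std::size_t count_pos;
    row_columns_writer write_id(int32_t id);
};

// State after start_columns(): add another column or close the vector.
struct row_columns_writer {
    buffer& out;
    std::size_t count_pos;  // offset of the element count in the buffer
    writer_of_column add() {
        // Bump the element count that was reserved by start_columns().
        uint32_t n;
        std::memcpy(&n, out.data() + count_pos, sizeof n);
        ++n;
        std::memcpy(out.data() + count_pos, &n, sizeof n);
        return {out, count_pos};
    }
    after_row_columns end_columns() { return {out}; }
};

inline row_columns_writer writer_of_column::write_id(int32_t id) {
    put_u32(out, static_cast<uint32_t>(id));
    return {out, count_pos};
}

// Initial state: the only valid moves are to start or skip the vector.
struct writer_of_row {
    buffer& out;
    row_columns_writer start_columns() {
        std::size_t pos = out.size();
        put_u32(out, 0);  // reserve the element count
        return {out, pos};
    }
    after_row_columns skip_columns() {
        put_u32(out, 0);  // an empty vector
        return {out};
    }
};
```

In this sketch, ww.start_columns().add().write_id(7).end_columns() compiles, while calling add() directly on ww or write_id() twice in a row is rejected by the compiler because those methods do not exist on the corresponding state types.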

IDL example

Comments start with two forward slashes (//) and are ignored until the end of the line.

namespace utils {
// An example of a stub class
class UUID stub {
    int64_t most_sig_bits;
    int64_t least_sig_bits;
}
}

namespace gms {
//an enum example
enum class application_state:int {STATUS = 0,
        LOAD,
        SCHEMA,
        DC};

// example of final class
class versioned_value final {
// getter and setter as public member
    int version;
    sstring value;
}

class gossip_digest {
    inet_address get_endpoint();
    int32_t get_generation();
//mark that a field was added on a specific version
    int32_t get_max_version() [ [version 0.14.2] ];
}

class heart_beat_state {
//getter as function
    int32_t get_generation();
//default value example
    int32_t get_heart_beat_version() [ [version 0.14.2] ] = 1;
}

class endpoint_state {
    heart_beat_state get_heart_beat_state();
    std::map<application_state, versioned_value> get_application_state_map();
}

class gossip_digest_ack {
    std::vector<gossip_digest> digests();
    std::map<inet_address, gms::endpoint_state> get_endpoint_state_map();
}
}

Components

  1. Schema files - used by the code generator to create source and header files.
  2. The general serializer.hh header file - contains the definitions of all general-use methods and also includes the auto-generated headers. Components that need to use a serializer include this file.
  3. serializer_impl.hh - contains the inline function implementations, with includes of the object definitions. The auto-generated source files include this file.

generated files

per file.idl.hh file

  1. file.dist.hh file - contains the forward declarations of the classes in use and of the functions.
  2. file.dist.impl.hh file - contains the serialize and deserialize implementation for each class.

global generated file

  1. serializer.dist.hh file - contains includes of all the generated .dist.hh files. It is included by serializer.hh.
  2. serializer.dist.impl.hh file - contains includes of all the generated .dist.impl.hh files. It is included by serializer_impl.hh.