hyper-level-nerds/sentient

Repository files navigation


logo

Sentient


Application Layer Protocol Definition / Binary Serialization Toolset
Report Bug · Request Feature

Table of Contents
  1. About The Sentient Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

This project is still a draft.

About The Sentient Project

Concepts

This toolset is being written for those who want to transmit models quickly over both well-known, ready-made application layer protocols and custom ones. Without the Sentient toolset, a custom application layer protocol is tedious to maintain because it has to be implemented separately in every programming language involved. With the Sentient library, you only need to compile the Sentient schema language source code and use the result immediately. In addition, models for transmission can be serialized and deserialized easily with the Sentient library, even if the programming language has no reflection support.

The concepts and specific features of this project are still being researched. The features that are clearly defined so far are:

  • Definition of application layer protocols and of the models to be transported
  • Schema language
  • Model/protocol source code generation for various programming languages using the schema compiler

Brief Specifications

To implement the Sentient library in a specific programming language, the common features below must be implemented.

Basic Types

An implementation of the library should provide 8/16/32/64-bit integer types and 32/64-bit floating point types.
In languages that already have these as primitive types, it is better to alias them to the Sentient names (u8, u16, u32, u64, i8, i16, i32, i64, f32, f64) if the language has type aliasing syntax.
In languages that have only simple number types (Python, TypeScript, ...), all types from u8 through f64 should be provided as wrapper classes. Where the syntax exists, operator overloading should be implemented so that the wrappers behave like native numbers.

Type  Description
u8    Unsigned 8-bit integer type
u16   Unsigned 16-bit integer type
u32   Unsigned 32-bit integer type
u64   Unsigned 64-bit integer type
i8    Signed 8-bit integer type
i16   Signed 16-bit integer type
i32   Signed 32-bit integer type
i64   Signed 64-bit integer type
f32   32-bit floating point type
f64   64-bit floating point type
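
In C, for example, the aliasing described above could look like the following minimal sketch. It assumes the fixed-width types from <stdint.h> are available and that float and double are IEEE 754 binary32 and binary64, which the C standard does not guarantee. The static assertions only check the sizes.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>

/* Sentient basic type names mapped onto C fixed-width types. */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;
typedef int8_t   i8;
typedef int16_t  i16;
typedef int32_t  i32;
typedef int64_t  i64;
typedef float    f32;  /* assumption: IEEE 754 binary32 */
typedef double   f64;  /* assumption: IEEE 754 binary64 */

/* Fail at compile time on platforms where the assumption is wrong. */
static_assert(sizeof(f32) == 4, "f32 must be 32 bits wide");
static_assert(sizeof(f64) == 8, "f64 must be 64 bits wide");
```

Languages without fixed-width primitives would need wrapper classes instead, as described above.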

Time Types

There are many ways to transmit time information across programming languages and in user space protocols that are not text stream based. As far as I know, the types below are the most commonly used.
In each programming language's implementation of the Sentient library, all time info types should be convertible to the standard library time types of that language.
In addition, the Sentient time info types do not have to be implemented using bit fields; the fields can be compressed to their corresponding bit sizes when serialized to binary.

  • t64 / PosixTime
    • An unsigned 64-bit integer containing a UNIX timestamp (in seconds)
  • t128 / TimeSpec
    • An unsigned 64-bit UNIX timestamp followed by an unsigned 64-bit integer containing nanoseconds
    • Fields/Descriptions
    Field        Signed? / Size in bits  Value Range    Description
    seconds      unsigned / 64           0~             UNIX timestamp
    nanoseconds  unsigned / 64           0~999,999,999  nanoseconds
  • cg32 / CompactGregorianCalendar
    • There are several traditional ways to transmit time information in Gregorian calendar format.
      The most common one packs the year (without the century), month, day, hour, minute, and second values into a 32-bit structure
    • Fields/Descriptions
    Field    Signed? / Size in bits  Value Range  Description
    year     unsigned / 7            0~99         year without century information
    month    unsigned / 4            1~12         month
    day      unsigned / 5            1~31         day
    hours    unsigned / 5            0~23         hours
    minutes  unsigned / 6            0~59         minutes
    seconds  unsigned / 5            0~29         seconds, in increments of 2
  • cg64 / PrecisionCompactGregorianCalendar
    • Extends cg32/CompactGregorianCalendar: the seconds field is lengthened to 6 bits so it can hold the complete 0~59 range, and an unsigned 64-bit nanoseconds field is added at the end
    • Fields/Descriptions
    Field        Signed? / Size in bits  Value Range    Description
    year         unsigned / 7            0~99           year without century information
    month        unsigned / 4            1~12           month
    day          unsigned / 5            1~31           day
    hours        unsigned / 5            0~23           hours
    minutes      unsigned / 6            0~59           minutes
    seconds      unsigned / 6            0~59           seconds
    nanoseconds  unsigned / 64           0~999,999,999  nanoseconds
  • g64 / GregorianCalendar
    • If you need the century information that cg32 leaves out, use this type
    • Fields/Descriptions
    Field    Signed? / Size in bits  Value Range                       Description
    year     signed / 38             -137,438,953,472~137,438,953,471  BC~AD year
    month    unsigned / 4            1~12                              month
    day      unsigned / 5            1~31                              day
    hours    unsigned / 5            0~23                              hours
    minutes  unsigned / 6            0~59                              minutes
    seconds  unsigned / 6            0~59                              seconds
  • g128 / PrecisionGregorianCalendar
    • g64/GregorianCalendar with an additional nanoseconds field
    • Fields/Descriptions
    Field        Signed? / Size in bits  Value Range                       Description
    year         signed / 38             -137,438,953,472~137,438,953,471  BC~AD year
    month        unsigned / 4            1~12                              month
    day          unsigned / 5            1~31                              day
    hours        unsigned / 5            0~23                              hours
    minutes      unsigned / 6            0~59                              minutes
    seconds      unsigned / 6            0~59                              seconds
    nanoseconds  unsigned / 64           0~999,999,999                     nanoseconds
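
As an illustration of compressing fields to their bit sizes, the following C sketch packs cg32 / CompactGregorianCalendar into a single 32-bit integer. The specification above gives only the field widths, so the bit order used here (year in the most significant bits, then the fields in table order) is an assumption. The names struct cg32, cg32_pack, cg32_unpack, and cg32_from_tm are hypothetical too. The unpacked struct keeps whole seconds and halves them when packing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Unpacked form of cg32; plain members instead of bit fields. */
struct cg32 {
    uint8_t year;     /* 0..99, no century */
    uint8_t month;    /* 1..12 */
    uint8_t day;      /* 1..31 */
    uint8_t hours;    /* 0..23 */
    uint8_t minutes;  /* 0..59 */
    uint8_t seconds;  /* 0..59 here; only seconds / 2 is stored on the wire */
};

/* Assumed layout, MSB to LSB: year:7 month:4 day:5 hours:5 minutes:6 seconds/2:5 */
uint32_t cg32_pack(const struct cg32 *t)
{
    assert(t != NULL);
    return ((uint32_t)(t->year    & 0x7Fu) << 25)
         | ((uint32_t)(t->month   & 0x0Fu) << 21)
         | ((uint32_t)(t->day     & 0x1Fu) << 16)
         | ((uint32_t)(t->hours   & 0x1Fu) << 11)
         | ((uint32_t)(t->minutes & 0x3Fu) << 5)
         |  (uint32_t)((t->seconds / 2) & 0x1Fu);
}

struct cg32 cg32_unpack(uint32_t packed)
{
    struct cg32 t;
    t.year    = (uint8_t)((packed >> 25) & 0x7Fu);
    t.month   = (uint8_t)((packed >> 21) & 0x0Fu);
    t.day     = (uint8_t)((packed >> 16) & 0x1Fu);
    t.hours   = (uint8_t)((packed >> 11) & 0x1Fu);
    t.minutes = (uint8_t)((packed >> 5)  & 0x3Fu);
    t.seconds = (uint8_t)((packed & 0x1Fu) * 2);  /* odd seconds are lost */
    return t;
}

/* Conversion from the C standard library calendar type (years from 1900 on). */
struct cg32 cg32_from_tm(const struct tm *tm)
{
    struct cg32 t;
    assert(tm != NULL);
    t.year    = (uint8_t)((tm->tm_year + 1900) % 100);  /* drop the century */
    t.month   = (uint8_t)(tm->tm_mon + 1);              /* tm_mon is 0..11 */
    t.day     = (uint8_t)tm->tm_mday;
    t.hours   = (uint8_t)tm->tm_hour;
    t.minutes = (uint8_t)tm->tm_min;
    t.seconds = (uint8_t)(tm->tm_sec > 59 ? 59 : tm->tm_sec);  /* clamp leap second */
    return t;
}
```

Packing and then unpacking returns the same fields, except that an odd seconds value comes back rounded down to the even value below it. The wider types could follow the same pattern with 64-bit shifts.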

Data Containers

Variable-Sized Array

I started this project because serializing and deserializing variable-sized data containers was a huge pain for me.
However, it is also painful to modify objects in service logic without the highly abstracted standard data containers provided by each programming language.
In binary serialization, a variable-sized array is usually split into a size field and a data field, and the number of bytes in the size field must be fixed first in the payload definition.
Suppose a payload definition ends with a variable-sized array whose size field is a 16-bit unsigned integer and whose elements are 32-bit unsigned integers.

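// example scenario in C language
// (the byte order written to the buffer is the host's; little-endian is assumed)

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct example_variable_sized_array {
    uint8_t meaningless_field;
    uint16_t array_size;
    uint32_t* array;
};

int main(void)
{
    const size_t size = 3;
    unsigned char serialized_buffer[256] = { 0 };
    struct example_variable_sized_array obj = { 0 };

    obj.meaningless_field = 1;
    obj.array_size = (uint16_t)size;
    // allocate `size` elements (not `size` structs)
    obj.array = calloc(size, sizeof *obj.array);
    if (obj.array == NULL) {
        return EXIT_FAILURE;
    }
    obj.array[0] = 1;
    obj.array[1] = 2;
    obj.array[2] = 3;

    // memcpy instead of pointer casts: offsets 1, 3, 7 and 11 are not
    // suitably aligned for direct uint16_t/uint32_t stores
    memcpy(&serialized_buffer[0], &obj.meaningless_field, sizeof obj.meaningless_field);
    memcpy(&serialized_buffer[1], &obj.array_size, sizeof obj.array_size);
    for (size_t i = 0; i < size; i++) {
        memcpy(&serialized_buffer[3 + i * sizeof *obj.array], &obj.array[i], sizeof *obj.array);
    }

    //
    // Some communication jobs...
    //

    free(obj.array);
    obj.array = NULL;
    return EXIT_SUCCESS;
}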

Layout of the example variable-sized array after binary serialization (little-endian)

Offset  Field              Bytes
0       meaningless field  1
1~2     size               3 0
3~6     elem0              1 0 0 0
7~10    elem1              2 0 0 0
11~14   elem2              3 0 0 0
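
The reverse direction can be sketched in the same way. The following C function, a hypothetical example_deserialize that is not part of the Sentient library, reads the layout above byte by byte, so it does not depend on host endianness or alignment. It checks the size field against the buffer length before allocating. The helper names and the error convention (0 on success, -1 on failure) are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HEADER_SIZE    3u  /* 1 byte field + 2 byte size field */
#define ELEM_WIRE_SIZE 4u  /* each element is 32 bits on the wire */

struct example_variable_sized_array {
    uint8_t   meaningless_field;
    uint16_t  array_size;
    uint32_t *array;
};

/* Little-endian readers that work on any host byte order. */
static uint16_t read_u16_le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short or allocation fails.
 * On success the caller owns out->array and must free() it. */
int example_deserialize(const unsigned char *buf, size_t len,
                        struct example_variable_sized_array *out)
{
    assert(buf != NULL && out != NULL);

    if (len < HEADER_SIZE) {
        return -1;
    }
    uint16_t count = read_u16_le(buf + 1);

    /* Never trust the size field: make sure the data is really there. */
    if (len - HEADER_SIZE < (size_t)count * ELEM_WIRE_SIZE) {
        return -1;
    }

    uint32_t *elems = NULL;
    if (count > 0) {
        elems = malloc(count * sizeof *elems);
        if (elems == NULL) {
            return -1;
        }
        for (size_t i = 0; i < count; i++) {
            elems[i] = read_u32_le(buf + HEADER_SIZE + i * ELEM_WIRE_SIZE);
        }
    }

    out->meaningless_field = buf[0];
    out->array_size = count;
    out->array = elems;
    return 0;
}
```

Given the 15 bytes shown in the layout, this yields meaningless_field = 1, array_size = 3 and the elements 1, 2, 3. A buffer truncated to 14 bytes is rejected before any allocation.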