Implementing a mixin for Flatbuffers #79

Open
timrulebosch opened this issue May 12, 2022 · 4 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@timrulebosch

timrulebosch commented May 12, 2022

Is your feature request related to a problem? Please describe.
I'm interested in supporting Flatbuffers via a Mixin. I already have a DataClass-based encoder/decoder that uses getattr(...) to call the generated Flatbuffer code and loads modules with importlib.import_module(). However, it could be much faster if the encoding/decoding code were generated once per schema.

Describe the solution you'd like
So far I can see how to implement the various serialization hooks (pre/post), but what would be the best way to implement the field serialization?

Generally, the code for each hook needs to, based on the table/field name, load a module, call getattr() to find the right method, and then somehow emit the code in a form the code builder can use. Possibly via a default_encoder (Encoder)? Essentially, at some point I need the list of fields and a way to emit the function calls needed to encode/decode the data.

The pre/post hooks would take care of the "framing" of the Flatbuffer table (e.g. calling Start() and End(), as well as creating a buffer at some point).
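The decoding half of "the list of fields plus the function calls" can be sketched with plain closures: look up one generated accessor per dataclass field once, then decode with a simple loop. This is a minimal sketch, not mashumaro's API. `make_decoder`, `camel` and `FakeWeaponReader` are hypothetical names, the `snake_case -> CamelCase` accessor naming is an assumption about flatc's Python output, and the fake reader stands in for the generated table class so the example runs without flatbuffers installed.

```python
import dataclasses
import typing
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def camel(name: str) -> str:
    # Assumed flatc naming convention: "max_hp" -> "MaxHp".
    return "".join(part.capitalize() for part in name.split("_"))


def make_decoder(cls: Type[T], reader_cls: type) -> Callable[[Any], T]:
    """Resolve one accessor per dataclass field once; decoding is then a plain loop."""
    hints = typing.get_type_hints(cls)
    plan: List[Tuple[str, Callable[[Any], Any], bool]] = []
    for f in dataclasses.fields(cls):
        if not f.init or f.name.startswith("_"):
            continue  # skip bookkeeping fields such as _fbs_table
        accessor = getattr(reader_cls, camel(f.name))  # looked up once, not per call
        plan.append((f.name, accessor, hints[f.name] in (str, Optional[str])))

    def decode(table: Any) -> T:
        kwargs: Dict[str, Any] = {}
        for name, accessor, is_str in plan:
            value = accessor(table)
            # The flatbuffers Python runtime returns string fields as bytes.
            if is_str and value is not None:
                value = value.decode("utf-8")
            kwargs[name] = value
        return cls(**kwargs)

    return decode


class FakeWeaponReader:
    """Hypothetical stand-in for the reader class flatc generates for table Weapon."""

    def __init__(self, name: Optional[bytes], damage: int) -> None:
        self._name = name
        self._damage = damage

    def Name(self) -> Optional[bytes]:
        return self._name

    def Damage(self) -> int:
        return self._damage


@dataclass
class Weapon:
    name: Optional[str] = None
    damage: Optional[int] = None


decode_weapon = make_decoder(Weapon, FakeWeaponReader)  # built once per class
print(decode_weapon(FakeWeaponReader(b"Sword", 3)))  # Weapon(name='Sword', damage=3)
```

With the real generated code, the reader object for a buffer would be passed to `decode_weapon` in place of the fake one.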

Describe alternatives you've considered
Currently I call getattr() each time a DataClass is serialized. Instead, I would like to generate the code only once, based on the DataClass, and hopefully get a significant performance boost.

Additional context
If it's feasible, I don't mind doing the implementation of the Mixin.

@BrutalSimplicity

BrutalSimplicity commented May 30, 2022

I'm not the author, so I won't speak to what's possible, but from reviewing the code for the json, msgpack, and yaml serializers, it appears that all of the code building happens upon conversion to a dictionary. No code building is applied to serialize/deserialize objects for the supported formats.

I do think this could be done without a code-building strategy, by keeping a cache on the mixin you create that maps each field to a method call. From there you could handle both serialization and deserialization by executing those mappings against the flatbuffer field -> method lookup table.

Not all that familiar with flatbuffers, but maybe something like...

[Edit]: Simplified to its essence.

```python
from typing import Any, Mapping, Optional, Type, TypeVar

from mashumaro.mixins.dict import DataClassDictMixin
from mashumaro.serializer.json import DEFAULT_DICT_PARAMS
from typing_extensions import ClassVar, Protocol

T = TypeVar("T", bound="DataClassFlatBufferMixin")

def get_encoder(cls: Type[T]):
    # use the class (and params) to look up the module and its methods, once per class
    field_encoders = {
        'field_name': lambda buffer, value: buffer  # method call here
    }
    def encoder(buffer: bytearray, obj: Mapping[str, Any]):
        # pass each field's value (not just the buffer) to its encoder
        for key, value in obj.items():
            field_encoders[key](buffer, value)
        return buffer

    return encoder

def get_decoder(cls: Type[T]):
    # use the class (and params) to look up the module and its methods, once per class
    field_decoders = {
        'field_name': lambda buffer, **kwargs: 0  # method call here
    }

    def decoder(buffer: bytearray):
        return {key: field_decoder(buffer) for key, field_decoder in field_decoders.items()}

    return decoder

class Decoder(Protocol):
    def __call__(self, buffer: bytearray) -> Mapping[str, Any]: ...

class Encoder(Protocol):
    def __call__(self, buffer: bytearray, obj: Mapping[str, Any]) -> bytearray: ...

class DataClassFlatBufferMixin(DataClassDictMixin):
    __slots__ = ()
    # double-underscore names are mangled to _DataClassFlatBufferMixin__...; that is
    # fine here because they are only ever accessed from inside this class
    __flatbuffer_encoder: ClassVar[Optional[Encoder]]
    __flatbuffer_decoder: ClassVar[Optional[Decoder]]

    # similar to a metaclass (but simpler):
    # gives every subclass its own, initially empty, encoder/decoder cache
    def __init_subclass__(cls: Type[T], **kwargs):
        super().__init_subclass__(**kwargs)
        cls.__flatbuffer_encoder = None
        cls.__flatbuffer_decoder = None

    def to_flatbuffer(self: T, buffer: bytearray) -> bytearray:
        clazz = type(self)
        if not clazz.__flatbuffer_encoder:
            clazz.__flatbuffer_encoder = get_encoder(clazz)
        return clazz.__flatbuffer_encoder(
            buffer,
            self.to_dict(**dict(DEFAULT_DICT_PARAMS)),
        )

    @classmethod
    def from_flatbuffer(
        cls: Type[T],
        data: bytearray,
    ) -> T:
        if not cls.__flatbuffer_decoder:
            cls.__flatbuffer_decoder = get_decoder(cls)
        return cls.from_dict(
            cls.__flatbuffer_decoder(data),
            **dict(DEFAULT_DICT_PARAMS),
        )
```

@timrulebosch
Author

For reference, what I currently do is something like this:

Flatbuffer Schema:

```
namespace MyGame.Sample;

table Weapon {
  name:string;
  damage:short;
}
```

Using the generated code (API generated by the Flatbuffers compiler, flatc):

```python
import flatbuffers
import MyGame.Sample.Weapon

builder = flatbuffers.Builder(1024)

weapon = builder.CreateString('Sword')
MyGame.Sample.Weapon.Start(builder)
MyGame.Sample.Weapon.AddName(builder, weapon)
MyGame.Sample.Weapon.AddDamage(builder, 3)
sword = MyGame.Sample.Weapon.End(builder)

builder.Finish(sword)
buf = builder.Output()  # Of type `bytearray`.
```

And then I have a dataclass defined like this:

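```python
from dataclasses import dataclass, field
from types import ModuleType
from typing import Optional
import MyGame.Sample.Weapon  # module generated by flatc

@dataclass
class Weapon(FlatbufferTable):  # FlatbufferTable: base class from my encoder library (not shown)
    name: Optional[str] = None
    damage: Optional[int] = None
    _fbs_table: ModuleType = field(default=MyGame.Sample.Weapon, init=False, repr=False, compare=False)
```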

which my encoder library uses: based on the dataclass definition, it effectively "generates" and runs code like this:

```python
# strings must be created before Start(); the builder rejects them while a table is open
object_map['name'] = builder.CreateString('Sword')
getattr(self._fbs_table, 'Start')(builder)
getattr(self._fbs_table, 'AddName')(builder, object_map['name'])
getattr(self._fbs_table, 'AddDamage')(builder, 3)
getattr(self._fbs_table, 'End')(builder)
```
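Instead of calling getattr() on every serialization, the same call sequence could be generated as Python source once per dataclass and compiled with exec(), roughly the way mashumaro builds its own to_dict/from_dict. This is a sketch under stated assumptions: `compile_encoder`, `FakeBuilder` and `FakeWeapon` are hypothetical names, flatc's `Add<FieldName>` naming is assumed, and the fakes only record calls so the example runs without flatbuffers. Strings are created before `Start()`, because the FlatBuffers builder does not allow creating a string while a table is being built.

```python
import dataclasses
import typing
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Optional


def camel(name: str) -> str:
    # Assumed flatc naming convention: "max_hp" -> "MaxHp".
    return "".join(part.capitalize() for part in name.split("_"))


def compile_encoder(cls: type, table: Any) -> Callable[[Any, Any], Any]:
    """Generate source for `cls` once and exec() it into a plain function."""
    hints = typing.get_type_hints(cls)
    # Generated code refers to these names directly: no getattr() at encode time.
    namespace: Dict[str, Any] = {"Start": table.Start, "End": table.End}
    prepare: List[str] = []  # strings must be created before Start()
    adds: List[str] = []
    for f in dataclasses.fields(cls):
        if not f.init or f.name.startswith("_"):
            continue  # skip bookkeeping fields such as _fbs_table
        add = "Add" + camel(f.name)
        namespace[add] = getattr(table, add)
        if hints[f.name] in (str, Optional[str]):
            prepare.append(f"    _{f.name} = None if obj.{f.name} is None"
                           f" else builder.CreateString(obj.{f.name})")
            adds.append(f"    if _{f.name} is not None: {add}(builder, _{f.name})")
        else:
            adds.append(f"    if obj.{f.name} is not None: {add}(builder, obj.{f.name})")
    lines = ["def encode(builder, obj):", *prepare, "    Start(builder)", *adds,
             "    return End(builder)"]
    exec("\n".join(lines), namespace)
    return namespace["encode"]


class FakeBuilder:
    """Hypothetical stand-in for flatbuffers.Builder; it only records calls."""

    def __init__(self) -> None:
        self.nested = False
        self.log: List[Any] = []

    def CreateString(self, s: str) -> int:
        # Mirrors the real rule: no string creation while a table is open.
        assert not self.nested, "CreateString() called inside a table"
        self.log.append(("CreateString", s))
        return 7  # fake offset


def _start(b: FakeBuilder) -> None:
    b.nested = True


def _end(b: FakeBuilder) -> int:
    b.nested = False
    return 99  # fake table offset


# Hypothetical stand-in for the flatc-generated MyGame.Sample.Weapon module.
FakeWeapon = SimpleNamespace(
    Start=_start,
    End=_end,
    AddName=lambda b, off: b.log.append(("AddName", off)),
    AddDamage=lambda b, v: b.log.append(("AddDamage", v)),
)


@dataclass
class Weapon:
    name: Optional[str] = None
    damage: Optional[int] = None


encode_weapon = compile_encoder(Weapon, FakeWeapon)  # generated once per class
builder = FakeBuilder()
print(encode_weapon(builder, Weapon(name="Sword", damage=3)))  # 99
print(builder.log)  # [('CreateString', 'Sword'), ('AddName', 7), ('AddDamage', 3)]
```

Passing the resolved functions in through the exec() namespace keeps the generated body down to direct calls; the `if ... is not None` guards are just one possible way to handle unset optional fields.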

@timrulebosch
Author

@BrutalSimplicity thanks for that suggestion. Do you think your idea would work with the "string" case above? For that I would need to call a few functions:

Note that each string in this code is normally generated from the dataclass fields; it's hardcoded here for brevity, so the actual code would have a few extra calls.

```python
object_map['name'] = builder.CreateString("Sword")
getattr(self._fbs_table, 'AddName')(builder, object_map['name'])
```

I think those getattr calls are going to be expensive(?); however, perhaps it's possible to emit them as a code object and use it in place of the lambda, as you suggested. It seems like it would work.
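The field -> method mapping from the earlier suggestion could handle strings if each entry is split into two phases: a "prepare" step that runs before Start() (creating the string offset) and an "add" step that runs inside the table. A minimal sketch, assuming a hypothetical `make_encoder` helper, a hand-written `string_fields` map, and fake builder/table objects so it runs without flatbuffers; it uses closures created once per class rather than emitted code objects, which should avoid the repeated getattr() lookups without needing compile().

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Mapping, Tuple


def camel(name: str) -> str:
    # Assumed flatc naming convention: "max_hp" -> "MaxHp".
    return "".join(part.capitalize() for part in name.split("_"))


def make_encoder(table: Any, string_fields: Mapping[str, bool]) -> Callable[[Any, Mapping[str, Any]], Any]:
    """string_fields: field name -> True if it is a FlatBuffers string (hypothetical input)."""
    # getattr() runs here, once per class, not once per encode call.
    entries: List[Tuple[str, bool, Callable[..., Any]]] = [
        (name, is_str, getattr(table, "Add" + camel(name)))
        for name, is_str in string_fields.items()
    ]
    start, end = table.Start, table.End

    def encoder(builder: Any, obj: Mapping[str, Any]) -> Any:
        # Phase 1: create strings while no table is under construction.
        values: Dict[str, Any] = {}
        for name, is_str, _ in entries:
            value = obj.get(name)
            values[name] = builder.CreateString(value) if is_str and value is not None else value
        # Phase 2: open the table, add every non-None field, close it.
        start(builder)
        for name, _, add in entries:
            if values[name] is not None:
                add(builder, values[name])
        return end(builder)

    return encoder


class FakeBuilder:
    """Hypothetical stand-in for flatbuffers.Builder; it only records calls."""

    def __init__(self) -> None:
        self.nested = False
        self.log: List[Any] = []

    def CreateString(self, s: str) -> int:
        # Mirrors the real rule: no string creation while a table is open.
        assert not self.nested, "CreateString() called inside a table"
        self.log.append(("CreateString", s))
        return 7  # fake offset


def _start(b: FakeBuilder) -> None:
    b.nested = True


def _end(b: FakeBuilder) -> int:
    b.nested = False
    return 99  # fake table offset


# Hypothetical stand-in for the flatc-generated MyGame.Sample.Weapon module.
FakeWeapon = SimpleNamespace(
    Start=_start,
    End=_end,
    AddName=lambda b, off: b.log.append(("AddName", off)),
    AddDamage=lambda b, v: b.log.append(("AddDamage", v)),
)

encode_weapon = make_encoder(FakeWeapon, {"name": True, "damage": False})
builder = FakeBuilder()
offset = encode_weapon(builder, {"name": "Sword", "damage": 3})  # e.g. output of to_dict()
print(builder.log)  # [('CreateString', 'Sword'), ('AddName', 7), ('AddDamage', 3)]
```

Because the encoder takes a plain mapping, it fits the `Encoder` shape used with `to_dict()` output in the mixin suggestion; the Start()/End() framing could equally live in the pre/post hooks as long as the string phase runs before Start().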

@Fatal1ty
Owner

Fatal1ty commented Jun 7, 2022

Hi, guys!

I don't have experience with FlatBuffers, so it'll take me time to dive into this. But we can create a subpackage, mashumaro.mixins.third_party, with the idea that anyone could create a mixin and put it there even if the code quality raised concerns. If someone responsible wants to create mashumaro.mixins.third_party.flatbuffers, I would accept such a pull request without a second thought :)

@Fatal1ty Fatal1ty added enhancement New feature or request good first issue Good for newcomers and removed feature labels Mar 21, 2023

3 participants