Add message streaming support #518

JoshuaLeivers · 2023-08-16T12:00:37Z

Adds support for serializing/deserializing messages and their components to/from streams. Where possible, existing methods now use this functionality internally, minimising code size.

This is useful as a standalone feature, and is also a step towards the functionality requested in #482.

As part of this, the following functions have been created/changed:

- `betterproto.dump_varint` - this function encodes a value as a varint and dumps it into the provided stream. It is mostly the same as the existing `betterproto.encode_varint` was, but based around streams and with some additional error checking.
- `betterproto.encode_varint` - this existing function has the same effects, but now uses `betterproto.dump_varint` internally, keeping code size and complexity down.
- `betterproto.size_varint` - this function calculates the size of the varint for a given value without actually serializing it. This may be useful to some, and similar functions exist in the official C++ and other implementations. It is also used internally by other new functionality to reduce memory and time usage compared to simply running `len(...)` after serializing a varint.
- `betterproto._len_preprocessed_single` - calculates the size of the value that would be returned by `_preprocess_single` without fully serializing it. Used internally by new functionality to reduce memory and time usage over simply serializing and then checking the size of it.
- `betterproto._len_single` - similar to above, but for `_serialize_single`.
- `betterproto.load_varint` - loads a varint from a stream and decodes its value. Mostly the same as `decode_varint` already was, but based around streams.
- `betterproto.decode_varint` - existing function, has the same functionality as it did previously, but now uses `load_varint` internally to keep code size and complexity down.
- `betterproto.load_fields` - does the same as `parse_fields`, but loads the fields from the provided stream, rather than a `bytes` object. Used internally by `Message.load`.
- `betterproto.Message.dump` - does the same as `Message.__bytes__` already did, but dumps the results to a stream rather than a `bytes` object.
- `betterproto.Message.__bytes__` - does as it already did, but now uses `Message.dump` internally to reduce code size and complexity.
- `betterproto.Message.__len__` - returns the size of the encoded message - i.e. does the same as `len(bytes(message))` without fully serializing the message, reducing time and memory usage.
- `betterproto.Message.load` - loads and parses a binary encoded message from a stream. Similar to `Message.parse`, but retrieving the data from a stream rather than a `bytes` object.
- `betterproto.Message.parse` - does as it already did, but now uses `Message.load` internally to reduce code size and complexity.
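For readers unfamiliar with stream-based varint handling, here is a minimal standalone sketch of the idea behind the three varint helpers. It is not the code from this PR: the `*_sketch` names are hypothetical, the 64-bit range check and the choice of `EOFError` for a cut-off varint are assumptions of the sketch, and it uses only the standard library.

```python
import io
from typing import BinaryIO

_MAX_UINT64 = (1 << 64) - 1


def _to_unsigned(value: int) -> int:
    """Map a negative int to its 64-bit two's complement (sketch assumption)."""
    if value < 0:
        value += 1 << 64
    if not 0 <= value <= _MAX_UINT64:
        raise ValueError("value does not fit in 64 bits")
    return value


def dump_varint_sketch(value: int, stream: BinaryIO) -> None:
    """Write `value` to `stream` as a base-128 varint, low 7 bits first."""
    value = _to_unsigned(value)
    while value > 0x7F:
        # Emit the low 7 bits with the continuation bit (0x80) set.
        stream.write(bytes([(value & 0x7F) | 0x80]))
        value >>= 7
    # Final byte has the continuation bit clear.
    stream.write(bytes([value]))


def size_varint_sketch(value: int) -> int:
    """Return the encoded length of `value` without serializing it."""
    value = _to_unsigned(value)
    # Each output byte carries 7 payload bits; zero still needs one byte.
    return max(1, (value.bit_length() + 6) // 7)


def load_varint_sketch(stream: BinaryIO) -> int:
    """Read one varint from `stream` and return its unsigned value."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            # The stream ended before a byte without the continuation bit.
            raise EOFError("stream ended in the middle of a varint")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
        if shift >= 70:
            raise ValueError("varint is longer than 10 bytes")


buffer = io.BytesIO()
dump_varint_sketch(300, buffer)
buffer.seek(0)
decoded = load_varint_sketch(buffer)
```

The size helper works from the bit length alone, which is why computing a length this way avoids allocating the encoded bytes just to call `len(...)` on them.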

Also adds return type hint.

src/betterproto/__init__.py

This should improve performance while not significantly impacting readability.

Co-authored-by: James Hilton-Balfe <gobot1234yt@gmail.com>

src/betterproto/__init__.py

The change from using Generator to using Iterator for return type hints was in error due to a misreading of the Python docs.

Co-authored-by: James Hilton-Balfe <gobot1234yt@gmail.com>

`Message.parse(...)` will now accept a `_typeshed.ReadableBuffer` rather than only a `bytes` object.
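As a rough illustration of why a bytes-based entry point can accept any readable buffer once it delegates to a stream-based one: `io.BytesIO` takes any bytes-like object, so the wrapper does not need to care which one it was given. The names `BufferLike`, `read_all` and `parse_from_buffer` below are hypothetical stand-ins, not betterproto functions.

```python
import io
from typing import BinaryIO, Union

# Runtime stand-in for _typeshed.ReadableBuffer, which only exists for type checkers.
BufferLike = Union[bytes, bytearray, memoryview]


def read_all(stream: BinaryIO) -> bytes:
    """Hypothetical stream-based reader; a real one would parse fields."""
    return stream.read()


def parse_from_buffer(data: BufferLike) -> bytes:
    """Wrap the buffer in a stream and hand it to the stream-based reader."""
    return read_all(io.BytesIO(data))


payload = b"\x08\x96\x01"
results = [
    parse_from_buffer(payload),
    parse_from_buffer(bytearray(payload)),
    parse_from_buffer(memoryview(payload)),
]
```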

Gobot1234 · 2023-08-22T13:56:41Z

Thanks for all the work on this

JoshuaLeivers · 2023-08-29T13:24:26Z

Hi, was there anything left over on this PR for me to do? Just wanted to check up on how close it is to being merged, or if there's anything preventing it 🙂

JoshuaLeivers added 4 commits August 16, 2023 12:56

Replace repeated len calls with variable

361993c

Add test for cut-off varint handling

8be37d2

Add docstring to Message.dump

9c6daba

Also adds return type hint.

Gobot1234 reviewed Aug 18, 2023
