Add big int format to msgpack.pack #7811

Draft
wants to merge 2 commits into
base: main

Neradoc

@Neradoc Neradoc commented Mar 29, 2023

This adds big int support to msgpack.pack, which can now encode the full 64-bit range (int64 and uint64).
With this, what we can encode and decode with msgpack should be on par with CPython.

Of note:

  • Positive ints are always encoded as unsigned.
  • 32-bit ints that don't fit in a small int are encoded as 64-bit.

Differentiating big ints that could fit in 32 bits would require a bunch more code, and I'm not sure how I would do that, so it's an encoding-size optimization for another day.
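For reference, the size-optimal choice of integer format from the MessagePack spec can be sketched in Python. This is not the C code from this PR (which, as noted above, sends every big int to a 64-bit format); `pack_int_minimal` is a hypothetical name used only for illustration, and it relies on nothing but the standard `struct` module.

```python
import struct


def pack_int_minimal(value):
    """Return the shortest MessagePack encoding of an int (illustrative sketch)."""
    if value >= 0:
        # Non-negative values use the unsigned family.
        if value <= 0x7F:
            return struct.pack(">B", value)  # positive fixint
        if value <= 0xFF:
            return struct.pack(">BB", 0xCC, value)  # uint 8
        if value <= 0xFFFF:
            return struct.pack(">BH", 0xCD, value)  # uint 16
        if value <= 0xFFFFFFFF:
            return struct.pack(">BI", 0xCE, value)  # uint 32
        if value <= 0xFFFFFFFFFFFFFFFF:
            return struct.pack(">BQ", 0xCF, value)  # uint 64
    else:
        # Negative values use the signed family.
        if value >= -32:
            return struct.pack(">b", value)  # negative fixint
        if value >= -0x80:
            return struct.pack(">Bb", 0xD0, value)  # int 8
        if value >= -0x8000:
            return struct.pack(">Bh", 0xD1, value)  # int 16
        if value >= -0x80000000:
            return struct.pack(">Bi", 0xD2, value)  # int 32
        if value >= -0x8000000000000000:
            return struct.pack(">Bq", 0xD3, value)  # int 64
    # Anything outside [-2**63, 2**64 - 1] has no MessagePack int format.
    raise OverflowError("integer does not fit in 64 bits")
```

With this sketch, `pack_int_minimal(0x80000091)` yields the 5-byte `ce 80 00 00 91` form that the test below expects, where the PR currently produces the 9-byte `cf` form. Both decode to the same value.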

Fixes #6851
Fixes #7388

Below is some test code that encodes each value and decodes it back to compare the result, catches errors, and also compares the output to the expected encoding from CPython (it doesn't always match, but that's not super important IMO as long as it decodes the same).
Note that OverflowErrors are marked as ✅ when expected.

import msgpack
from io import BytesIO
import traceback
import binascii

all_tests = []

import time
time.sleep(4)

def encode(data, val, should_error=False):
    if isinstance(data, str):
        data = binascii.unhexlify(data)
    try:
        out = None
        back = None
        printval = repr(val)
        if isinstance(val, int):
            printval = f"{val:X}h"
        elif isinstance(val, float):
            printval = f"{val:f}"

        buffer = BytesIO()
        msgpack.pack(val, buffer)
        out = buffer.getvalue()

        printout = binascii.hexlify(out).decode()
        printdata = binascii.hexlify(data).decode()

        if out == data:
            icon = "✅"
            print(f"{icon}   {printval}")
        else:
            try:
                buffer.seek(0)
                back = msgpack.unpack(buffer)
            except Exception:
                pass
            if back == val:
                icon = "✅"
                print(f"{icon}   {printval} 🎭 #{printout}")
            else:
                icon = "🚫"
                print(f"{icon}   #{printout} =?= #{printdata}")
    except NotImplementedError as ex:
        # traceback.print_exception(ex)
        print(f"⚠️    Not Implemented {printval}")
    except ValueError as ex:
        # traceback.print_exception(ex)
        print(f"⚠️    No Packer {printval}")
    except OverflowError as ex:
        if should_error:
            icon = "✅"
        else:
            icon = "⚠️"
        print(f"{icon}   Overflow {printval}")
    all_tests.append((val, out))

# int
encode(b'\x05', 5)
encode(b'\xcc\xFF', 0xFF)
encode(b'\xd0\x91', -0x6F)
encode(b'\xd1\xF0\x91', -0xF6F)
encode(b'\xce\x3f\xff\xff\xff', 0x3fffffff)
encode(b'\xd2\xd0\x00\x00\x91', -0x2FFFFF6F) # below 32 bits
# int64
encode(b'\xcf\x11"3DUfw\x88', 0x1122334455667788)
encode(b'\xd1\xff\x00', -256)
# uint
encode(b'\xcc\x91', 0x91)
encode(b'\xcd\xF0\x91', 0xF091)
encode(b'\xce\x3f\xff\xff\xff', 0x3fffffff)
encode(b'\xce\x80\x00\x00\x91', 0x80000091) # doesn't fit in small int
# uint64
encode(b'\xce\x11\x22\x33\x44', 0x11223344)
encode(b'\xcf\xff\xff\xff\xff\xff\xff\xff\xff', 0xffffffffffffffff)
encode(b'\xd3\x80\x00\x00\x00\x00\x00\x00\x00', -0x8000000000000000)
encode(b"", 0x10000000000000000, should_error=True)
encode(b"", -0x8000000000000001, should_error=True)
# float
encode('ca400921fb', 2.1426990032196045)
# double
encode('cbc09208762b6ae7d5', -1154.1154)

# in arrays
encode(
    b'\x94\xd1\xF0\x91'
    b'\xcf\xff\xff\xff\xff\xff\xff\xff\xff'
    b'\xcb\xc0\x92\x08v+j\xe7\xd5\x02',
    [-3951, 0xffffffffffffffff, -1154.1154, 2]
)

# Use this to print a test for CPython decoding.
# Floats won't have the same precision, so you can't rely on the == test.
PRINT_DESKTOP_TEST = False
if PRINT_DESKTOP_TEST:
    print(f"""
all_tests = {repr(all_tests)}
for val,out in all_tests:
    if out:
        decoded = msgpack.unpackb(out)
        print(f"{{val == decoded:<2}}", val)
        print("  ", decoded)
""")
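To illustrate the note about float precision, here is a minimal sketch that can be run on desktop Python. It assumes the board emits the single-precision `ca` format for floats, as the `ca` prefix in the board output below suggests; `roundtrip_float32` is a hypothetical helper, not part of msgpack.

```python
import math
import struct


def roundtrip_float32(value):
    """Encode as MessagePack float 32 (0xca + big-endian single) and decode back."""
    packed = struct.pack(">Bf", 0xCA, value)
    (decoded,) = struct.unpack(">f", packed[1:])
    return packed, decoded


original = -1154.1154
packed, decoded = roundtrip_float32(original)

# Strict equality fails because the double was rounded to single precision,
# so a tolerance-based comparison is the more useful check.
strictly_equal = decoded == original
close_enough = math.isclose(decoded, original, rel_tol=1e-6)
print(len(packed), strictly_equal, close_enough)
```

A desktop-side check of the generated test would therefore compare floats with something like `math.isclose` instead of `==`.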

Before:

code.py output:
✅   5h
✅   FFh 🎭 #d100ff
✅   -6Fh
✅   -F6Fh
✅   3FFFFFFFh 🎭 #d23fffffff
✅   -2FFFFF6Fh
⚠️    No Packer 1122334455667788h
✅   -100h
✅   91h 🎭 #d10091
✅   F091h 🎭 #d20000f091
✅   3FFFFFFFh 🎭 #d23fffffff
⚠️    No Packer 80000091h
✅   11223344h 🎭 #d211223344
⚠️    No Packer FFFFFFFFFFFFFFFFh
⚠️    No Packer -8000000000000000h
⚠️    No Packer 10000000000000000h
⚠️    No Packer -8000000000000001h
✅   2.142700 🎭 #ca40092200
✅   -1154.115234 🎭 #cac49043b0
⚠️    No Packer [-3951, 18446744073709551615, -1154.12, 2]

After:

code.py output:
✅   5h
✅   FFh
✅   -6Fh
✅   -F6Fh
✅   3FFFFFFFh
✅   -2FFFFF6Fh
✅   1122334455667788h
✅   -100h
✅   91h
✅   F091h
✅   3FFFFFFFh
✅   80000091h 🎭 #cf0000000080000091
✅   11223344h
✅   FFFFFFFFFFFFFFFFh
✅   -8000000000000000h
✅   Overflow 10000000000000000h
✅   Overflow -8000000000000001h
✅   2.142700 🎭 #ca40092200
✅   -1154.115234 🎭 #cac49043b0
✅   [-3951, 18446744073709551615, -1154.12, 2] 🎭 #94d1f091cfffffffffffffffffcac49043b002

Python 3:

❯ python /Volumes/PROMICRO/code.py 
✅   5h
✅   FFh
✅   -6Fh
✅   -F6Fh
✅   3FFFFFFFh
✅   -2FFFFF6Fh
✅   1122334455667788h
✅   -100h
✅   91h
✅   F091h
✅   3FFFFFFFh
✅   80000091h
✅   11223344h
✅   FFFFFFFFFFFFFFFFh
✅   -8000000000000000h
✅   Overflow 10000000000000000h
✅   Overflow -8000000000000001h
✅   2.142699 🎭 #cb4001243f60000000
✅   -1154.115400
✅   [-3951, 18446744073709551615, -1154.1154, 2]

} else if ((int16_t)x == x) {
write1(s, 0xd1);
write2(s, x);
STATIC void pack_int(msgpack_stream_t *s, mp_obj_t obj, bool _signed) {

For clarity I would split this into 2 functions for the two cases:

STATIC void pack_small_int(msgpack_stream_t *s, mp_int_t value)
STATIC void pack_int(msgpack_stream_t *s, mp_obj_t obj)

@@ -195,19 +196,34 @@ STATIC mp_map_elem_t *dict_iter_next(mp_obj_dict_t *dict, size_t *cur) {
return NULL;
}

STATIC void pack_int(msgpack_stream_t *s, int32_t x) {

I think this function works as is for pack_small_int. I wouldn't bother trying to optimize for 1 byte of packing for some int values.

write1(s, 0xd1);
write2(s, x);
STATIC void pack_int(msgpack_stream_t *s, mp_obj_t obj, bool _signed) {
byte buffer[9];

This module is already set up with write functions for its output. Adding another layer of indirection (a buffer) to write the output seems unnecessary.

@dhalbert
Collaborator

@Neradoc This is still in progress, is that right?

@Neradoc
Author

Neradoc commented Apr 11, 2023

Yes I need to get back to it.

@jepler jepler marked this pull request as draft August 3, 2023 19:40

Successfully merging this pull request may close these issues.

msgpack doesn't support full range of int32
msgpack not choosing type correctly
3 participants