The checksum supplied in the serialized Presto Slice returned by binaryData (introduced in #20932) does not match the checksum I calculate from the data. According to the documentation (https://prestodb.io/docs/current/develop/serialized-page.html), bytes 13..21 (exclusive end index) should be a little-endian int64 holding the checksum, calculated over the bytes after the header (i.e., from byte offset 21) to the end.
The declared checksum (1893016217) does not match the calculated checksum, or any variant I can find. I've checked the checksums returned by this library with an online tool (https://crccalc.com/). The library is also widely used and thus unlikely to have a significant bug. I've checked with the responses to several different queries.
There's an immediate ambiguity: the stated size (after the header) of the slice (46) does not always equal the actual size of the slice; in particular, small slices seem to be padded to 1000 bytes with 0s. I am assuming the checksum is applied only to the size bytes after the header, where size is the payload size from bytes 9..13 (little-endian int32). However, the declared checksum does not match the CRC32 of that range, nor the CRC32 of bytes 21..1000, 0..1000, or 0..(21+46).
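The checks described above can be sketched roughly as follows. This is only an illustration of the byte ranges tried, not the code from the gist; the function names and dictionary keys are made up for this example, and plain CRC32 (via Python's `zlib.crc32`) is assumed.

```python
import struct
import zlib

HEADER_SIZE = 21  # per the issue, the payload starts at byte offset 21


def declared_size_and_checksum(buf: bytes) -> tuple[int, int]:
    """Read the payload size (bytes 9..13) and checksum (bytes 13..21)."""
    (size,) = struct.unpack_from("<i", buf, 9)       # little-endian int32
    (checksum,) = struct.unpack_from("<q", buf, 13)  # little-endian int64
    return size, checksum


def candidate_checksums(buf: bytes) -> dict[str, int]:
    """CRC32 over each byte range tried in the report (hypothetical names)."""
    size, _ = declared_size_and_checksum(buf)
    return {
        "payload_only": zlib.crc32(buf[HEADER_SIZE:HEADER_SIZE + size]),
        "after_header_with_padding": zlib.crc32(buf[HEADER_SIZE:]),
        "whole_buffer": zlib.crc32(buf),
        "header_plus_payload": zlib.crc32(buf[:HEADER_SIZE + size]),
    }
```

Comparing the declared checksum against each value in the returned dictionary reproduces the mismatch described here when none of them is equal to it.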
Note that all the examples I've come across also parse fine (up to the issue in #22601 ), so I do not think the data is corrupted.
This issue means I cannot verify that the supplied binary data is correct, which could lead to panics in production code and/or silent data corruption.
This bug report is incorrect. The documentation states that the checksum is not just for the bytes after the header, but also has contributions from the codec, the number of rows, and the uncompressed size. Using these (and truncating the bytes after the header by the size header field) gives me the correct checksum.
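A minimal sketch of that calculation, under stated assumptions: the header is laid out as rows (int32), codec (1 byte), uncompressed size (int32), size (int32), checksum (int64), all little-endian, and the CRC32 is fed the truncated payload, then the codec byte, then the row count and the uncompressed size as 4-byte little-endian values. The order and encoding of those extra fields are assumptions here and should be checked against the linked documentation; the function names are hypothetical.

```python
import struct
import zlib

HEADER_FORMAT = "<iBiiq"  # rows, codec, uncompressed size, size, checksum
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 21 bytes, no padding with "<"


def compute_checksum(page: bytes) -> int:
    """CRC32 over the payload plus selected header fields (assumed order)."""
    rows, codec, uncompressed_size, size, _ = struct.unpack_from(HEADER_FORMAT, page, 0)
    # Truncate to the declared size so trailing zero padding is ignored.
    payload = page[HEADER_SIZE:HEADER_SIZE + size]
    crc = zlib.crc32(payload)
    crc = zlib.crc32(bytes([codec]), crc)                        # codec marker byte
    crc = zlib.crc32(struct.pack("<i", rows), crc)               # number of rows
    crc = zlib.crc32(struct.pack("<i", uncompressed_size), crc)  # uncompressed size
    return crc


def checksum_matches(page: bytes) -> bool:
    """Compare the declared checksum (bytes 13..21) with the computed one."""
    declared = struct.unpack_from(HEADER_FORMAT, page, 0)[4]
    return declared == compute_checksum(page)
```

Because the payload is sliced by the size field, a slice padded to 1000 bytes and the same slice without padding give the same result.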
cc @arhimondr @mbasmanova
Your Environment
Expected Behavior
The provided checksum (int64 little-endian in bytes 13..21) should match the crc32 checksum of the size bytes after the header, 21..(21 + size).
Current Behavior
The provided checksum does not match any checksum calculation I can find.
Steps to Reproduce
Gist https://gist.github.com/jagill/666925f2833723ca567e1aa487500872 shows how to reproduce it, plus all the various checksums calculated above.
Context
This issue means I cannot verify that the supplied binary data is correct, which could lead to panics in production code and/or silent data corruption.