Compression does not respect LZ4 official End of block conditions #12

Open
rlespinet opened this issue Feb 16, 2023 · 0 comments · May be fixed by #13
@rlespinet

I noticed that the library does not respect the end-of-block conditions specified in the official LZ4 repository. More specifically:

End of block conditions

  1. The last match must start at least 12 bytes before the end of block. The last match is part of the penultimate sequence. It is followed by the last sequence, which contains only literals.
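In terms of positions in the decompressed data (notation introduced here for illustration, not taken from the spec): let $n$ be the decompressed size of the block and $p_{\text{last}}$ the 0-based position where the last match starts. The rule then reads as below, and the two inputs discussed in this issue fall on either side of the limit.

```latex
p_{\text{last}} \le n - 12
\qquad
\begin{aligned}
&\text{43-byte input (lz4js output):} && p_{\text{last}} = 32 > 43 - 12 = 31 && \text{(illegal)} \\
&\text{44-byte input:}                && p_{\text{last}} = 32 \le 44 - 12 = 32 && \text{(legal)}
\end{aligned}
```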

For example, the following ASCII string

Abcdefghijklmnop0000000000000000Abcdefghijk

is encoded by lz4js as

04 22 4d 18 40 70 df 1e 00 00 00 fb 02 41 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 30 01 00 02 20 00 50 67 68 69 6a 6b 00 00 00 00
<━ ━ ━  FRAME ━ ━ ━> <━ BLOCK ━> <━ ━ ━ ━ ━ ━ ━ ━ ━ ━ ━ ━ SEQUENCE 0 ━ ━ ━ ━ ━ ━ ━ ━ ━ ━ ━ ━ ━> <SEQ  1> <━  SEQUENCE 2 ━> <━ FRAME ━>
                                        A  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  0    |         |     g  h  i  j  k
                                                                                        ▲    |         |
                                        ▲              ▲                                ┕━━━━┙         |
                                        ┕━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙

Here the last match (sequence 1) starts at position 32 of the 43-byte decompressed block, only 11 bytes before its end. This violates the rule above, so the block is not guaranteed to be decoded correctly.
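The position arithmetic can be double-checked by decoding the block payload of the lz4js output (the 30 bytes following the block size `1e 00 00 00`). The sketch below is a minimal raw LZ4 block decoder written only for this check: the function names are hypothetical, and it does no bounds checking on malformed input. It records where each match starts in the decompressed data.

```python
def _read_length(block: bytes, i: int, nibble: int) -> tuple[int, int]:
    """Extend a 4-bit length field: a nibble of 15 is followed by bytes added until one is < 255."""
    length = nibble
    if nibble == 15:
        while True:
            b = block[i]
            i += 1
            length += b
            if b != 255:
                break
    return length, i


def decode_block(block: bytes) -> tuple[bytes, list[int]]:
    """Decode a raw LZ4 block (no frame header); also return where each match starts in the output."""
    out = bytearray()
    match_starts = []
    i = 0
    while True:
        token = block[i]
        i += 1
        lit_len, i = _read_length(block, i, token >> 4)
        out += block[i:i + lit_len]
        i += lit_len
        if i == len(block):  # the last sequence holds only literals
            break
        offset = int.from_bytes(block[i:i + 2], "little")
        i += 2
        match_len, i = _read_length(block, i, token & 0x0F)
        match_len += 4  # MINMATCH: stored match lengths are offset by 4
        match_starts.append(len(out))
        src = len(out) - offset
        for k in range(match_len):  # byte by byte, so overlapping copies (offset < length) work
            out.append(out[src + k])
    return bytes(out), match_starts


def last_match_gap(block: bytes):
    """Distance from the start of the last match to the end of the decompressed block."""
    data, starts = decode_block(block)
    return len(data) - starts[-1] if starts else None


# Block payload of the lz4js output for the 43-byte input
lz4js_block = bytes.fromhex(
    "fb 02 41 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 30 01 00 "
    "02 20 00 50 67 68 69 6a 6b"
)
data, starts = decode_block(lz4js_block)
print(len(data), starts, last_match_gap(lz4js_block))  # 43 [17, 32] 11
```

The decoder reports matches starting at positions 17 and 32 of 43 output bytes, so the last match begins 11 bytes before the end of the block, one byte short of the required 12.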
In contrast, the official LZ4 encoder correctly avoids emitting this match. Here is what it generates for the same input:

04 22 4D 18 60 40 82 21 00 00 00 FB 02 41 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70 30 01 00 B0 41 62 63 64 65 66 67 68 69 6A 6B 00 00 00 00
<━ ━ ━  FRAME ━ ━ ━> <━ BLOCK ━> <━ ━ ━ ━ ━ ━ ━ ━ ━ ━ ━ ━ SEQUENCE 0 ━ ━ ━ ━ ━ ━ ━ ━ ━ ━ ━ ━ ━> <━ ━ ━ ━ ━ ━ SEQUENCE 1  ━ ━ ━ ━ ━> <━ FRAME ━>
                                        A  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  0    |       A  b  c  d  e  f  g  h  i  j  k
                                                                                        ▲    |
                                                                                        ┕━━━━┙

This was obtained with the following command:

$ echo -ne "Abcdefghijklmnop0000000000000000Abcdefghijk" | lz4 -c -12 --no-frame-crc | od -t x1 -A n
 04 22 4d 18 60 40 82 21 00 00 00 fb 02 41 62 63
 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 30 01 00
 b0 41 62 63 64 65 66 67 68 69 6a 6b 00 00 00 00

using this version of the LZ4 CLI:

$ lz4 --version
*** LZ4 command line interface 64-bits v1.9.2, by Yann Collet ***

Note that after adding one extra character (going from Abcdefghijklmnop0000000000000000Abcdefghijk to Abcdefghijklmnop0000000000000000Abcdefghijkl), the match starts exactly 12 bytes before the end of the block, so emitting it is now legal, and the official LZ4 encoder produces the same output as lz4js:

$ echo -ne "Abcdefghijklmnop0000000000000000Abcdefghijkl" | lz4 -c -12 --no-frame-crc | od -t x1 -A n
 04 22 4d 18 60 40 82 1e 00 00 00 fb 02 41 62 63
 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 30 01 00
 03 20 00 50 68 69 6a 6b 6c 00 00 00 00
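For comparison, the end-of-block limits can be enforced on the encoder side by never starting a match after position `n - 12` and never extending one into the last 5 bytes. The sketch below is a hypothetical brute-force greedy encoder (quadratic, for illustration only). It is neither lz4js code nor the reference implementation; only the constant names `MINMATCH`, `LASTLITERALS` and `MFLIMIT` are borrowed from the reference `lz4.c`.

```python
MINMATCH = 4        # shortest encodable match
LASTLITERALS = 5    # the last 5 bytes of a block are always literals
MFLIMIT = 12        # the last match must start at least 12 bytes before the block end
MAX_OFFSET = 65535  # offsets are stored on 2 bytes


def _write_length(out: bytearray, n: int) -> None:
    """Write the extension bytes of a length whose 4-bit nibble was saturated at 15."""
    while n >= 255:
        out.append(255)
        n -= 255
    out.append(n)


def _emit_sequence(out: bytearray, literals: bytes, match_len: int = 0, offset: int = 0) -> None:
    """Append one sequence; match_len == 0 means a literals-only (last) sequence."""
    lit_len = len(literals)
    ml_code = match_len - MINMATCH if match_len else 0
    out.append((min(lit_len, 15) << 4) | min(ml_code, 15))
    if lit_len >= 15:
        _write_length(out, lit_len - 15)
    out += literals
    if match_len:
        out += offset.to_bytes(2, "little")
        if ml_code >= 15:
            _write_length(out, ml_code - 15)


def compress_block(data: bytes) -> bytes:
    """Greedy raw LZ4 block compressor that honours the end-of-block conditions."""
    n = len(data)
    out = bytearray()
    anchor = 0                      # start of literals not yet emitted
    last_start = n - MFLIMIT        # no match may start after this position
    match_limit = n - LASTLITERALS  # no match may extend past this position
    i = 0
    while i <= last_start:
        best_len, best_off = 0, 0
        for j in range(max(0, i - MAX_OFFSET), i):  # brute-force search of the window
            length = 0
            while i + length < match_limit and data[j + length] == data[i + length]:
                length += 1
            if length >= best_len:  # on ties, prefer the closer candidate
                best_len, best_off = length, i - j
        if best_len >= MINMATCH:
            _emit_sequence(out, data[anchor:i], best_len, best_off)
            i += best_len
            anchor = i
        else:
            i += 1
    _emit_sequence(out, data[anchor:])  # last sequence: literals only
    return bytes(out)


print(compress_block(b"Abcdefghijklmnop0000000000000000Abcdefghijk").hex(" "))
print(compress_block(b"Abcdefghijklmnop0000000000000000Abcdefghijkl").hex(" "))
```

For these two inputs the sketch reproduces the block payloads of the `lz4 -12` outputs shown in this issue. With 43 bytes, position 32 is past `43 - 12 = 31`, so the tail `Abcdefghijk` is emitted as a literals-only `b0` sequence. With 44 bytes, a 7-byte match at position 32 is allowed, followed by `50 68 69 6a 6b 6c`.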
@rlespinet linked a pull request on Feb 16, 2023 that will close this issue