Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dealing with illegal symbols in literal/length and distance huffman encodings #16

Open
pfalcon opened this issue Jul 9, 2018 · 3 comments
Labels

Comments

@pfalcon
Copy link
Owner

pfalcon commented Jul 9, 2018

It turns out that to simplify/make possible dealing with Huffman trees, zlib and uzlib have 288 total Huffman codes for literal/length, and 32 for distance. In both cases, last 2 codes are "illegal" and should not appear in the valid DEFLATE stream.

This ticket is open to consider the best way to deal with them in uzlib, where best way is defined as "requiring the least code overhead, while still returning an error for stream containing, and then the earlier the error reported th better".

@pfalcon
Copy link
Owner Author

pfalcon commented Jul 9, 2018

There're basically 2 strategies:

  1. Make sure these codes aren't present in the Huffman tree, and thus decoding them leads to an error. There's a problem then: a) ensure that Huffman decoding ends at all (with an error) when decoding such codes; b) bother to check result of Huffman decoding.

  2. Don't specially handle these codes in the Huffman tree, let them be decoded as normal, and then do bound checking on decoded symbol values. This assumes that otherwise Huffman decoding can't fail, and we can decode any code to symbol, so can trade check of decoding status for bounds check on decoded symbol. But it's unclear if this assumption is correct.

@pfalcon
Copy link
Owner Author

pfalcon commented Jul 10, 2018

286 vs 288, 30 vs 32 dichotomy is also mentioned in my old StackOverflow question, https://stackoverflow.com/questions/25431160/what-is-the-maximum-size-of-encoded-dynamic-huffman-tree-as-used-by-deflate-zli/25462620, which has useful further references on the matter.

@pfalcon
Copy link
Owner Author

pfalcon commented Sep 10, 2018

(This comment was originally posted on the wrong ticket, on 2018-07-09.)

In #10 (comment), I decide to go for p.2 for now, simply because it's more obvious how to patch in boundary check, and that satisfies AFL crash corpus - than to figure out how to patch Huffman tree correctly (well, to prove that intuitive patchings are formally correct). But again, it's unclear if that's good/enough.

@pfalcon pfalcon added the triaged label May 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant