Custom dictionaries below level 5 #1148

Open
rachel-bousfield opened this issue Mar 26, 2024 · 3 comments

Comments

@rachel-bousfield

Custom dictionaries can be attached during compression and decompression using C APIs like BrotliEncoderAttachPreparedDictionary. However, it appears they aren't used below brotli level 5. This causes a kind of silent failure: the dictionary is ignored, and nothing tells the user why compression didn't improve.

There are a few possible solutions to this issue:

  1. The attach and/or prepare methods could document the behavior.
  2. The attach methods and BrotliEncoderPrepareDictionary could fail when an incompatible level is applied.
  3. The dictionary format could include the minimum brotli level, for compatibility and API-simplification reasons.
  4. Decide this is a bug and implement the feature for lower levels (not backwards compatible).
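
As a rough illustration of option 2, a caller-side guard might look like the sketch below. The threshold of 5 is taken from the behavior described in this issue, not from brotli's documentation. `check_dictionary_quality`, `DictAttachStatus`, and `MIN_DICTIONARY_QUALITY` are hypothetical names and are not part of the brotli API. The caller passes in the quality it intends to set via BROTLI_PARAM_QUALITY.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption from this issue report: dictionaries appear to be ignored
   below quality 5. This is not a documented brotli constant. */
#define MIN_DICTIONARY_QUALITY 5

/* Hypothetical status type; not part of the brotli API. */
typedef enum {
  DICT_ATTACH_OK = 0,
  DICT_ATTACH_QUALITY_TOO_LOW = 1
} DictAttachStatus;

/* Hypothetical pre-check to run before calling
   BrotliEncoderAttachPreparedDictionary. `quality` is the value the
   caller plans to pass for BROTLI_PARAM_QUALITY. */
DictAttachStatus check_dictionary_quality(int quality) {
  if (quality < MIN_DICTIONARY_QUALITY) {
    return DICT_ATTACH_QUALITY_TOO_LOW;
  }
  return DICT_ATTACH_OK;
}

/* Returns a human-readable message for a failed check, or NULL when the
   dictionary is expected to be used. */
const char* dictionary_quality_warning(int quality) {
  if (check_dictionary_quality(quality) == DICT_ATTACH_QUALITY_TOO_LOW) {
    return "dictionary would be ignored at this quality level";
  }
  return NULL;
}
```

A caller could run `check_dictionary_quality` right after choosing the quality and refuse to attach the dictionary (or print the warning) instead of silently compressing without it.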

Here's an example in case the issue isn't clear:

brotli -0 -D dictionary.lz -o dict
brotli -0 no-dict
diff dict no-dict    # would use <() but this doesn't work either 
@eustas
Collaborator

eustas commented Mar 26, 2024

Thanks for reporting. I'll check and take action when I get spare cycles.
Agreed that, at the very least, the CLI should let users know when the dictionary is ignored.

@rachel-bousfield
Author

I'd also look into how the C API could be improved. That's actually how I first discovered this issue. I was rather confused about why my dictionary wasn't working :)

@pmeenan

pmeenan commented Apr 9, 2024

FWIW, Zstandard allows dictionary use down to level 0 (though in practice you don't see a big benefit until levels 2-3).

I don't know enough about brotli's actual encoding to know if it makes sense, but it could be useful to let dictionaries work at the lower levels as well, if that gives lower CPU use while still delivering decent savings.
