Enable HTTP compression #520

Open
JustAnotherArchivist opened this issue Sep 28, 2021 · 0 comments


AB currently doesn't use wpull's --http-compression option, so it doesn't send an Accept-Encoding header. Occasionally, there are websites that handle this badly. For example, https://www.cresta-awards.com/ sends an empty response body when compression isn't enabled, and https://www-ssrl.slac.stanford.edu/~swebb/ simply kills the connection.

Since browsers seem to send Accept-Encoding: gzip, deflate (or possibly brotli too these days) on all requests, it should probably be safe to enable this globally. It might cause a very small increase in WARC size, because web servers are unlikely to always compress data at the highest compression level (as wpull does when writing WARCs), and working with compressed data inside compressed WARCs is slightly annoying, but those are minor downsides.
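To check how a given site reacts to the header, something like the following could be used. This is only a rough sketch with Python's standard library, not how wpull or AB does it: the helper names (`accept_encoding_headers`, `decode_body`, `probe`) are made up for illustration, and it only handles gzip and deflate. It uses `http.client` directly with `skip_accept_encoding=True`, because otherwise Python adds `Accept-Encoding: identity` on its own, which is not the same as sending no header at all.

```python
import gzip
import http.client
import zlib
from urllib.parse import urlsplit


def accept_encoding_headers(compress: bool) -> dict:
    """Headers to add: the browser-like value when compressing, nothing otherwise."""
    return {"Accept-Encoding": "gzip, deflate"} if compress else {}


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo gzip/deflate Content-Encoding; return other bodies unchanged."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # Per the HTTP spec, "deflate" is zlib-wrapped data.
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send raw deflate without the zlib wrapper.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    return body


def probe(url: str, compress: bool, timeout: float = 30.0):
    """Fetch url once; return (status, bytes on the wire, bytes after decoding)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        # skip_accept_encoding=True stops http.client from adding
        # "Accept-Encoding: identity" automatically.
        conn.putrequest("GET", path, skip_accept_encoding=True)
        for name, value in accept_encoding_headers(compress).items():
            conn.putheader(name, value)
        conn.endheaders()
        response = conn.getresponse()
        raw = response.read()
        encoding = response.getheader("Content-Encoding", "")
        return response.status, len(raw), len(decode_body(raw, encoding))
    finally:
        conn.close()
```

Calling `probe(url, compress=False)` and `probe(url, compress=True)` on the same URL and comparing the results should show the difference. A server that kills the connection would surface as an exception from `getresponse()` rather than as a return value, and an empty body would show up as a zero length.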
