Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for wrap-decompression middleware #683

Open
KingMob opened this issue Jun 18, 2023 · 9 comments
Open

Add support for wrap-decompression middleware #683

KingMob opened this issue Jun 18, 2023 · 9 comments
Labels
clj-http parity easy Difficulty in resolving the problem good first issue A good issue for newcomers

Comments

@KingMob
Copy link
Collaborator

KingMob commented Jun 18, 2023

clj-http has middleware support for automatically decoding compressed body data via wrap-decompression. As aleph's HTTP client has the explicit goal of clj-http compatibility, we should add this too.

Right now, if you send the header "accept-encoding: gzip, deflate" and get a compressed body back, you have to manually decompress it yourself.

gzip and deflate are the minimum necessary; brotli support should also be added, if available.

@KingMob KingMob added clj-http parity good first issue A good issue for newcomers easy Difficulty in resolving the problem labels Jun 18, 2023
@KingMob
Copy link
Collaborator Author

KingMob commented Aug 31, 2023

HTTP2 may complicate this, as it appears the primary way to decompress is to wrap the codec in the Netty pipeline with a decompression version. Nah, that's solved.

@rajcspsg
Copy link

rajcspsg commented Nov 5, 2023

I'm interested in working on this ticket.

@KingMob
Copy link
Collaborator Author

KingMob commented Nov 6, 2023 via email

@KingMob
Copy link
Collaborator Author

KingMob commented Nov 6, 2023

Preparation

First, I would do your work against the feature/http2-server branch. Once #687 merges, you might have to rearrange some code, and there's a new compression ns in that PR that you might need to use or add to.

HTTP/1

For the H1 client code, what we'll probably want to do is, look for the wrap-decompression symbol in the middleware list, and if present, add an instance of the HttpContentDecompressor handler after the HttpClientCodec in setup-http1-pipeline.

HTTP/2

For the H2 client, I believe AlephHttp2FrameCodecBuilder will already decorate the decompression codec properly. You might not have to do anything extra as long as :compression? is true.

Though on a side note, we probably shouldn't add h2-compression-handler to client pipelines, just servers.

Zstd issues

One catch is, Netty doesn't support Zstd decoding yet. We have several options:

  1. Don't support Zstd at all. It's not too popular yet, and most browsers don't support it (https://caniuse.com/zstd). Should probably parse and strip zstd from the accept-encoding header.
  2. Wait for Add zstd decoder netty/netty#13531 to add it. The PR has been in-progress since August, and it's unclear when it will land.
  3. Update the byte-transforms library to support Zstd and Brotli, and use that to transform the InputStream in Aleph code before handing off the body InputStream to users. This will require extra work for raw mode, since we'd then have to convert back to ByteBufs.
  4. Add Apache commons-codec support. I think this is what clj-http uses. Will require another dep, and probably hsa the same issue as (3) with raw mode.

Might be simplest to do (1), and add full support when (2) is ready.

@rajcspsg
Copy link

@KingMob Thanks for your suggestions.
Now #687 is merged, I looked into the code and couldn't find wrap-decompression middleware. Am I missing anything here?

Also, I didn't get this what do you mean regarding the note -

we probably shouldn't add h2-compression-handler to client pipelines, just servers.

@KingMob
Copy link
Collaborator Author

KingMob commented Dec 22, 2023

@rajcspsg wrap-decompression doesn't exist yet. It's in clj-http, and needs to be added to Aleph. Or to be more explicit, support for it needs to be added, but not all of it will be proper middleware.

Part of this will have to be done with Netty. Compression in Netty isn't done with a pipeline handler, but by injecting codec classes into the Http2FrameCodec handler. So you'll need to look at AlephHttp2FrameCodecBuilder. Some of the decompression code you need is already in place, but I haven't looked at which pieces are missing for H2, if any.

If you call (aleph.http.AlephHttp2FrameCodecBuilder/setCompression builder true some-options-here) it will add the necessary DelegatingDecompressorFrameListener which should already look for the "content-encoding" header and decompress correctly.

So for H2, I think this PR will need to

  1. Look for the wrap-decompression symbol in the user's client middleware list
  2. Call .setCompression(true, ...) on AlephHttp2FrameCodecBuilder
  3. Add a real middleware fn that mimics the update-in of clj-http's wrap-decompression to update the "accept-encoding" header's list of supported codecs.
  4. Potentially remove the "content-encoding" header after decompression, if it's not already done by Netty. (This is to discourage anyone downstream from trying to decompress a second time.) Or better still, remove it and add an "original-content-encoding" header for debugging purposes.

Unlike clj-http, we probably want to add Brotli to the default list. clj-http defaults to adding "gzip, deflate" to whatever's already in the header (if anything), which is not quite correct, but good enough 99.99% of the time. We should leave out zstd and snappy from the defaults, I think. Zstd is promising but still only has partial support. Snappy is niche, and not supported by browsers.

we probably shouldn't add h2-compression-handler to client pipelines, just servers.

This is a little out-of-date. Right now, we don't add it to clients, but the docs and key names used aren't very clear that it's server-only.

The h2-compression-handler is about parsing the client's "accept-encoding" header to determine what encoding to return. In theory, this could be symmetrical, and clients could encode their bodies, but for pragmatic and security reasons, client request body encoding isn't a thing, and we should change the code to make it clearer that it's server-only, and prevent misuse. (E.g., rename the keys, update AlephHttp2FrameCodecBuilder to compress or decompress, but not both simultaneously, etc.) You don't have to do that in this PR, though I think maybe I should update the key names before we do a full release of 0.7.0.

@KingMob
Copy link
Collaborator Author

KingMob commented Dec 22, 2023

@rajcspsg
Copy link

BTW, did you vote on https://clojurians.slack.com/archives/C0G922PCH/p1702723929287379 ?

yes I did :)

@KingMob
Copy link
Collaborator Author

KingMob commented Dec 27, 2023

Hmm, I don't see your name on the vote emojis...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clj-http parity easy Difficulty in resolving the problem good first issue A good issue for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants