Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict use of Content-Encoding for upload append #2674

Open
Acconut opened this issue Nov 7, 2023 · 4 comments
Open

Restrict use of Content-Encoding for upload append #2674

Acconut opened this issue Nov 7, 2023 · 4 comments

Comments

@Acconut
Copy link
Member

Acconut commented Nov 7, 2023

The initial POST request to create the upload resource might contain Content-Encoding, e.g. Content-Encoding: gzip. If the upload gets interrupted, it might be resumed using a HEAD and PATCH request. This subsequent PATCH request should then not contain a conflicting Content-Encoding header field, because the server would not be able to know what Content-Encoding has to be used once the upload is complete.

Maybe we should disallow including Content-Encoding at all when appending to an upload. Because the upload might have been interrupted at an arbitrary position, neither the partial content received before the interruption nor the partial content transferred after the interruption are guaranteed to be valid gzip data, for example.

A similar argument could also be made to restrict Content-Language in these PATCH requests.

@LPardue
Copy link
Contributor

LPardue commented Nov 7, 2023

I don't follow.

The content-encoding is tied to the representation. So an upload using gzip that gets interrupted halfway through, should be able to be patched with the last half of the gzip representation.

Maybe the concern is

  1. Uploading with one encoding and resuming with another
  2. A server converting the encoding "on the fly" as it receives bytes I.e. gunzipping and writing bytes to storage
  3. That the resumed bytes are encoded differently I.e. gzip only half the last bytes in a way that is not compatible and can't be cat'tted to the first half

?

Each of those seems like things we should accomodate and explain but not prohibit. HTTP representations are our friend here.

@Acconut
Copy link
Member Author

Acconut commented Nov 9, 2023

Maybe the concern is

1. Uploading with one encoding and resuming with another
  1. That the resumed bytes are encoded differently I.e. gzip only half the last bytes in a way that is not compatible and can't be cat'tted to the first half

Yes, those two are the concerns behind this issue. I am not sure if there are use cases where the client would like to change the encoding when resuming the upload.

So an upload using gzip that gets interrupted halfway through, should be able to be patched with the last half of the gzip representation.

Correct, that behavior should be the norm. Resuming without gzip (or resuming with another compression) would be problematic because the parts cannot be concatenated properly.

I hope that helps to clear up this topic. I will dig more into the HTTP representations to see how we can accommodate this better in the draft.

@MikeBishop
Copy link
Contributor

Even though real-world servers do perform compression on-the-fly, as a matter of HTTP, Content-Encoding is a property of the representation. If something is uploaded with Content-Encoding: gzip, that indicates the entire body, once collected, is a valid gzipped resource.

If a client wanted to gzip the bytes being appended using compression local only to that message, that would be a Transfer-Encoding. However, T-Es have been removed from HTTP/2 and later, so they're not likely to be useful here.

More generally, I think what we want to say is that:

  • The initial request to the main resource contains fields which describe the entire resource which the client proposes to upload (length, content-encoding, media type, etc.)
  • Subsequent requests, which are made to the upload resource, contains fields which describe the chunk -- whatever media type we decide in Media type for Upload Appending Procedure #2610, Content-Encoding: identity, etc. -- while noting that this chunk will be a fragment of the resource described in the original request
  • When the upload is complete, the server takes the completed upload body and interprets it as if it had been the payload of the initial request, including all the fields the client originally included. This implies the server will need to preserve these, not just the upload itself.

@Acconut
Copy link
Member Author

Acconut commented Nov 23, 2023

More generally, I think what we want to say is that:

I agree with all of these points, thanks for writing this up. I will try to open a PR, so this is clarified in the PR.

Subsequent requests, which are made to the upload resource, contains fields which describe the chunk [...] Content-Encoding: identity

If an upload resource is created with Content-Encoding: gzip, for example, I am wondering if the subsequent PATCH requests should also include Content-Encoding: gzip to signal that they modify the resource with the same representation as the initial request.

Did you have another intention behind using Content-Encoding: identity? RFC 9110 does not seem to advise its use (https://httpwg.org/specs/rfc9110.html#field.content-encoding):

Note that the coding named "identity" is reserved for its special role in Accept-Encoding and thus SHOULD NOT be included [in Content-Encoding].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants