Restrict use of Content-Encoding for upload append #2674

Acconut · 2023-11-07T12:46:01Z

The initial POST request to create the upload resource might contain Content-Encoding, e.g. Content-Encoding: gzip. If the upload gets interrupted, it might be resumed using a HEAD and PATCH request. This subsequent PATCH request should then not contain a conflicting Content-Encoding header field, because the server would not be able to know what Content-Encoding has to be used once the upload is complete.

Maybe we should disallow including Content-Encoding at all when appending to an upload. Because the upload might have been interrupted at an arbitrary position, neither the partial content received before the interruption nor the partial content transferred after the interruption are guaranteed to be valid gzip data, for example.

A similar argument could also be made to restrict Content-Language in these PATCH requests.

The text was updated successfully, but these errors were encountered:

LPardue · 2023-11-07T13:38:40Z

I don't follow.

The content-encoding is tied to the representation. So an upload using gzip that gets interrupted halfway through, should be able to be patched with the last half of the gzip representation.

Maybe the concern is

Uploading with one encoding and resuming with another
A server converting the encoding "on the fly" as it receives bytes I.e. gunzipping and writing bytes to storage
That the resumed bytes are encoded differently I.e. gzip only half the last bytes in a way that is not compatible and can't be cat'tted to the first half

?

Each of those seems like things we should accomodate and explain but not prohibit. HTTP representations are our friend here.

Acconut · 2023-11-09T10:26:08Z

Maybe the concern is
1. Uploading with one encoding and resuming with another
That the resumed bytes are encoded differently I.e. gzip only half the last bytes in a way that is not compatible and can't be cat'tted to the first half

Yes, those two are the concerns behind this issue. I am not sure if there are use cases where the client would like to change the encoding when resuming the upload.

So an upload using gzip that gets interrupted halfway through, should be able to be patched with the last half of the gzip representation.

Correct, that behavior should be the norm. Resuming without gzip (or resuming with another compression) would be problematic because the parts cannot be concatenated properly.

I hope that helps to clear up this topic. I will dig more into the HTTP representations to see how we can accommodate this better in the draft.

MikeBishop · 2023-11-13T15:51:31Z

Even though real-world servers do perform compression on-the-fly, as a matter of HTTP, Content-Encoding is a property of the representation. If something is uploaded with Content-Encoding: gzip, that indicates the entire body, once collected, is a valid gzipped resource.

If a client wanted to gzip the bytes being appended using compression local only to that message, that would be a Transfer-Encoding. However, T-Es have been removed from HTTP/2 and later, so they're not likely to be useful here.

More generally, I think what we want to say is that:

The initial request to the main resource contains fields which describe the entire resource which the client proposes to upload (length, content-encoding, media type, etc.)
Subsequent requests, which are made to the upload resource, contains fields which describe the chunk -- whatever media type we decide in Media type for Upload Appending Procedure #2610, Content-Encoding: identity, etc. -- while noting that this chunk will be a fragment of the resource described in the original request
When the upload is complete, the server takes the completed upload body and interprets it as if it had been the payload of the initial request, including all the fields the client originally included. This implies the server will need to preserve these, not just the upload itself.

Acconut · 2023-11-23T16:37:42Z

More generally, I think what we want to say is that:

I agree with all of these points, thanks for writing this up. I will try to open a PR, so this is clarified in the PR.

Subsequent requests, which are made to the upload resource, contains fields which describe the chunk [...] Content-Encoding: identity

If an upload resource is created with Content-Encoding: gzip, for example, I am wondering if the subsequent PATCH requests should also include Content-Encoding: gzip to signal that they modify the resource with the same representation as the initial request.

Did you have another intention behind using Content-Encoding: identity? RFC 9110 does not seem to advise its use (https://httpwg.org/specs/rfc9110.html#field.content-encoding):

Note that the coding named "identity" is reserved for its special role in Accept-Encoding and thus SHOULD NOT be included [in Content-Encoding].

Acconut added the resumable-upload label Nov 7, 2023

Acconut mentioned this issue Mar 4, 2024

Resumable Upload: Add details about Transfer-Encoding and Content-Encoding #2746

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restrict use of Content-Encoding for upload append #2674

Restrict use of Content-Encoding for upload append #2674

Acconut commented Nov 7, 2023

LPardue commented Nov 7, 2023

Acconut commented Nov 9, 2023 •

edited

MikeBishop commented Nov 13, 2023

Acconut commented Nov 23, 2023

Restrict use of Content-Encoding for upload append #2674

Restrict use of Content-Encoding for upload append #2674

Comments

Acconut commented Nov 7, 2023

LPardue commented Nov 7, 2023

Acconut commented Nov 9, 2023 • edited

MikeBishop commented Nov 13, 2023

Acconut commented Nov 23, 2023

Acconut commented Nov 9, 2023 •

edited