
Saving body of an interrupted PATCH request #89

Open
janko opened this issue Aug 19, 2016 · 6 comments


janko commented Aug 19, 2016

First of all, thank you for an awesome specification, I really enjoyed implementing it!

I wanted to propose extending/altering the specification concerning the following part:

The Server SHOULD always attempt to store as much of the received data as possible.

In Ruby, most web frameworks don't support this. There is one web framework, Goliath, which is based on EventMachine, a Ruby alternative to Node's event-driven reactor pattern. The currently registered Ruby tus server, Rubytus, uses Goliath. However, I find some other web frameworks much easier to use than Goliath, with a much clearer flow for handling requests. I would also prefer a Ruby tus implementation to use a web framework which isn't bound to EventMachine, but can run on whatever web server the main application uses.

I don't know if web frameworks in other languages have already solved this, but this capability definitely affects how the client should send the request. If the server supports it, the client can send just one PATCH request with the whole file. Otherwise the client should send multiple PATCH requests, which can be retried in case of failure. Therefore I think the specification should be updated in one of the following ways:

  1. Have the server signal whether it is capable of storing the body of an interrupted PATCH request (via an extension)
  2. Change the sentence to "The Server MUST always attempt to store as much of the received data as possible."

I obviously prefer option 1, because it allows more flexibility. The downside of this option is that synchronously sending multiple smaller PATCH requests is probably slower than sending one big one. So if the client were updated to check this extension, any existing server which supports this behaviour but doesn't advertise it via Tus-Extension would start receiving multiple PATCH requests.

However, a server not supporting saving of interrupted PATCH requests doesn't necessarily mean worse performance. The client can use the Concatenation extension to upload multiple chunks in parallel, which as I understand it can be faster than a single PATCH request. Therefore I think this server requirement should definitely be optional, because IMO the client doesn't need it to achieve maximum upload performance.
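The client-side decision described above can be sketched as follows. Note that "partial-save" is a hypothetical extension name standing in for option 1; the spec does not define such an extension:

```python
def plan_patch_requests(file_size, extensions, chunk_size=5 * 1024 * 1024):
    """Return the byte ranges for the PATCH requests a client would issue.

    `extensions` is the parsed Tus-Extension set from an OPTIONS response.
    "partial-save" is a hypothetical extension name (not in the spec)
    signalling that the server persists the body of an interrupted PATCH.
    """
    if "partial-save" in extensions:
        # Server keeps whatever bytes arrive, so one big PATCH is safe:
        # a retry simply resumes from the offset reported by HEAD.
        return [(0, file_size)]
    # Otherwise split into chunks small enough that a failed request
    # only loses one chunk's worth of transfer.
    return [(start, min(start + chunk_size, file_size))
            for start in range(0, file_size, chunk_size)]
```

For example, a 12-byte upload with a 5-byte chunk size yields three requests when the server cannot save interrupted bodies, but a single request when it can.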

Thoughts?

@janko janko changed the title Saving body of an interrupted request Saving body of an interrupted PATCH request Aug 19, 2016

Acconut commented Aug 19, 2016

First of all, thank you for an awesome specification, I really enjoyed implementing it!

Thank you a lot for these kind words. It's people like you who make working on this project so interesting!

In Ruby most of the web frameworks don't support this.

I just want to clarify whether I understood your statement correctly: most web frameworks in Ruby only provide you with the entire request body, and if the server could not receive the entire payload, you cannot handle the partial data since it will be dropped. Is this what you meant?

However, I find some other web frameworks much easier to use than Goliath, with a much clearer line of handling the request. And I also prefer to have a Ruby tus implementation use a web framework which isn't bound to EventMachine, but can use any web server that the main application uses.

Personally, I have never worked with Ruby except for fixing some small bugs here and there, so while I believe I understand what you're talking about, I don't have the knowledge to comment on this professionally.

Have the server signal whether it is capable of storing an interrupted PATCH request (via an extension)

That would be a good solution to this specific problem, but I think we can generalize this. There are some cases in which the server wants to give the client a recommendation on how big a single PATCH request should be. To name one example, the AWS S3 backend for tusd only accepts chunks which are bigger than 5MB but not much more than that. This is not a weird limitation we came up with but one which is set by Amazon Web Services. Currently, there is no way to tell the client how it should partition its requests. I talked to @kvz briefly about introducing Tus-Min-Chunk-Size and Tus-Max-Chunk-Size headers, which indicate recommendations (not strict limits) for the client. These headers could be included in the OPTIONS response, where the Tus-Extension header also lives.

This feature could also be helpful for your issue where the server wants to tell the client that it should split the upload into multiple smaller requests.
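A minimal sketch of how a client might read these proposed headers from an OPTIONS response. The header names follow the proposal in this thread and are not part of the released spec; absent headers mean "no recommendation":

```python
def chunk_size_hints(headers):
    """Extract the proposed Tus-Min-Chunk-Size / Tus-Max-Chunk-Size
    recommendations from OPTIONS response headers (a plain dict here).
    These header names come from the proposal in this discussion and
    are not (yet) part of the tus specification. Returns (min, max),
    where None means the server made no recommendation.
    """
    def parse(name):
        value = headers.get(name)
        return int(value) if value is not None else None

    return parse("Tus-Min-Chunk-Size"), parse("Tus-Max-Chunk-Size")
```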

What do you think about this idea?


janko commented Aug 20, 2016

I just want to clarify whether I understood your statement correctly: most web frameworks in Ruby only provide you with the entire request body, and if the server could not receive the entire payload, you cannot handle the partial data since it will be dropped. Is this what you meant?

Yes, that's exactly what I meant.

There are some cases in which the server wants to give the client a recommendation on how big a single PATCH request should be.

Yes, this would definitely be a superset of what I was asking, so it would allow me to achieve what I was asking. However, I feel like the information about Tus-Min-Chunk-Size and Tus-Max-Chunk-Size, when we're talking about regular servers (not Amazon S3), is something that the client should determine based on client-side factors. For example, if the client determines that the internet connection is too slow and flaky, it could switch to smaller chunks. Also, depending on whether the client is uploading chunks in series or in parallel, certain chunk sizes might be more optimal (although I have no idea if that actually makes a difference).

What would be ideal is if the server could just hint to the client: "split large files into multiple PATCH requests". On one hand this can be considered just a limitation of that specific server implementation, and I could simply note it in the README. On the other hand, if the server doesn't support saving interrupted PATCH requests, it cannot achieve a resumable upload (because that one PATCH request will always be restarted from 0 if it fails), so I feel the client should be able to discover this automatically.

While this can definitely be achieved by the server sending Tus-Max-Chunk-Size, I don't know how server implementations should determine this number (in case they're using file storage which doesn't have limits).


Acconut commented Aug 21, 2016

However, I feel like the information about Tus-Min-Chunk-Size and Tus-Max-Chunk-Size, when we're talking about regular servers (not Amazon S3), is something that the client should determine based on client-side factors. For example, if the client determines that the internet connection is too slow and flaky, it could switch to smaller chunks.

I had a similar thought, and that's why I prefer them to be recommendations by the server:

[...] introducing Tus-Min-Chunk-Size and Tus-Max-Chunk-Size headers, which indicate recommendations (not strict limits) for the client.

The client should not blindly follow these numbers but instead use them as a baseline for finding the optimal chunk size, taking additional factors (e.g. bandwidth) into the calculation.
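This "recommendation as a baseline" behaviour can be sketched as a simple clamp. The function and its parameter names are illustrative, not from any tus client library:

```python
def choose_chunk_size(preferred, min_hint=None, max_hint=None):
    """Clamp a client-side preferred chunk size (e.g. derived from
    measured bandwidth) into the server's recommended range.

    `min_hint` / `max_hint` correspond to the proposed
    Tus-Min-Chunk-Size / Tus-Max-Chunk-Size headers; None means the
    server made no recommendation. Because the hints are recommendations
    rather than hard limits, a real client could deliberately step
    outside them -- this helper just shows the baseline behaviour.
    """
    size = preferred
    if min_hint is not None:
        size = max(size, min_hint)   # don't go below the recommended minimum
    if max_hint is not None:
        size = min(size, max_hint)   # don't exceed the recommended maximum
    return size
```

For example, a client preferring 1MB chunks on a slow connection would still be nudged up to a server-recommended 5MB minimum, while an aggressive 100MB preference would be capped at the recommended maximum.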

What would be ideal is if the server could just hint to the client: "split large files into multiple PATCH requests".

This does not sound like an ideal solution to me. What are "large files"? And how many are "multiple requests"? The answers to these questions differ depending on the environment in which the protocol is used.

On the other hand, if the server doesn't support saving interrupted PATCH requests, it cannot achieve a resumable upload (because that one PATCH request will always be restarted from 0 if it fails), so I feel the client should be able to discover this automatically.

Good thought, I absolutely agree, and using the new headers this may also be possible. Tus-Min-Chunk-Size may be defined as the lower limit of accepted chunks: if the client submits a chunk smaller than this size, it must not expect it to be accepted (though the server may still accept it, depending on its capabilities).

While this can definitely be achieved by the server sending Tus-Max-Chunk-Size, I don't know how server implementations should determine this number (in case they're using file storage which doesn't have limits).

Servers should not be forced to supply these numbers but may do so if it's useful in their situation, e.g. the AWS S3 backend. For simple file storage, these recommendations do not make much sense, and the server should signal this by simply not exposing the headers. This should be interpreted as the server not caring about how big a chunk is, leaving the decision entirely up to the client.


janko commented Aug 22, 2016

This does not sound like an ideal solution to me. What are "large files"? And how many are "multiple requests"? The answers to these questions differ depending on the environment in which the protocol is used.

I agree, but the question is which environment: server-side or client-side? What counts as a "large file" definitely has to be decided somewhere (otherwise the client wouldn't know whether to split the file), and with Tus-Max-Chunk-Size this would be decided on the server. It's just a question of whether the server should be the one to decide, but I'm actually OK with that.

The client should not blindly follow these numbers but instead use them as a baseline for finding the optimal chunk size, taking additional factors (e.g. bandwidth) into the calculation.

You're right: if the server sends Tus-Min-Chunk-Size and Tus-Max-Chunk-Size, the client will still be able to choose a chunk size within that range depending on factors like bandwidth.

Thank you for this discussion. I'm now completely on board that Tus-Min-Chunk-Size and Tus-Max-Chunk-Size will correctly handle the situation where the server doesn't have the ability to apply an interrupted PATCH request 👍

@Acconut Acconut added this to the 1.1 milestone Aug 30, 2016
Acconut added a commit that referenced this issue Aug 30, 2016
See #89 for the underlying discussion

mmarzec commented Feb 6, 2017

Hi,
I am currently implementing tus on Python's Twisted framework and I encountered exactly the same problem. Twisted only offers handling of full HTTP requests; if a request is interrupted, it is discarded. It would be possible to implement handling of partial requests, but that would require a deep understanding of the framework's internals and most likely changing the framework itself, which I would prefer to avoid.
I think it is the same with many more web frameworks, and some of the other Python implementations listed on the tus web page actually have this issue as well.
So to implement resumable uploads with this limitation, the data needs to be split into smaller chunks.

Also, if you use the checksum extension and an HTTP request gets interrupted, the data from that request will not match the checksum from the header. So the checksum extension actually makes interrupted requests unusable and creates the need to split data into reasonably sized chunks.
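To illustrate why this is: under the tus checksum extension, each PATCH request carries an Upload-Checksum header whose digest covers that request's entire body, so a truncated body can never verify. A minimal sketch of building that header value (the helper function name is ours, not from any tus library):

```python
import base64
import hashlib


def upload_checksum_header(chunk: bytes, algorithm: str = "sha1") -> str:
    """Build the Upload-Checksum header value for one PATCH request:
    the algorithm name followed by the Base64-encoded digest of that
    chunk's body, per the tus checksum extension. Because the digest
    covers the whole request body, an interrupted (truncated) body can
    never pass verification -- smaller chunks therefore limit how much
    transferred data an interruption invalidates.
    """
    digest = hashlib.new(algorithm, chunk).digest()
    return f"{algorithm} {base64.b64encode(digest).decode('ascii')}"
```

A client splitting an upload into chunks would compute one such header per PATCH request, so only the interrupted chunk fails its check and needs resending.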

I see that you already have a pending pull request to add min/max chunk sizes, so I am just giving you more reasons for the importance of these changes.


Acconut commented Feb 15, 2017

Thank you for your feedback, @mmarzec. I am currently a bit low on time, but this will be pushed forward once my schedule clears up a bit.

pauln added a commit to pauln/tus-js-client that referenced this issue Nov 19, 2017
Adds a note to the readme to mention that exceeding any server-imposed chunk size hard limits will result in chunked uploads failing.

This is to help anyone who doesn't realise that they're hitting such limits.  See the following for further information / discussion:
- tus/tus-resumable-upload-protocol#89
- tus/tus-resumable-upload-protocol#93
Acconut pushed a commit to tus/tus-js-client that referenced this issue Nov 20, 2017