
Discussion: Detection of parallel uploads - respond with "423 Locked" #115

Open
KieranP opened this issue Nov 23, 2016 · 12 comments
KieranP commented Nov 23, 2016

Saw this issue happen today:

  • User began uploading a large file, the url was stored in local storage
  • User began uploading the same file again (but to a different object)

Currently, one of them returns "409 Conflict" as the spec instructs and thus halts, while the other is able to continue uploading fine.

A 409 is good when the client does something it shouldn't and thus acts as a safeguard. But could we provide a better HTTP status code when the server detects that the file is already being uploaded? For example, the HTTP status code 423 is defined as "Locked", which could be more appropriate in this case.

This would allow us to catch "423" errors and handle them appropriately.

Thoughts?
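A minimal sketch of how a client could branch on the two statuses, assuming a server that adopts the proposed 423 behavior (the function and return values are illustrative, not part of any tus client):

```javascript
// Sketch: decide what to do with a failed upload request based on its
// HTTP status. Assumes the server returns 423 for "this upload is
// already in progress"; 409 keeps its spec-defined meaning.
function classifyUploadError(status) {
  switch (status) {
    case 409: // Upload-Offset mismatch: local state is stale, restart
      return "restart";
    case 423: // upload is locked by a parallel request: tell the user
      return "report-in-progress";
    default:  // network errors etc.: generic retry
      return "retry";
  }
}
```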

ronag (Contributor) commented Nov 23, 2016

User began uploading the same file again (but to a different object)

But isn't that a different upload? How is the server supposed to find out that it is the same file?

KieranP (Author) commented Nov 23, 2016

@ronag No, because the fingerprint matches, and thus it uses the same resumable URL that it locates in local storage. So it ends up uploading the same file to the same resumable URL, causing the 409 conflict.

When I said "to a different object", I meant object within our own platform. So the user tried uploading the same file at the same time but to two different places within the platform.

cjhenck commented Nov 23, 2016

I'm confused too. So you are uploading the same file from the same client to the same PATCH URL? How does the server know it is a conflict at all? Is the metadata different?

Acconut (Member) commented Nov 23, 2016

e.g. The http status code "423" is defined as Locked, which could be more appropriate in this case.

The tusd server implements locking, which will return a 423 if multiple clients try to access the same upload simultaneously. However, this is custom behavior and is not defined in the specification.

No, because the fingerprint matches, and thus is uses the same resumable url that it locates from the local storage. So it ends up uploading the same file to the same resumable url, causing the 409 conflict.

If you configure tus-js-client correctly using its retryDelays property, it will automatically catch those errors and start a new upload to a different upload URL.
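As an illustration of the retryDelays idea (the option name is tus-js-client's; the helper function and the concrete values here are just a sketch):

```javascript
// Sketch: build an exponential-backoff schedule suitable for
// tus-js-client's retryDelays option (milliseconds between retries).
function backoffDelays(attempts, baseMs) {
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}

// Would be passed to the client roughly as:
//   new tus.Upload(file, { endpoint, retryDelays: backoffDelays(5, 500) })
```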

I think the underlying question is what should actually happen in the scenario @KieranP described? What did you expect? Should the same file be uploaded to two different upload URLs? Should one wait until the other is finished since they assume they are uploading the same file?

KieranP (Author) commented Nov 24, 2016

@cjhenck Yes, same file, same client, same PATCH URL, because it pulls it from local storage.

The tusd server implements locking which will return a 423 if multiple client try to access the same upload simultaneously. However, this is custom behavior and is not defined in the specification.

@Acconut Cool to hear that tusd already does this. It would be good to get this into the spec as an extension, though, so that other servers implement this behaviour.

I think the underlying question is what should actually happen in the scenario @KieranP described? What did you expect? Should the same file be uploaded to two different upload URLs? Should one wait until the other is finished since they assume they are uploading the same file?

I'd much rather catch a 423 error and report to the user that the file is already being uploaded. It makes no sense to me to have an identical copy of a 30GB file uploading to the same server. But this could always be a client setting if the server supports it, something like preventDuplicate: true
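A rough sketch of what the hypothetical preventDuplicate guard mentioned above could look like on the client side, as a registry of in-flight fingerprints (all names here are invented for illustration):

```javascript
// Sketch: refuse to start a second upload for a file whose fingerprint
// is already in flight, so the user can be told "already uploading"
// instead of pushing an identical 30GB copy in parallel.
const inFlight = new Set();

function tryStartUpload(fingerprint) {
  if (inFlight.has(fingerprint)) {
    return false; // duplicate in progress: report to the user
  }
  inFlight.add(fingerprint);
  return true; // safe to start this upload
}

function finishUpload(fingerprint) {
  inFlight.delete(fingerprint); // call on success or failure
}
```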

cjhenck commented Nov 24, 2016

I think I was tired this week and failed to read the title of the issue; everything you've said makes complete sense. If a 423 is returned in the reference implementation, then I think it would be good to define that as a recommendation.

ronag (Contributor) commented Nov 26, 2016

The way we solve this issue is to actually allow parallel uploads under the assumption that both clients are uploading the same data and we don't check that upload-offset is actually at the current end.

Acconut (Member) commented Nov 26, 2016

@KieranP: I'd much rather catch a 423 error and report to the user that the file is already being uploaded. It makes no sense to me to have an identical copy of a 30GB file uploading to the same server. But this could always be a client setting if the server supports it, something like preventDuplicate: true

I agree that such a setting for the client would make sense. I will look into how this can be achieved. Furthermore, this may be a good opportunity to add a recommendation for this to the specification. However, I am currently not sure whether this should be a short sentence in the core protocol or deserves its own extension, as there are some more cases to consider:

The tusd server will also return 423 Locked when a HEAD request is issued while a PATCH request is running against the same upload URL. The reason is that the Upload-Offset header may no longer be correct, as the offset is only updated once the PATCH request completes, not in between. Personally, I am not sure if this is the best approach, but for now it has served the purpose.
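A toy sketch of this kind of per-upload locking, as an in-memory lock table (this is not tusd's actual implementation, just the shape of the behavior described):

```javascript
// Sketch: a request against an upload that is already being written
// gets 423 Locked; otherwise the lock is taken and must be released
// by the caller when the request finishes.
const locks = new Set();

function acquireForRequest(uploadId) {
  if (locks.has(uploadId)) {
    return 423; // Locked: another PATCH/HEAD holds this upload
  }
  locks.add(uploadId);
  return 200; // proceed; call releaseLock(uploadId) when done
}

function releaseLock(uploadId) {
  locks.delete(uploadId);
}
```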

@ronag: The way we solve this issue is to actually allow parallel uploads under the assumption that both clients are uploading the same data and we don't check that upload-offset is actually at the current end.

Honestly, this seems very dangerous to me. If we added the locking mechanism as we talked about before, would you benefit from this additional security?

ronag (Contributor) commented Nov 26, 2016

Honestly, this seems very dangerous to me.

I don't see how it is any less dangerous than appending to the file. Currently there is no guarantee that the data appended is from the same source. In what sense would you consider just overwriting more dangerous?

If we added the locking mechanism as we talked about before, would you benefit from this additional security?

Not really...

kaharman commented Apr 4, 2018

I just want to add another case for status 423 Locked. I was uploading a file with a web client when, in the middle of the process, the internet connection broke. I then switched to another connection method, say from wifi to a LAN cable, expecting I could resume the upload. What I got was error 423. I don't know how to handle this, since I am just uploading a single file in the same browser, but over a different connection. Do you have an idea @Acconut ?

Acconut (Member) commented Apr 12, 2018

@kaharman The problem in your situation is that the server does not always know that the connection broke and therefore still waits for the first client to upload content. In this case, the upload is still locked if the second client attempts to upload. However, time will usually solve this problem. The server will notice after a timeout that the connection to the first client is broken and will unlock the upload again allowing the second client to continue the upload. In summary, you would just have to wait (usually a few seconds in my experience) before retrying.

mitar commented Apr 15, 2024

Yes, same file, same client, same PATCH URL because it pulls it from local storage.

To me this looks like the local storage caching in the browser is not properly implemented: it should be keyed by your platform's object ID as well. If the user is uploading a file to object1, then local storage should cache the upload URL only for uploads to object1. If the user then tries to upload the same file to object2, it should not come from local storage, so the file would be uploaded normally twice, and you would not have this problem. So to me this looks like a local storage caching issue.
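A sketch of keying the cached upload by object ID, for example through a custom fingerprint function (tus-js-client accepts a fingerprint option for this purpose; the key format below is purely illustrative):

```javascript
// Sketch: include the platform's object ID in the fingerprint so the
// same file uploaded to two different objects never shares a cached
// upload URL.
function fingerprintFor(file, objectId) {
  return ["tus", objectId, file.name, file.size].join("-");
}
```

With this, uploading the same file to object1 and object2 yields two distinct fingerprints, so each upload gets its own resumable URL.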
