

Feature Request: Chunked/resumable downloads (use Range: byte... request header) #2576

Open
aseanwatson opened this issue Dec 30, 2023 · 1 comment


aseanwatson commented Dec 30, 2023

Is your feature request related to a problem? Please describe.

It seems like onedrive just does a single GET to read file content. If the file is big enough, the request times out and the download fails.

Describe the solution you'd like

What follows is a set of detailed suggestions. Obviously, whoever is kind enough to implement this should treat them as a "serving suggestion" rather than hard requirements.

See downloadFile.

  1. Store .partial files in a temporary folder (not in the folder tree being synchronized). Have your database track in-progress temporary files, including id, eTag, expected length, and hash. It's probably smart to name it by the id to handle renames during download (but renames/moves change the eTag, so maybe it's not worth it).
  2. Have downloadFile check whether there's an in-progress download; if not, create a zero-byte file and register it in the database. (If it exists in the database but has a different eTag/hash/length from what the caller wants, delete the temporary file and database row and create new ones.)
  3. Open the temporary file in append mode and get its length.
  4. For the GET request, add If-Match and Range request headers. (Note the request header syntax is "bytes=start-end" with an equals sign and an inclusive end; the "bytes start-end/total" form with a space is the response's Content-Range.) The range can be:
    • from the current length of the temporary file to the end (e.g.: "bytes=" ~ str(currentLength) ~ "-"), or
    • a fixed-size chunk ("bytes=" ~ str(currentLength) ~ "-" ~ str(min(expectedLength, currentLength + maxChunkSize) - 1)).
  5. You will want to handle a 206 (partial content) response and double-check the Content-Range (the server is allowed to send back a wider range than requested, so you may need to skip bytes you already have). You can also confirm that the total size the server reports matches what you expect.
  6. If you're using the maxChunkSize option above, you should loop until you read all the content. Otherwise, if you get a 200/206 with all its content, the file is downloaded.
  7. Once you get the right-size file, verify the hash and move it to the correct location/name. (Not sure if this is unlink destination + link temp to destination + unlink temp, or something that preserves existing inode-based metadata: hard links/permissions/crtime/etc.) In monitor mode, you need to think about local changes to the file while it's being downloaded.
  8. If the request is interrupted (timeout, client killed, network disconnected, etc.), just try again when/if you can.
  9. If you get a 412 (eTag mismatch) or a 416 (bad range) response, the file changed in the service while you were downloading it. Delete the temporary file and queue a new download with the updated eTag/length/hash.
  10. If the file is deleted in the service while downloading, you'll get a 404; delete the temporary file.
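The steps above can be sketched roughly as follows. This is illustrative Python (the onedrive client itself is written in D): `build_range_header`, `parse_content_range`, and `resume_download` are hypothetical names, and a real implementation would need the database bookkeeping, hash verification, and retry/backoff limits described above.

```python
import re
import urllib.error
import urllib.request


def build_range_header(current_length, expected_length=None, max_chunk_size=None):
    """Build a Range request header resuming at current_length (step 4)."""
    if expected_length is None or max_chunk_size is None:
        return f"bytes={current_length}-"       # open-ended: to end of file
    end = min(expected_length, current_length + max_chunk_size) - 1
    return f"bytes={current_length}-{end}"      # Range ends are inclusive


def parse_content_range(value):
    """Parse a 'Content-Range: bytes start-end/total' response header (step 5)."""
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value)
    if m is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    start, end, total = m.group(1, 2, 3)
    return int(start), int(end), None if total == "*" else int(total)


def resume_download(url, etag, temp_path, expected_length, max_chunk_size=1 << 22):
    """Sketch of steps 3-9: append to the .partial file until it is complete."""
    with open(temp_path, "ab") as f:            # append mode; tell() = current size
        while f.tell() < expected_length:
            req = urllib.request.Request(url, headers={
                "If-Match": etag,
                "Range": build_range_header(f.tell(), expected_length, max_chunk_size),
            })
            try:
                with urllib.request.urlopen(req) as resp:
                    body = resp.read()
                    if resp.status == 206:
                        start, _, _ = parse_content_range(resp.headers["Content-Range"])
                        # The server may answer with a wider range than requested:
                        # skip any bytes already on disk (start <= f.tell() here).
                        f.write(body[f.tell() - start:])
                    else:  # 200: server ignored Range and sent the whole file
                        f.truncate(0)
                        f.write(body)
            except urllib.error.HTTPError as e:
                if e.code in (404, 412, 416):
                    # File changed or vanished remotely (steps 9-10): the caller
                    # should delete temp_path and its database row, then re-queue.
                    raise
                continue  # transient error (step 8): retry; real code needs backoff
```

Note the `- 1` when computing the chunk end: Range byte positions are inclusive, so a 4096-byte chunk starting at 0 is `bytes=0-4095`.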

Describe alternatives you've considered

I could use a browser or download manager to get big files.

Additional context

No response

@abraunegg
Owner

@aseanwatson
Thanks for your suggestion and the detail above. Resumable downloads are currently on the slate for v2.5.x implementation; the feature just had no formal request.

@abraunegg abraunegg added this to the v2.5.x milestone Dec 30, 2023
@abraunegg abraunegg changed the title Chunked/resumable downloads (use Range: byte... request header) Feature Request: Chunked/resumable downloads (use Range: byte... request header) Jan 9, 2024