Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(storage:s3): multi-part upload: upload parts concurrently #272

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jeqo
Copy link
Contributor

@jeqo jeqo commented Jun 9, 2023

Uses CompletableFuture to run upload parts requests concurrently, and improve upload performance.

Resolves: #125

@jeqo jeqo requested a review from a team as a code owner June 9, 2023 11:43
@jeqo jeqo force-pushed the jeqo/s3-concurrent-multi-part branch 4 times, most recently from a78f09a to d8d9f7d Compare June 13, 2023 16:22
@jeqo jeqo force-pushed the jeqo/s3-concurrent-multi-part branch 2 times, most recently from 6cb6301 to 3db7869 Compare June 14, 2023 10:21
@jeqo jeqo requested a review from AnatolyPopov June 14, 2023 10:22
Copy link
Contributor

@AnatolyPopov AnatolyPopov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more thoughts

@jeqo jeqo force-pushed the jeqo/s3-concurrent-multi-part branch from 3db7869 to 0c67b5c Compare June 14, 2023 11:33
@jeqo jeqo force-pushed the jeqo/s3-concurrent-multi-part branch 3 times, most recently from 9088b6f to fb3f9df Compare June 14, 2023 13:52
@jeqo jeqo requested a review from AnatolyPopov June 14, 2023 14:10
@jeqo jeqo force-pushed the jeqo/s3-concurrent-multi-part branch from fb3f9df to 2254ec1 Compare June 14, 2023 14:15
@jeqo jeqo force-pushed the jeqo/s3-concurrent-multi-part branch from 2254ec1 to a0bcd0e Compare June 15, 2023 08:24
@jeqo jeqo requested a review from AnatolyPopov June 15, 2023 08:28
@jeqo jeqo force-pushed the jeqo/s3-concurrent-multi-part branch 6 times, most recently from 708cc14 to 004908c Compare June 19, 2023 09:31
@jeqo jeqo force-pushed the jeqo/s3-concurrent-multi-part branch from 004908c to 8152054 Compare June 19, 2023 09:34
@ivanyu ivanyu self-assigned this Jun 19, 2023
Copy link
Contributor

@ivanyu ivanyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach has a drawback: we can't realistically use big part sizes. We need to target some 100 MB part sizes. With uncontrollable parallelism + memory arrays, this is not going to end well. Mainly we need to get rid of memory arrays and stream data directly from disk.

Ideally we should have something like this:

  1. Split the file into parts, virtually, just ranges.
  2. Upload parts in parallel from file-based InputStreams (applying transfomations, of course).
  3. Control parallelism, we need an explicit configurable number.
  4. Recombine the chunk index in the end, should be simple arithmetics.
  5. Some mechanism for cancelling all parallel uploads if one of them fails (maybe we don't need this if S3 starts rejecting uploads promptly enough after the multipart upload has been cancelled).

This will lead to changing some interfaces + making it required that the chunks size is a multiple of the chunk size.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

the multipart upload performance is not ideal
3 participants