feat(storage:s3): multi-part upload: upload parts concurrently #272
base: main
Conversation
Force-pushed from a78f09a to d8d9f7d.
storage/s3/src/main/java/io/aiven/kafka/tieredstorage/storage/s3/S3MultiPartOutputStream.java: outdated review thread (resolved).
Force-pushed from 6cb6301 to 3db7869.
Some more thoughts
storage/s3/src/main/java/io/aiven/kafka/tieredstorage/storage/s3/S3MultiPartOutputStream.java: outdated review thread (resolved).
Force-pushed from 3db7869 to 0c67b5c.
storage/s3/src/main/java/io/aiven/kafka/tieredstorage/storage/s3/S3MultiPartOutputStream.java: outdated review thread (resolved).
Force-pushed from 9088b6f to fb3f9df.
Force-pushed from fb3f9df to 2254ec1.
...ge/s3/src/test/java/io/aiven/kafka/tieredstorage/storage/s3/S3MultiPartOutputStreamTest.java: outdated review thread (resolved).
Force-pushed from 2254ec1 to a0bcd0e.
Force-pushed from 708cc14 to 004908c.
Force-pushed from 004908c to 8152054.
This approach has a drawback: we can't realistically use big part sizes. We need to target part sizes of around 100 MB. With uncontrolled parallelism plus in-memory byte arrays, this is not going to end well. Mainly, we need to get rid of the in-memory arrays and stream data directly from disk.
Ideally we should have something like this:
- Split the file into parts virtually, i.e. just byte ranges.
- Upload parts in parallel from file-based InputStreams (applying transformations, of course).
- Control parallelism with an explicit, configurable number.
- Recombine the chunk index at the end; this should be simple arithmetic.
- Some mechanism for cancelling all parallel uploads if one of them fails (maybe we don't need this if S3 starts rejecting part uploads promptly enough after the multipart upload has been aborted).
This will lead to changing some interfaces and making it a requirement that the part size is a multiple of the chunk size.
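The proposal above could be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `uploadPart` is a hypothetical stand-in for streaming one file range (with transformations applied) into an S3 UploadPart request, and the bounded thread pool models the explicit parallelism setting. Failure of one part fails the whole upload, and `shutdownNow()` gives best-effort cancellation of the remaining parts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RangedUploadSketch {
    // A virtual part: bytes [offset, offset + length) within the file.
    record PartRange(int partNumber, long offset, long length) {}

    // Split a file of the given size into part-sized ranges; the last part may be shorter.
    static List<PartRange> splitIntoRanges(long fileSize, long partSize) {
        List<PartRange> ranges = new ArrayList<>();
        int partNumber = 1;
        for (long offset = 0; offset < fileSize; offset += partSize) {
            ranges.add(new PartRange(partNumber++, offset, Math.min(partSize, fileSize - offset)));
        }
        return ranges;
    }

    // Upload all ranges with bounded parallelism; abort the rest if one fails.
    static List<String> uploadAll(List<PartRange> ranges, int parallelism) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (PartRange r : ranges) {
                futures.add(pool.submit(() -> uploadPart(r))); // hypothetical per-part upload
            }
            List<String> etags = new ArrayList<>();
            for (Future<String> f : futures) {
                etags.add(f.get()); // propagates the first failure
            }
            return etags;
        } finally {
            pool.shutdownNow(); // best-effort cancellation of outstanding parts
        }
    }

    // Placeholder: a real implementation would open a file-based InputStream for the
    // range and send it as an S3 UploadPart request, returning the part's ETag.
    static String uploadPart(PartRange r) {
        return "etag-" + r.partNumber();
    }
}
```

Recombining the chunk index afterwards is then just the arithmetic of mapping each part's offset back to chunk positions, which is why the part size being a multiple of the chunk size matters.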
Uses `CompletableFuture` to run upload-part requests concurrently and improve upload performance.

Resolves: #125
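The `CompletableFuture` pattern the description refers to can be sketched as below. This is an assumption-laden illustration, not the PR's code: `uploadPart` is a hypothetical helper, and the fixed-size executor is a stand-in for whatever pool the stream uses. Each buffered part is uploaded in its own future, and `allOf` gates the final CompleteMultipartUpload on every part having finished.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class ConcurrentPartsSketch {
    // Hypothetical per-part upload returning the part's ETag.
    static String uploadPart(int partNumber, byte[] data) {
        return "etag-" + partNumber;
    }

    // Launch one CompletableFuture per buffered part, then wait for all of them
    // before the multipart upload can be completed.
    static List<String> uploadConcurrently(List<byte[]> parts) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<String>> futures = IntStream.range(0, parts.size())
                    .mapToObj(i -> CompletableFuture.supplyAsync(
                            () -> uploadPart(i + 1, parts.get(i)), pool))
                    .toList();
            // allOf completes when every part upload finishes (or one of them fails).
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return futures.stream().map(CompletableFuture::join).toList();
        } finally {
            pool.shutdown();
        }
    }
}
```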