the multipart upload performance is not ideal #125

Open
HenryCaiHaiying opened this issue Mar 14, 2023 · 0 comments · May be fixed by #272

Comments

@HenryCaiHaiying

From reading the code, it doesn't look to me like it's really using S3 multi-threading for multipart upload:

  1. A while loop in S3ClientWrapper#uploadLogFile breaks the original segment file into multiple parts and uploads them one by one through the custom S3OutputStream: https://github.com/aiven/tiered-storage-for-apache-kafka/blob/main/s3/src/main/java/io/aiven/kafka/tiered/storage/s3/S3ClientWrapper.java#L179
  2. In S3OutputStream, the code tries to use S3's multipart upload API to upload the file in multiple chunks. But since the caller (S3ClientWrapper) has already cut each piece down to config.s3StorageUploadPartSize, this just ends up as a single upload in each S3OutputStream.
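A minimal sketch of the part arithmetic described in the two steps above. The class and method names (NestedPartCountSketch, split) and the sizes are hypothetical, and the sketch only counts parts; it does not call S3 or reproduce the project's actual code. Assuming a 5 MiB part size and a 12 MiB segment, the outer loop produces three chunks, and splitting each chunk again by the same part size yields exactly one part per stream, with the streams running one after another.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the upload path described above; none of these
 * names come from the project. It only counts parts and never talks to S3.
 */
class NestedPartCountSketch {

    /** Sizes of the chunks produced by splitting `total` bytes into pieces of at most `partSize`. */
    static List<Long> split(long total, long partSize) {
        List<Long> chunks = new ArrayList<>();
        for (long offset = 0; offset < total; offset += partSize) {
            chunks.add(Math.min(partSize, total - offset));
        }
        return chunks;
    }

    public static void main(String[] args) {
        long partSize = 5L * 1024 * 1024;      // assumed value of config.s3StorageUploadPartSize
        long segmentSize = 12L * 1024 * 1024;  // assumed segment file size

        // Outer loop (like S3ClientWrapper#uploadLogFile): one stream per chunk, sequentially.
        List<Long> outerChunks = split(segmentSize, partSize);
        for (int i = 0; i < outerChunks.size(); i++) {
            // Inner split (like S3OutputStream): a chunk is never larger than partSize,
            // so each stream's multipart upload ends up with exactly one part.
            int innerParts = split(outerChunks.get(i), partSize).size();
            System.out.printf("stream %d: %d bytes -> %d part(s)%n", i, outerChunks.get(i), innerParts);
        }
    }
}
```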

S3's multipart upload is supposed to let the client upload the parts of a big file concurrently, using multiple threads. The current code path doesn't seem to use that concurrency at all.
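For comparison, a rough sketch of what concurrent part uploads could look like, using only java.util.concurrent. PartUploader is a hypothetical stand-in for an S3 UploadPart call (it is not the AWS SDK API), and the 5 MiB part size, segment size and thread count are assumptions; 5 MiB is chosen because S3 requires every part except the last to be at least that size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ConcurrentMultipartSketch {

    /** Hypothetical stand-in for an S3 UploadPart call: returns the ETag for one part. */
    @FunctionalInterface
    interface PartUploader {
        String uploadPart(int partNumber, byte[] data);
    }

    /**
     * Splits `data` into parts of at most `partSize` bytes, uploads them concurrently
     * on `threads` worker threads, and returns the ETags ordered by part number.
     */
    static List<String> upload(byte[] data, int partSize, int threads, PartUploader uploader) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            int partNumber = 1; // S3 part numbers start at 1
            for (int offset = 0; offset < data.length; offset += partSize) {
                byte[] part = Arrays.copyOfRange(data, offset, Math.min(data.length, offset + partSize));
                int n = partNumber++;
                // Each part is submitted to the pool, so several can be in flight at once.
                futures.add(CompletableFuture.supplyAsync(() -> uploader.uploadPart(n, part), pool));
            }
            // join() waits for every part; the list keeps part-number order regardless of completion order.
            List<String> etags = new ArrayList<>();
            for (CompletableFuture<String> f : futures) {
                etags.add(f.join());
            }
            return etags;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        byte[] segment = new byte[12 * 1024 * 1024]; // assumed 12 MiB segment
        List<String> etags = upload(segment, 5 * 1024 * 1024, 4,
                (n, bytes) -> "etag-" + n + "-" + bytes.length + "@" + Thread.currentThread().getName());
        etags.forEach(System.out::println);
    }
}
```

A real implementation would also need to start the multipart upload, pass the collected ETags to the complete call, and abort the upload if any part fails; in this sketch a failed part surfaces as a CompletionException from join().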
