Streaming uploads for S3 FileSystem Implementation #22536

Open
ZacBlanco opened this issue Apr 16, 2024 · 0 comments

@ZacBlanco
Contributor

The initial implementation of the S3 FileSystem interface in Presto wrote all data to a file on local disk before using the AWS SDK's TransferManager API to upload it. This requires the machine to have free disk space at least equal to the size of the file(s) being written to S3.

PR #22424 improves on this by uploading a file part each time a configurable N bytes have been written to temporary storage, which caps local disk usage at N bytes. However, performance suffers: once a part upload starts, the writer cannot write any more bytes until that upload finishes.

A better solution would be to queue part uploads and execute them as bytes are written, without blocking the writer. Trino has a similar implementation in trinodb/trino@f681708.
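A minimal Java sketch of the non-blocking approach, assuming parts are buffered in memory rather than in a local temporary file. It is not the Presto or Trino implementation: `StreamingPartOutputStream` and `PartUploader` are hypothetical names, and `PartUploader` stands in for an S3 UploadPart request so the example runs without the AWS SDK. Initiating and completing (or aborting) the multipart upload is left out, and a real writer must respect S3's 5 MiB minimum for every part except the last.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

// Hypothetical sketch, not PrestoS3FileSystem code: each full part is handed to an executor
// so the writer keeps filling a fresh buffer while earlier parts upload.
public class StreamingPartOutputStream extends OutputStream {
    // Hypothetical callback standing in for an S3 UploadPart request.
    public interface PartUploader {
        void upload(int partNumber, byte[] data) throws IOException;
    }

    private final PartUploader uploader;
    private final ExecutorService executor;
    private final int partSize;
    private final Semaphore inFlight; // limits how many part uploads may be pending at once
    private final List<CompletableFuture<Void>> pending = new ArrayList<>();
    private byte[] buffer;
    private int position;
    private int nextPartNumber = 1; // S3 part numbers start at 1
    private boolean closed;

    public StreamingPartOutputStream(PartUploader uploader, ExecutorService executor,
                                     int partSize, int maxInFlightParts) {
        this.uploader = uploader;
        this.executor = executor;
        this.partSize = partSize;
        this.inFlight = new Semaphore(maxInFlightParts);
        this.buffer = new byte[partSize];
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (closed) {
            throw new IOException("stream is closed");
        }
        while (len > 0) {
            int n = Math.min(len, partSize - position);
            System.arraycopy(b, off, buffer, position, n);
            position += n;
            off += n;
            len -= n;
            if (position == partSize) {
                submitPart(); // queues the upload; does not wait for it to finish
            }
        }
    }

    // Queues the buffered bytes as one part; callers ensure position > 0.
    private void submitPart() throws IOException {
        byte[] data = position == partSize ? buffer : Arrays.copyOf(buffer, position);
        int partNumber = nextPartNumber++;
        try {
            inFlight.acquire(); // waits only if maxInFlightParts uploads are already pending
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for an upload slot", e);
        }
        pending.add(CompletableFuture.runAsync(() -> {
            try {
                uploader.upload(partNumber, data);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                inFlight.release(); // frees the slot whether the upload succeeded or failed
            }
        }, executor));
        buffer = new byte[partSize]; // the submitted buffer now belongs to the upload task
        position = 0;
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        if (position > 0) {
            submitPart(); // the final part may be shorter than partSize
        }
        try {
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        } catch (CompletionException e) {
            throw new IOException("part upload failed", e.getCause());
        }
    }
}
```

The `Semaphore` is a design choice: without it, a writer that outpaces the network would accumulate an unbounded number of part buffers. With it, memory stays at roughly `(maxInFlightParts + 1) × partSize`, comparable to the N-byte disk cap from PR #22424, but the writer only blocks when every upload slot is busy instead of on every part.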

Expected Behavior or Use Case

Don't block writers while uploading file parts to S3

Presto Component, Service, or Connector

  • PrestoS3FileSystem

Possible Implementation

See the description above.

Context

Better S3 FileSystem performance.
