Is it possible to stream an upload directly to AWS S3? #61
Just for info, if someone reads this issue: you can't really do this with parquet, and it's not related to this library. Parquet compresses data as you write it, but you still need extra memory while writing, and because parquet's compression ratio is very high, a 512 MB parquet file can correspond to several times that much uncompressed data held in memory. In general, if you are really tight on memory/budget, you should consider moving to csv+gzip / avro / json-lines, which you can write and compress incrementally.
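The incremental-compression point above can be illustrated with a short sketch (names like `records_to_gzip_jsonl` are illustrative, not from any library): gzip compresses record by record, so memory stays bounded by a single record rather than a whole row group.

```python
import gzip
import io
import json

def records_to_gzip_jsonl(records, out):
    """Compress an iterable of dicts to gzip'd JSON-lines,
    one record at a time -- memory stays bounded by one record."""
    with gzip.GzipFile(fileobj=out, mode="wb") as gz:
        for rec in records:
            gz.write((json.dumps(rec) + "\n").encode("utf-8"))

buf = io.BytesIO()
records_to_gzip_jsonl(({"id": i} for i in range(3)), buf)

# round-trip to show the compressed stream is valid JSON-lines
lines = gzip.decompress(buf.getvalue()).decode().splitlines()
```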
I don't think it's possible to stream to S3 unless you know the exact file size when calling the S3 API, which is something you don't really know when mutating the source data in your stream.
@SimonJang are you sure?
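For what it's worth, the S3 multipart upload API (CreateMultipartUpload / UploadPart / CompleteMultipartUpload) does not require the total object size up front; you only need to know each part's size when you send it, and every part except the last must be at least 5 MiB. A minimal sketch of the chunking side, assuming the data arrives as an iterable of byte chunks (the boto3 calls are indicated only in comments; `iter_parts` and `part_size` are illustrative names):

```python
def iter_parts(chunks, part_size=5 * 1024 * 1024):
    """Group an iterable of byte chunks into parts of at least
    `part_size` bytes (S3 requires >= 5 MiB per part, except the last)."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) >= part_size:
            yield bytes(buf)
            buf.clear()
    if buf:  # the final part may be smaller than part_size
        yield bytes(buf)

# Each yielded part would then be sent with
#   s3.upload_part(Bucket=..., Key=..., UploadId=..., PartNumber=n, Body=part)
# and the upload finished with s3.complete_multipart_upload(...),
# without ever knowing the total size in advance.
```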
What about (streaming) reading over HTTP? Is this supported?
I have a very large amount of NoSQL data. I want to read the data as a stream, pass the schema and the stream, and upload it to S3 as a parquet file. Because of the data volume I can't store the file locally, so I don't want to hold it in memory or write it to disk. Please advise me.
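If your records can be produced as an iterator of byte chunks, one hedged sketch is to wrap that iterator as a file-like object and hand it to boto3's `upload_fileobj`, which performs a multipart upload internally and does not need the total size. The `IterStream` adapter below and the bucket/key names are illustrative, not part of any library:

```python
import io

class IterStream(io.RawIOBase):
    """Wrap an iterator of byte chunks as a readable file object."""
    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buf = b""

    def readable(self):
        return True

    def readinto(self, b):
        # Refill the internal buffer from the iterator as needed.
        while not self._buf:
            try:
                self._buf = next(self._chunks)
            except StopIteration:
                return 0  # EOF
        n = min(len(b), len(self._buf))
        b[:n] = self._buf[:n]
        self._buf = self._buf[n:]
        return n

# Usage sketch (requires boto3 and AWS credentials, not run here):
# import boto3
# s3 = boto3.client("s3")
# s3.upload_fileobj(io.BufferedReader(IterStream(my_chunks)),
#                   "my-bucket", "data.jsonl.gz")
```

Note this streams whatever bytes you produce; as discussed above, producing those bytes as parquet still requires buffering row groups, so a line-oriented format compresses and streams more naturally here.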