Hi there,
For many services like CloudTrail and S3 access logs, the dominating factor in object count is the architectural constraints of the writer, not anything useful for downstream processing.
In the case of S3 access logs, assembling millions of tiny objects into a single larger log file is essentially a prerequisite for any meaningful local processing (e.g. DuckDB, clickhouse-local, ...).
I would therefore love it if something like the following command could work:
s5cmd cat s3://s3-logs-bucket/someprefix/2024-03-27-* > 2024-03-27.txt
Similarly,
s5cmd cat s3://..../* | clickhouse-local
It's already easy to do this in three steps (cp, then find | xargs cat > ...txt), but that requires local storage to be available, and it prevents otherwise stream-friendly processing (such as clickhouse-local reading from stdin) from starting until all downloads are complete.
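For reference, a minimal sketch of the concatenation step in today's workaround. The bucket/prefix names are placeholders, and local stand-in files are created here in place of the real `s5cmd cp` download so the snippet runs anywhere:

```shell
# Real workflow would first download the objects, e.g.:
#   s5cmd cp 's3://s3-logs-bucket/someprefix/2024-03-27-*' logs/
# Here we create local stand-in files to simulate the downloaded pieces.
mkdir -p logs
printf 'line-a\n' > logs/2024-03-27-00-part1
printf 'line-b\n' > logs/2024-03-27-00-part2

# Assemble the tiny pieces into one access log file (sorted for a
# deterministic order; find alone guarantees no ordering).
find logs -type f -name '2024-03-27-*' -print0 | sort -z | xargs -0 cat > 2024-03-27.txt

cat 2024-03-27.txt   # prints "line-a" then "line-b"
```

Note that nothing downstream of `xargs -0 cat` can start until every object has been downloaded to disk, which is exactly what a streaming `s5cmd cat` with wildcard support would avoid.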