
Support cat with glob #716

Open
occasionallydavid opened this issue Mar 27, 2024 · 0 comments

Comments

@occasionallydavid

Hi there,

For many services, such as CloudTrail and S3 access logs, the dominating factor in object-key count is the architectural constraints of the writer rather than anything useful for downstream processing.

In the case of S3 access logs, the ability to assemble millions of tiny objects into one larger access log is pretty much a prerequisite for any meaningful local processing (e.g. DuckDB, clickhouse-local, ...).

It would therefore be great if something like the following command could work:

s5cmd cat s3://s3-logs-bucket/someprefix/2024-03-27-* > 2024-03-27.txt

Similarly,

s5cmd cat s3://..../* | clickhouse-local

It's already easy to do this in three steps (cp, then find | xargs cat > ...txt), but that requires having local storage available, and it prevents otherwise streaming-friendly processing (such as clickhouse-local reading from stdin) from starting until all downloads are complete.
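
For reference, a minimal sketch of that three-step workaround, using the placeholder bucket/prefix from above and quoting the wildcard so the shell does not expand it; every object has to be downloaded and concatenated locally before a streaming consumer can start:

# 1. download every object under the prefix into a local directory
s5cmd cp 's3://s3-logs-bucket/someprefix/2024-03-27-*' logs/

# 2. concatenate the downloaded pieces into a single file
find logs/ -type f | xargs cat > 2024-03-27.txt

# 3. only now can downstream processing begin
cat 2024-03-27.txt | clickhouse-local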
