Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

command: extend select api support #611

Merged
merged 62 commits into from Aug 21, 2023

Conversation

denizsurmeli
Copy link
Contributor

@denizsurmeli denizsurmeli commented Jul 24, 2023

This PR extends the support for AWS S3's Select API.

Resolves #494, resolves #357 .

@denizsurmeli denizsurmeli changed the title cmd: extend select api support. cmd: extend select api support Jul 24, 2023
@denizsurmeli
Copy link
Contributor Author

The aws-sdk-go version that we use have a bug that it logs the response header regardless of the log level set in s5cmd. Here is the related issue, so I will bump to the version that has the fix and check if anything breaks.

@denizsurmeli
Copy link
Contributor Author

Minio panics during SELECT queries. Open to suggestions about testing.

panic: runtime error: index out of range [0] with length 0

goroutine 779 [running]:
github.com/minio/minio/internal/s3select/csv.NewReader.func1({0x5929480?, 0xc0093743f0?})
	github.com/minio/minio/internal/s3select/csv/reader.go:306 +0x425
github.com/minio/minio/internal/s3select/csv.(*Reader).startReaders.func3()
	github.com/minio/minio/internal/s3select/csv/reader.go:244 +0x19c
created by github.com/minio/minio/internal/s3select/csv.(*Reader).startReaders
	github.com/minio/minio/internal/s3select/csv/reader.go:233 +0x705

@denizsurmeli
Copy link
Contributor Author

denizsurmeli commented Jul 26, 2023

Minio panics during SELECT queries. Open to suggestions about testing.

panic: runtime error: index out of range [0] with length 0

goroutine 779 [running]:
github.com/minio/minio/internal/s3select/csv.NewReader.func1({0x5929480?, 0xc0093743f0?})
	github.com/minio/minio/internal/s3select/csv/reader.go:306 +0x425
github.com/minio/minio/internal/s3select/csv.(*Reader).startReaders.func3()
	github.com/minio/minio/internal/s3select/csv/reader.go:244 +0x19c
created by github.com/minio/minio/internal/s3select/csv.(*Reader).startReaders
	github.com/minio/minio/internal/s3select/csv/reader.go:233 +0x705

Minio panics only on CSV queries that you don't specify the delimiter, which is weird that it does not validate the delimiter in the request, JSON queries seemed fine.

@igungor
Copy link
Member

igungor commented Jul 26, 2023

PR also fixes #357 @denizsurmeli

@denizsurmeli
Copy link
Contributor Author

I'd like to share my expectations while playing with this feature:

  1. Expected CSV output if I query CSV objects (same thing for TSV, or other formats).
./s5cmd select csv --delimiter "," -e 'select * from s3object' 's3://bucket/ibrahim/s5cmd-select/prices.csv'
{"_1":"id","_2":"name","_3":"price"}
{"_1":"1","_2":"avocado","_3":"3.99"}
{"_1":"2","_2":"banana","_3":"1.99"}
{"_1":"3","_2":"cabbage","_3":"0.99"}
  1. How do I query TSV files?
./s5cmd select csv --output-format csv --delimiter "\t" -e 'select * from s3object' 's3://bucket/ibrahim/s5cmd-select/prices.tsv'
ERROR "select csv --delimiter=\\t --query=select * from s3object --output-format=csv s3://bucket/ibrahim/s5cmd-select/prices.tsv": InvalidRequestParameter: The value of parameter FieldDelimiter is invalid. Please check the service documentation and try again. status code: 400, request id: ...

For the first issue, you are right, I have implemented the feature. For the second question, I have added an example query to the help command. I have also extended the flag descriptions.

e2e/util_test.go Outdated Show resolved Hide resolved
@denizsurmeli
Copy link
Contributor Author

Applied the suggestions and refactored the code. @igungor

command/select.go Outdated Show resolved Hide resolved
command/select.go Outdated Show resolved Hide resolved
command/select.go Outdated Show resolved Hide resolved
command/select.go Outdated Show resolved Hide resolved
command/select.go Outdated Show resolved Hide resolved
command/select.go Show resolved Hide resolved
storage/s3.go Show resolved Hide resolved
igungor
igungor previously approved these changes Aug 21, 2023
seruman
seruman previously approved these changes Aug 21, 2023
@denizsurmeli denizsurmeli dismissed stale reviews from seruman and igungor via f1aef24 August 21, 2023 14:09
@igungor igungor merged commit a115f0c into peak:master Aug 21, 2023
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Select document JSON content type (multiple lines) Add CSV support to s5cmd select
4 participants