Enforce compression extensions for CSV Files #11903

Open
wants to merge 12 commits into
base: feature

pdet
Member

@pdet pdet commented May 2, 2024

When writing a compressed CSV file, the correct extensions will be enforced.

For gzip: .csv.gz
For zstd: .csv.zst
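
To make the rule concrete, here is a minimal sketch of the kind of check described above. It is not the code in this PR: `ExpectedCsvExtension` and `HasExpectedExtension` are hypothetical names, compression is passed as a plain string for illustration, and leaving any other compression value unenforced is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not DuckDB's API): maps a compression name to the
// extension that the description above says should be enforced.
inline std::string ExpectedCsvExtension(const std::string &compression) {
    if (compression == "gzip") {
        return ".csv.gz";
    }
    if (compression == "zstd") {
        return ".csv.zst";
    }
    // Assumption: any other value is not enforced in this sketch.
    return "";
}

// Returns true when `path` ends with the extension expected for `compression`.
inline bool HasExpectedExtension(const std::string &path, const std::string &compression) {
    assert(!path.empty() && "path must not be empty");
    const std::string ext = ExpectedCsvExtension(compression);
    return path.size() >= ext.size() &&
           path.compare(path.size() - ext.size(), ext.size(), ext) == 0;
}
```

With this sketch, `HasExpectedExtension("out.csv", "gzip")` is false, so a writer could reject the path before producing a file whose name does not match its contents.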

One potential downside is that users won't be able to write a gzipped file whose name ends with anything other than .csv.gz, or a zstd-compressed file whose name ends with anything other than .csv.zst. Is this flexibility important?

The other issue is that I could imagine JSON and other file formats having the same problem, so maybe this check should be implemented at a higher level.

@Mytherin
Collaborator

Mytherin commented May 2, 2024

Thanks for the PR!

I think, as you mentioned, this is indeed not the right location for the fix. The problem is specific to PARTITION_BY, since there the user specifies only a directory rather than a file to write to. The extension is set in BindCopyTo. The correct solution, it seems to me, is to change which extension is chosen when compression is selected.
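
A rough sketch of that suggestion, under stated assumptions: `SelectFileExtension` and `PartitionFileName` are hypothetical names and do not reflect the actual code in BindCopyTo, the `data_<index>` file name is only illustrative, and compression is again a plain string. The idea is that the extension is derived from the format plus the compression, so files written into a partition directory get a matching suffix without restricting user-supplied file names.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the extension choice made while binding COPY ... TO.
// The base extension comes from the format; a suffix is appended for compression.
inline std::string SelectFileExtension(const std::string &format, const std::string &compression) {
    assert(!format.empty() && "format must not be empty");
    if (compression == "gzip") {
        return format + ".gz";
    }
    if (compression == "zstd") {
        return format + ".zst";
    }
    // Assumption: no suffix for uncompressed or unrecognized compression.
    return format;
}

// Builds an illustrative file name inside a partition directory.
inline std::string PartitionFileName(const std::string &directory, int index,
                                     const std::string &format, const std::string &compression) {
    return directory + "/data_" + std::to_string(index) + "." +
           SelectFileExtension(format, compression);
}
```

Because the format is a parameter, the same selection would also cover JSON or other formats, which addresses the concern about handling this at a higher level.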

@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 3, 2024 08:48
@pdet pdet marked this pull request as ready for review May 3, 2024 08:52
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 3, 2024 08:53
@pdet pdet marked this pull request as ready for review May 6, 2024 12:45
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 13, 2024 13:02
@pdet pdet marked this pull request as ready for review May 13, 2024 13:31
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 14, 2024 11:52
@pdet pdet marked this pull request as ready for review May 16, 2024 15:20
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 22, 2024 09:17
@pdet pdet changed the base branch from main to feature May 24, 2024 08:44
@pdet pdet marked this pull request as ready for review May 24, 2024 08:44

2 participants