Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add support for parquet_writer_version session property #9700

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

svm1
Copy link
Collaborator

@svm1 svm1 commented May 3, 2024

Resolves prestodb/presto#22595.

Implemented setting of version via session property:
set session hive.parquet_writer_version='PARQUET_1_0';

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 3, 2024
Copy link

netlify bot commented May 3, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit a9b4895
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/663adb69ba69f200080ed98b

@svm1
Copy link
Collaborator Author

svm1 commented May 3, 2024

Java seems to only support 1.0 and 2.0, while C++ supports more granular versions.
https://www.javadoc.io/doc/org.apache.parquet/parquet-column/1.10.1/org/apache/parquet/column/ParquetProperties.WriterVersion.html

Copy link
Collaborator

@czentgr czentgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. A couple of comments. Please try to remember to run make format-fix before pushing changes.

velox/dwio/parquet/writer/Writer.cpp Outdated Show resolved Hide resolved
velox/connectors/hive/HiveDataSink.cpp Outdated Show resolved Hide resolved
@svm1 svm1 changed the title Add support for parquet_writer_version session property [WIP] Add support for parquet_writer_version session property May 3, 2024
@svm1 svm1 changed the title [WIP] Add support for parquet_writer_version session property Add support for parquet_writer_version session property May 8, 2024
@svm1 svm1 marked this pull request as ready for review May 8, 2024 02:36
@svm1 svm1 marked this pull request as draft May 8, 2024 02:36
@svm1 svm1 changed the title Add support for parquet_writer_version session property [WIP] Add support for parquet_writer_version session property May 8, 2024
@majetideepak
Copy link
Collaborator

majetideepak commented May 13, 2024

Java seems to only support 1.0 and 2.0, while C++ supports more granular versions.

There are two parquet versions and they are confusing.

  1. The parquet format version: https://github.com/apache/parquet-format/tags
  2. The parquet datapageversion (V1, V2) https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L562

The C++ parquet datapageversion is what the java parquet parquet_writer_version maps to.

@majetideepak
Copy link
Collaborator

This confusion was also recently discussed in the Arrow community
https://lists.apache.org/thread/72qwr66wf3xyrl5cozgojz88ct23qzxx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[native] Native workers not writing Parquet data files for WriterVersion v1 (PARQUET_1_0)
4 participants