[native] Native workers not writing Parquet data files for WriterVersion v1 (PARQUET_1_0) #22595

Open
agrawalreetika opened this issue Apr 24, 2024 · 2 comments · May be fixed by facebookincubator/velox#9700
Labels
bug prestissimo Presto Native Execution

Comments

@agrawalreetika
Member

Native worker not writing Parquet data files for WriterVersion v1 (PARQUET_1_0)

Your Environment

  • Presto version used: 0.288-SNAPSHOT
  • Storage (HDFS/S3/GCS..): S3
  • Prestissimo Setup on Local

Expected Behavior

After running set session hive.parquet_writer_version='PARQUET_1_0';
Parquet data should be written with format_version 1.

Current Behavior

Even after running set session hive.parquet_writer_version='PARQUET_1_0'; Parquet data is still written with format_version: 2.6

Possible Solution

Steps to Reproduce

presto:reetika_testdb> set session hive.parquet_writer_version='PARQUET_1_0';
SET SESSION

presto:reetika_testdb> create table hive.reetika_testdb.test_insert (id int) with (format = 'Parquet');
CREATE TABLE

presto:reetika_testdb> insert into hive.reetika_testdb.test_insert values(1);
INSERT: 1 row

Sample Output of Parquet File -

############ file meta data ############
created_by: parquet-cpp-velox
num_columns: 1
num_rows: 1
num_row_groups: 1
format_version: 2.6
serialized_size: 146


############ Columns ############
id

############ Column(id) ############
name: id
path: id
max_definition_level: 1
max_repetition_level: 0
physical_type: INT32
logical_type: None
converted_type (legacy): NONE
compression: GZIP (space_saved: -56%)

Screenshots (if appropriate)

Context

It looks like the parquet_writer_version session property is not honored in Prestissimo. The same setting works fine with the Java Parquet writer.

@majetideepak
Collaborator

Velox uses the Arrow Parquet Writer. I see that there is an option to specify V1
https://github.com/apache/arrow/blob/main/cpp/src/parquet/properties.h
Let's add it to Velox. Can you point me to a test for V1 vs V2?

@svm1

svm1 commented May 3, 2024

Fix in progress - facebookincubator/velox#9700
