Presto Parquet writer writes timestamp in 96 bits width #22605

Open
yingsu00 opened this issue Apr 24, 2024 · 2 comments

@yingsu00
Contributor

According to https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp, INT96 has been deprecated, and the Parquet TIMESTAMP logical type now annotates INT64. However, according to an internal report, the Presto Parquet writer still writes INT96.
See facebookincubator/velox#4680 (comment)

We need to:

  1. Verify whether the Parquet writer version is set up correctly.
  2. Verify whether it really writes 96-bit values.
  3. If it does, update it to write INT64.
@nmahadevuni
Member

By default, the optimized writer is disabled, so Presto uses RecordFileWriter, which writes Timestamp as int96:
message hive_schema {
  optional int96 t;
}
When the optimized writer is enabled, the new writer writes Timestamp as int64:
message presto_schema {
  optional int64 t;
}
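
To make the difference between the two physical types concrete, here is a minimal Python sketch of how the same instant is laid out in each. It assumes the conventional INT96 timestamp layout (8 little-endian bytes of nanoseconds within the day followed by a 4-byte Julian day number) and an INT64 value holding microseconds since the Unix epoch. The function names are hypothetical and are not part of Presto or any Parquet library.

```python
import struct

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588
MICROS_PER_DAY = 86_400_000_000


def int96_to_epoch_micros(raw: bytes) -> int:
    """Decode a 12-byte INT96 timestamp into microseconds since the Unix epoch."""
    if len(raw) != 12:
        raise ValueError("an INT96 value must be exactly 12 bytes")
    # "<qi": little-endian, 8-byte nanoseconds of day, then 4-byte Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # Sub-microsecond precision is truncated here.
    return days_since_epoch * MICROS_PER_DAY + nanos_of_day // 1000


def epoch_micros_to_int96(micros: int) -> bytes:
    """Encode microseconds since the Unix epoch as a 12-byte INT96 timestamp."""
    days_since_epoch, micros_of_day = divmod(micros, MICROS_PER_DAY)
    return struct.pack(
        "<qi", micros_of_day * 1000, days_since_epoch + JULIAN_DAY_OF_UNIX_EPOCH
    )


def epoch_micros_to_int64(micros: int) -> bytes:
    """Encode the same instant as a plain 8-byte little-endian INT64."""
    return struct.pack("<q", micros)
```

Under these assumptions, every timestamp takes 12 bytes as INT96 and 8 bytes as INT64, and converting a microsecond value to INT96 and back returns the original value.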

@nmahadevuni
Member

The session property is <catalog>.parquet_optimized_writer_enabled, and the catalog configuration property is hive.parquet.optimized-writer.enabled.
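
As a rough illustration of how the `<catalog>` placeholder resolves, the sketch below builds the two settings as strings. It assumes a catalog named `hive` and Presto's usual `SET SESSION <catalog>.<property> = <value>` syntax. The helper names are hypothetical; only the two property names come from this thread.

```python
def session_statement(catalog: str, enabled: bool = True) -> str:
    """Build the SET SESSION statement that toggles the optimized Parquet writer."""
    value = "true" if enabled else "false"
    return f"SET SESSION {catalog}.parquet_optimized_writer_enabled = {value}"


def catalog_property_line(enabled: bool = True) -> str:
    """Build the equivalent line for the catalog's properties file."""
    value = "true" if enabled else "false"
    return f"hive.parquet.optimized-writer.enabled={value}"


# Example with an assumed catalog name of "hive".
print(session_statement("hive"))
print(catalog_property_line())
```

The session statement applies only to the current session, while the catalog property would normally go in the catalog's properties file and apply to all queries against that catalog.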
