Library tries to uncompress pages even if declared uncompressed #102

Open
papanikge opened this issue Apr 9, 2024 · 0 comments

papanikge commented Apr 9, 2024

Describe the bug

When Parquet files are compressed, the compression is applied at the page layer. Parquet supports per-page compression, as shown by the DataPageHeaderV2 IsCompressed field, which comes directly from the Thrift definition. The library detects the compression type (called CompressionCodec) and passes it down to the newBlockReader level. However, it still needs to check whether that specific page is actually compressed, and that check is missing.

Unit test to reproduce

I have a slim and simple unit test here, but I could write a full-fledged one with a test file if required.

parquet-go specific details

  • v0.12.0

Misc Details

  • I have already patched this in a fork, and we have been using it in Panther's production for the last 2 weeks. It seems to be working.
  • I have tested it with a test file too (not sure where to upload it if you guys want it)