
Is there a way to reduce memory utilization in dictPageReader/PageReader ? #90

Open
vikramarsid opened this issue Jul 8, 2022 · 0 comments


vikramarsid commented Jul 8, 2022

Describe the bug
I am loading multiple parquet files into memory concurrently and reading each one row by row. I hit an OOM condition when reading 10 files of roughly 50 MB each concurrently. Do you see anything obvious in the call graph? Thank you!!

Unit test to reproduce
Please provide a unit test, either as a patch or text snippet or link to your fork. If you can't isolate it into a unit test then please provide steps to reproduce.

parquet-go specific details

  • What version are you using? v0.11.0

Misc Details

  • Are you using AWS Athena, Google BigQuery, presto... ? AWS S3
  • Any other relevant details... how big are the files / rowgroups you're trying to read/write? 10–100 MB
  • Do you have memory stats to share? Yes
  • Can you provide a stacktrace? Yes

parquet-go-pprof

@vikramarsid vikramarsid changed the title Is there a way to reduce memory utilization in deictPageReader/PageReader ? Is there a way to reduce memory utilization in dictPageReader/PageReader ? Jul 8, 2022