Implement chunked parquet reader in cudf-python #15728

Merged: 20 commits merged into rapidsai:branch-24.08 on Jun 6, 2024

@galipremsagar (Contributor) commented May 12, 2024

Description

Partially Addresses: #14966

This PR implements Python bindings for the chunked Parquet reader.
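For readers new to chunked reading: the goal is to bound how much data each read step materializes, so a large file comes back as a sequence of smaller tables instead of one huge one. The snippet below is a hypothetical, pure-Python illustration of that idea only. `plan_chunks` is an invented helper, not part of cudf or libcudf. The parameter name `chunk_read_limit` borrows the name of the libcudf chunked reader option, but this sketch simply groups whole row groups under a byte budget and is not a description of how libcudf actually splits data.

```python
from typing import Iterator, List


def plan_chunks(row_group_sizes: List[int], chunk_read_limit: int) -> Iterator[List[int]]:
    """Hypothetical helper: yield lists of row-group indices whose combined
    size (in bytes) stays within chunk_read_limit.

    A single row group larger than the limit is emitted on its own, because
    this simplified sketch never splits a row group.
    """
    current: List[int] = []
    current_size = 0
    for idx, size in enumerate(row_group_sizes):
        # Close the current chunk if adding this row group would exceed the budget.
        if current and current_size + size > chunk_read_limit:
            yield current
            current, current_size = [], 0
        current.append(idx)
        current_size += size
    # Emit whatever is left over as the final chunk.
    if current:
        yield current


# Five row groups with made-up sizes and an 80-byte budget.
print(list(plan_chunks([40, 30, 50, 10, 120], chunk_read_limit=80)))
# → [[0, 1], [2, 3], [4]]
```

Each planned chunk would then be read as a separate table; combining those tables afterwards is the kind of step a "chunked concat" would handle.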

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions bot added the Python label (Affects Python cuDF API.) May 12, 2024
@galipremsagar added the improvement and non-breaking labels May 22, 2024

copy-pr-bot bot commented May 30, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue conda Java Affects Java cuDF API. labels May 30, 2024
@galipremsagar galipremsagar changed the base branch from branch-24.06 to branch-24.08 May 30, 2024 18:31
@galipremsagar galipremsagar marked this pull request as ready for review May 30, 2024 18:31
@galipremsagar galipremsagar requested a review from a team as a code owner May 30, 2024 18:31
@galipremsagar (Contributor Author)

/okay to test

@galipremsagar removed the libcudf label May 30, 2024
@galipremsagar added the 3 - Ready for Review label and removed the CMake, conda, and Java labels May 30, 2024
@galipremsagar (Contributor Author)

@GregoryKimball This PR is ready for review. I'll add the chunked concat and then enable the chunked parquet reader in cudf.pandas in a follow-up PR.

@GregoryKimball (Contributor)

Thank you @galipremsagar! This looks like a great addition, the debut of chunked parquet reading to cudf python ❤️

@lithomas1 (Contributor) left a comment

Thanks for the PR!

Just a heads up:
Eventually, we'll probably want the binding for this to live in pylibcudf (so we'd need to rewrite the stuff added in this PR again at a later date).

Unfortunately, bindings for I/O haven't landed in the dev branch yet (I just started porting over a bunch of the classes we'd need for I/O like TableWithMetadata in #15899).

I think I'll be able to get around to this a week or two after my PR lands, but it's still OK to put this in before then, even if we need to rewrite it a bit for pylibcudf later.

python/cudf/cudf/_lib/parquet.pyx (outdated, resolved)
python/cudf/cudf/tests/test_parquet.py (outdated, resolved)
@github-actions bot added the pylibcudf label (Issues specific to the pylibcudf package) Jun 3, 2024
@galipremsagar (Contributor Author)

@lithomas1 This is now ready for a re-review.

@lithomas1 (Contributor) left a comment

This looks good to me now (just one final comment).

I think someone else should probably take a look too, since I'm still pretty new to the codebase.

python/cudf/cudf/tests/test_parquet.py (outdated, resolved)

@lithomas1 (Contributor) left a comment

@galipremsagar added the 5 - Ready to Merge label and removed the 3 - Ready for Review label Jun 6, 2024
@galipremsagar (Contributor Author)

/merge

@rapids-bot bot merged commit 66895af into rapidsai:branch-24.08 Jun 6, 2024
71 checks passed
Labels: 5 - Ready to Merge, improvement, non-breaking, pylibcudf, Python
Projects
Status: Done
Status: Slip
3 participants