Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/add predicate hints and json predicate hints parameters #295

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

ognis1205
Copy link

@ognis1205 ognis1205 commented Apr 27, 2023

At the moment load_as_pandas does not support SQL expressions for Filtering and JSON Predicates for Filtering.
This PR expands load_as_pandas to support those parameters.

  • Adds predicateHints and jsonPredicateHints parameters to load_as_pandas.

Regarding the load_as_spark, I didn't mention it in the previous discussion as I thought it could be treated as a separate feature request. However, I can work on it too if needed.

closes #292

Signed-off-by: Shingo OKAWA <shingo.okawa.g.h.c@gmail.com>
Signed-off-by: Shingo OKAWA <shingo.okawa.g.h.c@gmail.com>
@ognis1205
Copy link
Author

I will fix these errors.

Signed-off-by: Shingo OKAWA <shingo.okawa.g.h.c@gmail.com>
Signed-off-by: Shingo OKAWA <shingo.okawa.g.h.c@gmail.com>
pd.testing.assert_frame_equal(pdf, expected)


@pytest.mark.skipif(not ENABLE_INTEGRATION, reason=SKIP_MESSAGE)
@pytest.mark.parametrize(
"fragments,version,timestamp,error",
"fragments,jsonPredicateHints,predicateHints,version,timestamp,error",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a unit test that specifies these predicate hints and verifies that they are used correctly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chakankardb
Thank you for the comment, and yes, I have to add an unit test to check if these new parameters are used correctly. I will ping you again when I add the unit test.

@ognis1205
Copy link
Author

Hi @chakankardb,

Would you mind if I requested you to share the table schemas (e.g., the available columns in the test data) for implementing the unit test? From what I understand, the configurations for the test server can be found at:

https://github.com/delta-io/delta-sharing/tree/main/server/src/test/scala/io/delta/sharing/server

However, I couldn't find any explicit definition for the relating Delta tables.

@Meehau
Copy link

Meehau commented Apr 16, 2024

Hello everyone - is there anything else blocking this feature? What needs to be done to move forward?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add predicateHints and jsonPredicateHints parameters to load_as_pandas methods
3 participants