[FEA] Move all JSON parsing to the same backend as get_json_object #10804

Open
8 tasks
revans2 opened this issue May 13, 2024 · 0 comments
Labels: epic, feature request


revans2 commented May 13, 2024

Is your feature request related to a problem? Please describe.
This is an epic intended to get us to a point where all JSON parsing functionality can be enabled by default. It is not intended to be the final long-term solution. We really want a common JSON parser/tokenizer that is owned and maintained by cuDF. But to get correctness and at least good-enough performance in the short term, we are going with this approach.

The first thing we need is a performance baseline, so we can be sure that we are not regressing get_json_object as we change the tokenizer to make it more configurable.

As a part of this we also need to finish writing all of the JSON tests we can come up with.

After this we need to do some refactoring of the JSON tokenizer in https://github.com/NVIDIA/spark-rapids-jni/blob/branch-24.06/src/main/cpp/src/json_parser.cuh. from_json and the JSON input format are configurable in a number of ways that we need to support. get_json_object and json_tuple are not configurable, and the current tokenizer is hard-coded to handle their settings.
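
To illustrate what "configurable" means here, below is a minimal sketch in Python, not the actual CUDA tokenizer. `TokenizerOptions`, its fields, and `tokenize` are hypothetical names; the two flags are loosely modeled on Spark JSON options such as allowSingleQuotes and allowNumericLeadingZeros. It only handles punctuation, unescaped strings, and unsigned integers, and it makes no claim about which settings get_json_object actually uses.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TokenizerOptions:
    # Hypothetical settings; a hard-coded tokenizer would bake in one fixed combination.
    allow_single_quotes: bool = False
    allow_numeric_leading_zeros: bool = False


Token = Tuple[str, str]  # (kind, text)


def tokenize(text: str, opts: TokenizerOptions) -> List[Token]:
    """Tokenize a tiny JSON subset: punctuation, strings, unsigned integers."""
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in " \t\r\n":
            i += 1
        elif ch in "{}[]:,":
            tokens.append(("punct", ch))
            i += 1
        elif ch == '"' or (ch == "'" and opts.allow_single_quotes):
            # Sketch only: escape sequences are not handled.
            end = text.find(ch, i + 1)
            if end < 0:
                raise ValueError(f"unterminated string at offset {i}")
            tokens.append(("string", text[i + 1:end]))
            i = end + 1
        elif "0" <= ch <= "9":
            end = i
            while end < len(text) and "0" <= text[end] <= "9":
                end += 1
            digits = text[i:end]
            if len(digits) > 1 and digits[0] == "0" and not opts.allow_numeric_leading_zeros:
                raise ValueError(f"leading zero at offset {i}")
            tokens.append(("number", digits))
            i = end
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens
```

With both flags enabled, `tokenize("{'a': 007}", ...)` yields a token list; with the default `TokenizerOptions()` the same input raises `ValueError` at the first single quote. The point is that the same tokenizer core serves both cases once the settings are passed in rather than hard-coded.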

Finally, we will need to write custom implementations of different operators so we can hopefully improve overall performance.

@revans2 revans2 added feature request New feature or request ? - Needs Triage Need team to review and classify epic Issue that encompasses a significant feature or body of work labels May 13, 2024
@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label May 14, 2024