Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48329][SQL] Enable spark.sql.sources.v2.bucketing.pushPartValues.enabled by default #46673

Closed
wants to merge 3 commits into from

Conversation

szehon-ho
Copy link
Contributor

@szehon-ho szehon-ho commented May 20, 2024

What changes were proposed in this pull request?

This PR aims to enable spark.sql.sources.v2.bucketing.pushPartValues.enabled by default for Apache Spark 4.0.0 while keeping spark.sql.sources.v2.bucketing.enabled is false.

Why are the changes needed?

spark.sql.sources.v2.bucketing.pushPartValues.enabled was added at Apache Spark 3.4.0 and has been used as one of the datasource v2 bucketing feature. This PR will help the datasource v2 bucketing users use this feature more easily.

Note that this change is technically no-op for the default users because spark.sql.sources.v2.bucketing.enabled is false still.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AS-IS PR title seems to be partial because this PR changes two configurations.

[SPARK-48329][SQL] SPJ: Default spark.sql.sources.v2.bucketing.pushPartValues.enabled to true

Also, please resolve the conflicts and revert the test coverage change, @szehon-ho .

@dongjoon-hyun
Copy link
Member

cc @viirya , @sunchao

@szehon-ho
Copy link
Contributor Author

Thanks @dongjoon-hyun sorry the pr was not ready. I was trying to integrate the changes from @superdiaodiao who I asaw also made a pr for the same, so we can be co-authors. Reverted the additional config change and test case, will check the test result.

@superdiaodiao
Copy link
Contributor

Thanks @dongjoon-hyun sorry the pr was not ready. I was trying to integrate the changes from @superdiaodiao who I asaw also made a pr for the same, so we can be co-authors. Reverted the additional config change and test case, will check the test result.

I will close my PR and you can continue, let's co-author this time.

@szehon-ho
Copy link
Contributor Author

Thanks! @dongjoon-hyun @sunchao can you take another look?

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-48329][SQL] SPJ: Default spark.sql.sources.v2.bucketing.pushPartValues.enabled to true [SPARK-48329][SQL] Enable spark.sql.sources.v2.bucketing.pushPartValues.enabled by default May 21, 2024
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @szehon-ho and @superdiaodiao

Merged to master for Apache Spark 4.0.0.

@superdiaodiao
Copy link
Contributor

+1, LGTM. Thank you, @szehon-ho and @superdiaodiao

Merged to master for Apache Spark 4.0.0.

Thank you~~~
We can't wait to see the release of 4.0.0.

@dongjoon-hyun
Copy link
Member

BTW, @szehon-ho and @superdiaodiao .

The umbrella JIRA is closed one already for Apache Spark 3.3.0. I moved SPARK-48329 to a subtask of SPARK-44111 for now.

Screenshot 2024-05-21 at 10 02 39

For the other new tasks, please create a new umbrella JIRA and use it.

@superdiaodiao
Copy link
Contributor

BTW, @szehon-ho and @superdiaodiao .

The umbrella JIRA is closed one already for Apache Spark 3.3.0. I moved SPARK-48329 to a subtask of SPARK-44111 for now.

Screenshot 2024-05-21 at 10 02 39

For the other new tasks, please create a new umbrella JIRA and use it.

OK

@szehon-ho
Copy link
Contributor Author

@dongjoon-hyun do you have any guidance how we may have a new Spark 4.0+ tracking JIRA to link all the new SPJ items? I think it will be nice to have all of them in one list.

@dongjoon-hyun
Copy link
Member

Feel free to create a new umbrella Jira issue with a proper meaningful title instead of duplicating old Jira issue title. Then, link it to SPARK-44111, @szehon-ho . That's enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants