[#3187] feat(spark-connector): Support SparkSQL extended syntax in Iceberg #3266

Open
wants to merge 21 commits into
base: main

@caican00 caican00 commented May 4, 2024

What changes were proposed in this pull request?

Support SparkSQL extended syntax in Iceberg, such as:

addPartitionField
dropPartitionField
replacePartitionField
setWriteDistributionAndOrdering
setIdentifierFields
dropIdentifierFields
createOrReplaceBranch
createOrReplaceTag
dropBranch
dropTag
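
For orientation, the operations listed above roughly correspond to the `ALTER TABLE` statements from Iceberg's Spark SQL extensions. The sketch below pairs each name with one sample statement. The table identifier `catalog.db.sample`, the column names, and the branch and tag names are hypothetical, and the pairing of operation name to statement is an assumption based on the names, not something taken from this patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: sample Iceberg extended SQL, keyed by the operation names in this PR. */
final class IcebergExtendedSqlExamples {
  // Hypothetical table identifier, used only for illustration.
  static final String TABLE = "catalog.db.sample";

  /** Returns one sample statement per operation, in the order listed in the description. */
  static Map<String, String> statements() {
    Map<String, String> sql = new LinkedHashMap<>();
    String alter = "ALTER TABLE " + TABLE + " ";
    sql.put("addPartitionField", alter + "ADD PARTITION FIELD bucket(16, id)");
    sql.put("dropPartitionField", alter + "DROP PARTITION FIELD bucket(16, id)");
    sql.put("replacePartitionField", alter + "REPLACE PARTITION FIELD ts_day WITH day(ts)");
    sql.put("setWriteDistributionAndOrdering",
        alter + "WRITE DISTRIBUTED BY PARTITION LOCALLY ORDERED BY category, id");
    sql.put("setIdentifierFields", alter + "SET IDENTIFIER FIELDS id");
    sql.put("dropIdentifierFields", alter + "DROP IDENTIFIER FIELDS id");
    sql.put("createOrReplaceBranch", alter + "CREATE OR REPLACE BRANCH `audit_branch`");
    sql.put("createOrReplaceTag", alter + "CREATE OR REPLACE TAG `release_tag`");
    sql.put("dropBranch", alter + "DROP BRANCH `audit_branch`");
    sql.put("dropTag", alter + "DROP TAG `release_tag`");
    return sql;
  }

  public static void main(String[] args) {
    // Print each operation next to its sample statement.
    statements().forEach((op, stmt) -> System.out.println(op + " -> " + stmt));
  }
}
```

In an integration test, each string would typically be passed to `SparkSession.sql(...)` on a session that has the Iceberg SQL extensions enabled.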

Why are the changes needed?

Support SparkSQL extended syntax in Iceberg.

Fix: #3187

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New ITs.

@caican00 caican00 marked this pull request as draft May 4, 2024 15:22
@caican00 caican00 force-pushed the iceberg-extended-sql branch 2 times, most recently from 49baf5c to 8cba728 Compare May 5, 2024 02:38
@caican00 caican00 marked this pull request as ready for review May 16, 2024 08:29
@caican00 caican00 marked this pull request as draft May 16, 2024 12:59
@caican00 caican00 marked this pull request as ready for review May 17, 2024 07:53
import scala.collection.JavaConverters;
import scala.collection.Seq;

public class IcebergExtendedDataSourceV2Strategy extends ExtendedDataSourceV2Strategy {
Contributor Author

Some logical plans, such as AddPartitionField and CreateOrReplaceBranch, are matched in ExtendedDataSourceV2Strategy by explicitly checking for SparkCatalog to decide whether the operation targets Iceberg.
The initialize method of SparkCatalog is final, so we cannot extend SparkCatalog to support the extended SQL statements.
Therefore, I decided to rewrite the ExtendedDataSourceV2Strategy rule to support the Iceberg extended SQL statements.
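
A toy model of the problem described above, for readers unfamiliar with the strategy code. None of the types below are the real Spark, Iceberg, or connector classes: `StubSparkCatalog`, `WrappingCatalog`, `upstreamMatch`, and `rewrittenMatch` are hypothetical names, and the assumption that the connector's catalog wraps the Iceberg catalog instead of extending it is made for illustration only.

```java
import java.util.Optional;

/** Toy model of the catalog type check; all types here are hypothetical stand-ins. */
final class CatalogCheckSketch {
  interface Catalog {
    String name();
  }

  /** Stand-in for a catalog whose initialize method is final and cannot be overridden. */
  static class StubSparkCatalog implements Catalog {
    private String name = "uninitialized";

    public final void initialize(String catalogName) {
      this.name = catalogName;
    }

    @Override
    public String name() {
      return name;
    }
  }

  /** Assumed connector catalog: it delegates to the Iceberg catalog rather than extending it. */
  static final class WrappingCatalog implements Catalog {
    private final StubSparkCatalog delegate;

    WrappingCatalog(StubSparkCatalog delegate) {
      this.delegate = delegate;
    }

    StubSparkCatalog delegate() {
      return delegate;
    }

    @Override
    public String name() {
      return delegate.name();
    }
  }

  /** Models a rule that only recognizes the concrete catalog type. */
  static Optional<StubSparkCatalog> upstreamMatch(Catalog catalog) {
    if (catalog instanceof StubSparkCatalog) {
      return Optional.of((StubSparkCatalog) catalog);
    }
    return Optional.empty();
  }

  /** Models a rewritten rule that also accepts the wrapper and unwraps it. */
  static Optional<StubSparkCatalog> rewrittenMatch(Catalog catalog) {
    if (catalog instanceof WrappingCatalog) {
      return Optional.of(((WrappingCatalog) catalog).delegate());
    }
    return upstreamMatch(catalog);
  }

  public static void main(String[] args) {
    StubSparkCatalog iceberg = new StubSparkCatalog();
    iceberg.initialize("iceberg");
    Catalog wrapped = new WrappingCatalog(iceberg);
    System.out.println("upstream rule matches wrapper: " + upstreamMatch(wrapped).isPresent());
    System.out.println("rewritten rule matches wrapper: " + rewrittenMatch(wrapped).isPresent());
  }
}
```

The point of the sketch is only that a type check against the concrete catalog class skips any catalog that is not a subclass of it, which is why the rule itself has to change when subclassing is not an option.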

@caican00
Contributor Author

Hi @FANNG1 could you please help review this pr first if you are free? I am trying to find a solution to fix the issue of metadata tables.

}

@Override
public Seq<SparkPlan> apply(LogicalPlan plan) {
Contributor Author

Iceberg's Spark 3.3, 3.4, and 3.5 modules all support these logical plans, so we don't need a separate implementation for each version.

@FANNG1
Contributor

FANNG1 commented May 23, 2024

> Hi @FANNG1 could you please help review this pr first if you are free? I am trying to find a solution to fix the issue of metadata tables.

OK, but it will take me some time to review this PR. My main concern is compatibility across multiple Spark and Iceberg versions.

@caican00
Contributor Author

> > Hi @FANNG1 could you please help review this pr first if you are free? I am trying to find a solution to fix the issue of metadata tables.
>
> ok, but I will take some time to review this PR, and my main concern about this PR is the multi Spark version compatibility.

ok.

Successfully merging this pull request may close these issues.

[Subtask] Support SparkSQL extended syntax in Iceberg
2 participants