[#3187] feat(spark-connector): Support SparkSQL extended syntax in Iceberg #3266

caican00 · 2024-05-04T15:21:56Z

What changes were proposed in this pull request?

Support SparkSQL extended syntax in Iceberg, such as:

addPartitionField
dropPartitionField
replacePartitionField
setWriteDistributionAndOrdering
setIdentifierFields
dropIdentifierFields
createOrReplaceBranch
createOrReplaceTag
dropBranch
dropTag
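
For reviewers who do not know the Iceberg SQL extensions, here is a rough sketch of the kind of statement each operation above corresponds to. The statement shapes follow the Iceberg Spark DDL documentation. The class name, the `examples` helper, the table identifier, and all column, branch, and tag names are made up for illustration and are not part of this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IcebergExtendedSqlExamples {

  // Maps each operation listed in this PR to one example statement.
  // `table` is a placeholder identifier; column, branch, and tag names are assumptions.
  public static Map<String, String> examples(String table) {
    String alter = "ALTER TABLE " + table + " ";
    Map<String, String> sql = new LinkedHashMap<>();
    sql.put("addPartitionField", alter + "ADD PARTITION FIELD bucket(16, id)");
    sql.put("dropPartitionField", alter + "DROP PARTITION FIELD bucket(16, id)");
    sql.put("replacePartitionField", alter + "REPLACE PARTITION FIELD ts_day WITH day(ts)");
    sql.put(
        "setWriteDistributionAndOrdering",
        alter + "WRITE DISTRIBUTED BY PARTITION LOCALLY ORDERED BY category, id");
    sql.put("setIdentifierFields", alter + "SET IDENTIFIER FIELDS id");
    sql.put("dropIdentifierFields", alter + "DROP IDENTIFIER FIELDS id");
    sql.put("createOrReplaceBranch", alter + "CREATE OR REPLACE BRANCH audit_branch");
    sql.put("createOrReplaceTag", alter + "CREATE OR REPLACE TAG release_tag");
    sql.put("dropBranch", alter + "DROP BRANCH audit_branch");
    sql.put("dropTag", alter + "DROP TAG release_tag");
    return sql;
  }

  public static void main(String[] args) {
    // Print one example statement per operation.
    examples("catalog.db.sample").forEach((op, stmt) -> System.out.println(op + ": " + stmt));
  }
}
```

In a real session each string would be passed to `spark.sql(...)` with the Iceberg SQL extensions enabled; that part is left out so the sketch runs without Spark.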

Why are the changes needed?

Support SparkSQL extended syntax in Iceberg.

Fix: #3187

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New ITs.


caican00 · 2024-05-21T11:40:17Z

...strato/gravitino/spark/connector/iceberg/extensions/IcebergExtendedDataSourceV2Strategy.java

+import scala.collection.JavaConverters;
+import scala.collection.Seq;
+
+public class IcebergExtendedDataSourceV2Strategy extends ExtendedDataSourceV2Strategy {


In ExtendedDataSourceV2Strategy, some logical plans, such as AddPartitionField and CreateOrReplaceBranch, explicitly check for SparkCatalog to decide whether the operation is an Iceberg operation.
The initialize method of SparkCatalog is final, so we cannot extend SparkCatalog to support the extended SQL.
Therefore, I decided to rewrite the ExtendedDataSourceV2Strategy rule to support the Iceberg extended SQL.

https://github.com/apache/iceberg/blob/8d6bee736884575da7368e0963268d1cbe362d90/spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy.scala#L197
cc: @FANNG1
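
To make the reasoning above concrete, here is a heavily simplified model of the problem and the workaround. None of these types are the real Spark, Iceberg, or Gravitino classes: `IcebergCatalog`, `WrappingCatalog`, `UpstreamStrategy`, and `ExtendedStrategy` are hypothetical stand-ins, and a plain string stands in for the physical plan. The sketch assumes only what the comment states: the upstream rule matches on a concrete catalog type, so a catalog of a different type is skipped unless the rule itself is overridden.

```java
import java.util.Optional;

public class StrategyOverrideSketch {

  public interface Catalog {}

  // Stand-in for the concrete catalog type the upstream rule checks for.
  public static final class IcebergCatalog implements Catalog {}

  // Stand-in for a connector catalog that delegates to Iceberg but is a different type.
  public static final class WrappingCatalog implements Catalog {
    public final IcebergCatalog delegate = new IcebergCatalog();
  }

  // Stand-in for a logical plan node such as AddPartitionField.
  public static final class AddPartitionField {
    public final Catalog catalog;
    public final String table;
    public final String transform;

    public AddPartitionField(Catalog catalog, String table, String transform) {
      this.catalog = catalog;
      this.table = table;
      this.transform = transform;
    }
  }

  // Models the upstream rule: it only plans the node for the concrete Iceberg catalog type.
  public static class UpstreamStrategy {
    public Optional<String> apply(AddPartitionField plan) {
      if (plan.catalog instanceof IcebergCatalog) {
        return Optional.of(exec(plan));
      }
      return Optional.empty();
    }

    protected static String exec(AddPartitionField plan) {
      return "AddPartitionFieldExec(" + plan.table + ", " + plan.transform + ")";
    }
  }

  // Models the rewritten rule: accept the wrapping catalog first, otherwise defer upstream.
  public static class ExtendedStrategy extends UpstreamStrategy {
    @Override
    public Optional<String> apply(AddPartitionField plan) {
      if (plan.catalog instanceof WrappingCatalog) {
        return Optional.of(exec(plan));
      }
      return super.apply(plan);
    }
  }

  public static void main(String[] args) {
    AddPartitionField plan =
        new AddPartitionField(new WrappingCatalog(), "db.sample", "bucket(16, id)");
    System.out.println("upstream planned: " + new UpstreamStrategy().apply(plan).isPresent());
    System.out.println("extended planned: " + new ExtendedStrategy().apply(plan).isPresent());
  }
}
```

The point of the model is only that the type check lives inside the rule, so overriding the rule is the available extension point when the catalog class itself cannot be usefully subclassed.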

caican00 · 2024-05-23T02:09:32Z

Hi @FANNG1, could you please review this PR first if you are free? I am still trying to find a solution for the metadata tables issue.

caican00 · 2024-05-23T02:18:34Z

...strato/gravitino/spark/connector/iceberg/extensions/IcebergExtendedDataSourceV2Strategy.java

+  }
+
+  @Override
+  public Seq<SparkPlan> apply(LogicalPlan plan) {


Iceberg's extensions for Spark 3.3, 3.4, and 3.5 all support these logical plans, so we don't need a separate implementation for each version.

FANNG1 · 2024-05-23T02:36:33Z

> Hi @FANNG1 could you please help review this pr first if you are free? I am trying to find a solution to fix the issue of metadata tables.

OK, but it will take me some time to review this PR. My main concern is compatibility across multiple Spark versions and Iceberg versions.

caican00 · 2024-05-23T02:41:08Z

> > Hi @FANNG1 could you please help review this pr first if you are free? I am trying to find a solution to fix the issue of metadata tables.
>
> ok, but I will take some time to review this PR, and my main concern about this PR is the multi Spark version compatibility.

ok.

[datastrato#2543] feat(spark-connector): support row-level operations…

4d334aa

… to iceberg Table

caican00 marked this pull request as draft May 4, 2024 15:22

[datastrato#3264] feat(spark-connector): Support Iceberg time travel …

90b7be8

…in SQL queries

caican00 force-pushed the iceberg-extended-sql branch 2 times, most recently from 49baf5c to 8cba728 Compare May 5, 2024 02:38

update

65ef2a4

caican00 force-pushed the iceberg-extended-sql branch from 8cba728 to 0826647 Compare May 5, 2024 15:48

[datastrato#3187] feat(spark-connector): Support SparkSQL extended sy…

0a8ff35

…ntax in Iceberg

caican00 force-pushed the iceberg-extended-sql branch from 0826647 to 0a8ff35 Compare May 5, 2024 16:39

caican00 and others added 7 commits May 13, 2024 17:59

Merge branch 'main' of github.com:datastrato/gravitino into iceberg-asof

302244b

update

90b8d14

Merge branch 'main' into iceberg-asof

86f51cd

Merge branch 'main' of github.com:datastrato/gravitino into iceberg-e…

d2ba387

…xtended-sql

update

b367b65

Merge branch 'iceberg-asof' of github.com:caican00/gravitino into ice…

ad1a52e

…berg-extended-sql

update

a8a4d6b

caican00 marked this pull request as ready for review May 16, 2024 08:29

Merge branch 'main' into iceberg-extended-sql

2de6eaf

caican00 marked this pull request as draft May 16, 2024 12:59

caican00 added 3 commits May 17, 2024 11:36

update

ecc463e

Merge branch 'main' of github.com:datastrato/gravitino into iceberg-e…

53a8c8d

…xtended-sql

update

30fba3c

caican00 marked this pull request as ready for review May 17, 2024 07:53

update

b97c325

caican00 force-pushed the iceberg-extended-sql branch from f5ea86a to b97c325 Compare May 17, 2024 10:13

caican00 mentioned this pull request May 21, 2024

[Subtask] [spark-connector] support Iceberg catalog #1571

Open

caican00 added 3 commits May 21, 2024 19:25

Merge branch 'main' of github.com:datastrato/gravitino into iceberg-e…

80b4c99

…xtended-sql

update

560d6ad

update

b87d9f0

Merge branch 'main' into iceberg-extended-sql

1468ee8

caican00 force-pushed the iceberg-extended-sql branch from bfa8a50 to e4e5dd2 Compare May 23, 2024 02:04

update

ecceee9

caican00 force-pushed the iceberg-extended-sql branch from e4e5dd2 to ecceee9 Compare May 23, 2024 02:53
