Introduce low shuffle merge. #10786

liurenjie1024 · 2024-05-09T06:57:44Z

Closes #10905.
This PR is the first to introduce the low shuffle merge optimization, which speeds up merge. Currently only Databricks 13.3 is supported; support for more versions will be added once this PR is merged.

liurenjie1024 · 2024-05-09T09:13:56Z

build

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuReadFileFormatWithMetrics.scala

...spark341db/src/main/scala/com/nvidia/spark/rapids/delta/shims/MergeIntoCommandMetaShim.scala

...elta-spark341db/src/main/scala/com/nvidia/spark/rapids/delta/GpuDeltaParquetFileFormat.scala

...lake/common/src/main/databricks/scala/com/nvidia/spark/rapids/delta/RapidsDeltaSQLConf.scala

...c/main/spark341db/scala/org/apache/spark/sql/execution/rapids/shims/FilePartitionShims.scala

razajafri

Do we need to add tests? Either unit or integration tests?

jlowe

Have not finished the review yet, but here are some early comments. As @razajafri said, this needs tests, along with benchmarks comparing baseline CPU vs. GPU without low shuffle vs. GPU with low shuffle in various setups (e.g., many rows updated, very few rows updated, etc.).
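The comparison requested here could be organized as a small matrix of execution modes and update ratios. The following is only a sketch: the mode labels and update fractions are hypothetical placeholders chosen for illustration, not names or values from this PR.

```python
from itertools import product

# Hypothetical labels for the three execution modes named in the review.
MODES = ["cpu", "gpu_without_low_shuffle", "gpu_with_low_shuffle"]
# Hypothetical fractions of target rows touched by the merge.
UPDATE_FRACTIONS = [0.001, 0.1, 0.9]


def benchmark_matrix(modes=MODES, fractions=UPDATE_FRACTIONS):
    """Return every (mode, update_fraction) combination that should be timed."""
    return [{"mode": m, "update_fraction": f} for m, f in product(modes, fractions)]


if __name__ == "__main__":
    for case in benchmark_matrix():
        print(case)
```

Each entry would then be run against the same source and target tables so that the timings are comparable across modes.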

...lake/common/src/main/databricks/scala/com/nvidia/spark/rapids/delta/RapidsDeltaSQLConf.scala

...common/src/main/delta-io/scala/com/nvidia/spark/rapids/delta/GpuDeltaParquetFileFormat.scala

...ake/common/src/main/scala/com/nvidia/spark/rapids/delta/GpuRapidsRepartitionByFilePath.scala

...b/src/main/scala/com/databricks/sql/transaction/tahoe/rapids/GpuLowShuffleMergeCommand.scala

liurenjie1024 · 2024-05-10T09:10:13Z

build

liurenjie1024 · 2024-05-10T10:22:07Z

build

liurenjie1024

I'll update the PR to address the comments and add some integration tests.

...lake/common/src/main/databricks/scala/com/nvidia/spark/rapids/delta/RapidsDeltaSQLConf.scala

...common/src/main/delta-io/scala/com/nvidia/spark/rapids/delta/GpuDeltaParquetFileFormat.scala

...ake/common/src/main/scala/com/nvidia/spark/rapids/delta/GpuRapidsRepartitionByFilePath.scala

...spark341db/src/main/scala/com/nvidia/spark/rapids/delta/shims/MergeIntoCommandMetaShim.scala

liurenjie1024 · 2024-05-13T08:51:25Z

cc @jlowe @razajafri I've addressed the comments and added integration tests, PTAL.

liurenjie1024 · 2024-05-13T10:30:01Z

build

...lake/common/src/main/databricks/scala/com/nvidia/spark/rapids/delta/RapidsDeltaSQLConf.scala

...common/src/main/delta-io/scala/com/nvidia/spark/rapids/delta/GpuDeltaParquetFileFormat.scala

delta-lake/common/src/main/scala/com/nvidia/spark/rapids/delta/DeltaProviderImplBase.scala

...4x/src/main/scala/org/apache/spark/sql/delta/rapids/delta24x/GpuLowShuffleMergeCommand.scala

...b/src/main/scala/com/databricks/sql/transaction/tahoe/rapids/GpuLowShuffleMergeCommand.scala

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuReadFileFormatWithMetrics.scala

sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuFileSourceScanExec.scala

...4x/src/main/scala/org/apache/spark/sql/delta/rapids/delta24x/GpuLowShuffleMergeCommand.scala

...c/main/spark341db/scala/org/apache/spark/sql/execution/rapids/shims/FilePartitionShims.scala

liurenjie1024 · 2024-05-14T06:39:25Z

build

liurenjie1024 · 2024-05-14T11:07:04Z

cc @jlowe I've fixed all comments, PTAL

liurenjie1024 · 2024-05-14T13:39:11Z

build

jlowe

Still need performance numbers for various setups and a tracking issue for porting the code to other platforms.

Also note the user documentation for Delta Lake support will need to be updated to describe this new feature after it's merged.

...a-lake/common/src/main/databricks/scala/com/nvidia/spark/rapids/delta/RapidsDeltaUtils.scala

integration_tests/src/main/python/delta_lake_low_shuffle_merge_test.py

liurenjie1024 · 2024-05-17T10:54:49Z

cc @jlowe I have fixed all tests and it should work now, but some follow-up issues remain:

Implement true row index for the other parquet scan modes; it is currently supported only for PERFILE scan.
Push filename grouping into GpuFileSourceScanExec to remove the limitation of one file per partition.
Add support for all other platforms.
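The first two items concern how rows are tagged with their source file and their position in it. As an illustration only (plain Python, not the plugin's code), the sketch below assigns each row a (file path, row index) pair in which the index restarts at zero for every file. This is simple to do when each partition reads exactly one file. The function name, file names and row values are made up for the example.

```python
def tag_rows_with_index(files):
    """Tag every row with its file path and its position within that file.

    files: dict mapping a file path to the list of rows in that file,
    in file order. Returns a list of (file_path, row_index, row) tuples,
    with row_index counting from 0 within each file.
    """
    tagged = []
    for path in sorted(files):
        for row_index, row in enumerate(files[path]):
            tagged.append((path, row_index, row))
    return tagged


if __name__ == "__main__":
    example = {"b.parquet": ["x"], "a.parquet": ["p", "q"]}
    for entry in tag_rows_with_index(example):
        print(entry)
```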

liurenjie1024 · 2024-05-17T11:07:03Z

build

Signed-off-by: liurenjie1024 <liurenjie2008@gmail.com>

liurenjie1024 · 2024-05-18T09:31:39Z

build

liurenjie1024 · 2024-05-20T03:59:15Z

build

Signed-off-by: liurenjie1024 <liurenjie2008@gmail.com>

liurenjie1024 · 2024-05-20T07:25:47Z

build

Signed-off-by: liurenjie1024 <liurenjie2008@gmail.com>

liurenjie1024 · 2024-05-20T09:19:44Z

jenkins/spark-premerge-build.sh

@@ -206,7 +206,7 @@ ci_scala213() {
    cd .. # Run integration tests in the project root dir to leverage test cases and resource files
    export TEST_TAGS="not premerge_ci_1"
    export TEST_TYPE="pre-commit"
-    export TEST_PARALLEL=5
+    export TEST_PARALLEL=4


I think I have hit a bug similar to #8652. I verified in my local environment that the tests pass after changing it to 4.

This doesn't seem like a change that should be made as part of Delta Lake low shuffle merge but rather as a separate PR, especially if you can get it to fail without your low shuffle merge changes. cc: @NvTimLiu for visibility.

Actually, I haven't figured out why the failure in #8652 disappeared when I changed it. I think there is a bug in array_test.py that is unrelated to my change, but when I add more integration tests the test order changes and it just works. I think we eventually need to fix #8652, but for this PR we should make this change as a workaround.

What I'm saying above is that this change is not really related to this PR. It's a significant change in CI scripts that will affect performance of CI, since we'll run fewer tests in parallel. That's why I think this should be a separate change, not hidden in a large PR as a side-effect.

The problem is that without this change, the integration tests will fail at array_test.py.

liurenjie1024 · 2024-05-20T09:30:36Z

build

.../src/main/databricks/scala/com/nvidia/spark/rapids/delta/GpuDeltaParquetFileFormatBase.scala

...common/src/main/delta-io/scala/com/nvidia/spark/rapids/delta/GpuDeltaParquetFileFormat.scala

...ake/common/src/main/scala/com/nvidia/spark/rapids/delta/GpuRapidsRepartitionByFilePath.scala

...e/delta-24x/src/main/scala/com/nvidia/spark/rapids/delta/delta24x/MergeIntoCommandMeta.scala

...common/src/main/delta-io/scala/com/nvidia/spark/rapids/delta/GpuDeltaParquetFileFormat.scala

...4x/src/main/scala/org/apache/spark/sql/delta/rapids/delta24x/GpuLowShuffleMergeCommand.scala

...e/delta-24x/src/main/scala/com/nvidia/spark/rapids/delta/delta24x/MergeIntoCommandMeta.scala

...4x/src/main/scala/org/apache/spark/sql/delta/rapids/delta24x/GpuLowShuffleMergeCommand.scala

liurenjie1024 · 2024-05-21T06:46:39Z

build

liurenjie1024 · 2024-05-21T14:26:52Z

build

...e/delta-24x/src/main/scala/com/nvidia/spark/rapids/delta/delta24x/MergeIntoCommandMeta.scala

...b/src/main/scala/com/databricks/sql/transaction/tahoe/rapids/GpuLowShuffleMergeCommand.scala

...4x/src/main/scala/org/apache/spark/sql/delta/rapids/delta24x/GpuLowShuffleMergeCommand.scala

integration_tests/src/main/python/delta_lake_low_shuffle_merge_test.py

...4x/src/main/scala/org/apache/spark/sql/delta/rapids/delta24x/GpuLowShuffleMergeCommand.scala

liurenjie1024 · 2024-05-22T09:49:05Z

build

...ake/common/src/main/scala/com/nvidia/spark/rapids/delta/GpuDeltaParquetFileFormatUtils.scala

...ake/common/src/main/scala/com/nvidia/spark/rapids/delta/GpuRapidsRepartitionByFilePath.scala

jlowe · 2024-05-22T16:08:40Z

...ake/common/src/main/scala/com/nvidia/spark/rapids/delta/GpuRapidsRepartitionByFilePath.scala

+          withResource(partitionIdExpr.columnarEval(firstRow)) { gpuCol =>
+            withResource(gpuCol.copyToHost()) { hostCol =>
+              val partitionId = hostCol.getInt(0)


This is a bit wasteful, producing and running a full columnar batch for effectively one scalar. Arguably the expression should be done on the CPU, as it would be faster for computing the single hash value on the file path and would not need to run a separate job, manifesting a full columnar batch of redundant file names. Worth tracking in a followup issue, as this needlessly adds to memory pressure on the GPU.

+1, this could be an improvement in a follow-up issue.
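The CPU-side alternative suggested in this thread amounts to hashing the file path once per file and taking the result modulo the partition count. A minimal sketch follows, assuming a hypothetical helper name; `zlib.crc32` is a stand-in hash chosen for the example and is not the hash function that Spark or the plugin uses.

```python
import zlib


def partition_id_for_path(file_path: str, num_partitions: int) -> int:
    """Map a file path to a partition id using a single CPU-side hash.

    zlib.crc32 is a stand-in for illustration only; it is not the hash
    used by Spark or by the plugin.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(file_path.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    # Hypothetical file path, used only to show the call.
    print(partition_id_for_path("/data/table/part-00000.parquet", 8))
```

Computing one value this way avoids building a columnar batch of repeated file names just to read back a single integer.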

liurenjie1024 · 2024-05-23T07:56:20Z

build

liurenjie1024 · 2024-05-23T09:17:47Z

build

jlowe

Thanks for all the updates, @liurenjie1024! This is getting close. Would be good to file the followup issues, ideally pointing to them with TODO's in the code. Also need performance numbers as mentioned before.

jlowe · 2024-05-23T13:55:05Z

integration_tests/src/main/python/delta_lake_low_shuffle_merge_test.py

+                " WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *"
+
+    conf = copy_and_update(delta_merge_enabled_conf,
+                           {"spark.rapids.sql.exec.RapidsRepartitionByFilePathExec": "false"})


The test says it's testing when the file scan override fails, but it's not disabling the file scan. Instead it's disabling the custom exec for low shuffle merge which seems very unlikely to happen in practice. Having coverage of that rare occurrence is great, but there's not a test for when the file scan falls back which will be more common. That should be added.
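A configuration for the missing case might look like the sketch below. It assumes that `spark.rapids.sql.exec.FileSourceScanExec` is the setting that turns off the GPU file scan; that key should be checked against the plugin's configuration docs. `copy_and_update` is re-implemented here as a local stand-in so the snippet runs on its own, and the contents of `delta_merge_enabled_conf` are a hypothetical placeholder.

```python
def copy_and_update(base, *overrides):
    """Local stand-in for the test helper of the same name.

    Returns a new dict with the overrides applied, leaving base untouched.
    """
    merged = dict(base)
    for override in overrides:
        merged.update(override)
    return merged


# Placeholder for the suite's delta_merge_enabled_conf; contents are hypothetical.
delta_merge_enabled_conf = {"example.merge.enabled": "true"}

# Assumption: this key disables the GPU file scan so that the scan falls back to the CPU.
scan_fallback_conf = copy_and_update(
    delta_merge_enabled_conf,
    {"spark.rapids.sql.exec.FileSourceScanExec": "false"})

if __name__ == "__main__":
    print(scan_fallback_conf)
```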

liurenjie1024 · 2024-05-24T12:58:53Z

> Thanks for all the updates, @liurenjie1024! This is getting close. Would be good to file the followup issues, ideally pointing to them with TODO's in the code. Also need performance numbers as mentioned before.

Sure, I will run some experiments to measure the performance improvement.

liurenjie1024 requested a review from jlowe May 9, 2024 06:57

liurenjie1024 mentioned this pull request May 9, 2024

WIP: Low shuffle merge implementation. #10753

Closed

razajafri changed the title ~~feat: Introduce low shuffle merge.~~ Introduce low shuffle merge. May 9, 2024

razajafri reviewed May 9, 2024

sameerz added the "performance" label (A performance related task/issue) May 9, 2024

jlowe reviewed May 9, 2024

liurenjie1024 force-pushed the renjie/lowshufflemerge2 branch from b72112f to f7b1ab4 Compare May 10, 2024 10:07

liurenjie1024 commented May 11, 2024

liurenjie1024 force-pushed the renjie/lowshufflemerge2 branch from f7b1ab4 to d31d5f0 Compare May 13, 2024 08:44

liurenjie1024 requested review from revans2, tgravescs, GaryShen2008, NvTimLiu and gerashegalov as code owners May 13, 2024 08:44

jlowe reviewed May 13, 2024

jlowe reviewed May 14, 2024

...a-lake/common/src/main/databricks/scala/com/nvidia/spark/rapids/delta/RapidsDeltaUtils.scala

integration_tests/src/main/python/delta_lake_low_shuffle_merge_test.py

liurenjie1024 added 3 commits May 18, 2024 17:03

feat: Introduce low shuffle merge.

335bde4

Signed-off-by: liurenjie1024 <liurenjie2008@gmail.com>

Fix build break

24decb5

Signed-off-by: liurenjie1024 <liurenjie2008@gmail.com>

Fix comments

581ee8f

liurenjie1024 added 3 commits May 18, 2024 17:03

restore tests

8ef391a

Resotore unnecesasary change

9008586

Fix all tests.

a94133e

liurenjie1024 force-pushed the renjie/lowshufflemerge2 branch from 8151cbc to a94133e Compare May 18, 2024 09:03

Revert unnecessary changes

7211dad

Signed-off-by: liurenjie1024 <liurenjie2008@gmail.com>

Try to pass ci

6e096c1

Signed-off-by: liurenjie1024 <liurenjie2008@gmail.com>

liurenjie1024 commented May 20, 2024

jlowe reviewed May 20, 2024

liurenjie1024 added 2 commits May 21, 2024 14:02

Fix comments

3e9bdcb

Revert changes

b03ce6a

jlowe reviewed May 21, 2024

liurenjie1024 added 2 commits May 22, 2024 10:34

Merge remote-tracking branch 'upstream/branch-24.06' into renjie/lows…

df298fc

…hufflemerge2

Fix memory leak and comments.

40a58b7

jlowe reviewed May 22, 2024

liurenjie1024 added 2 commits May 23, 2024 15:38

Fix memory leak.

d89f6c7

Fix comments

ee26949

liurenjie1024 added 2 commits May 23, 2024 17:00

Fix

60c7286

Revert debug

8e48664

jlowe reviewed May 23, 2024

liurenjie1024 changed the base branch from branch-24.06 to branch-24.08 May 29, 2024 01:06
