Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE] Use the smaller table to build hashmap in shuffled hash join #5750

Merged
merged 4 commits into from
May 31, 2024

Conversation

zml1206
Copy link
Contributor

@zml1206 zml1206 commented May 15, 2024

What changes were proposed in this pull request?

Background history: SPARK-36612 support left outer join build left or right outer join build right in shuffled hash join, it first took effect in spark 3.5.
#408 support use the smaller table to build hashmap in shuffled hash join in velox backend, and it overwrite HashJoin, so when fallback to vanilla spark all versions support, not just spark 3.5. But since the class loading order is not always predictable in Java 8 & HotSpot JVM, Support has been temporarily removed in #674.
This PR finally select build side when genShuffledHashJoinExecTransformer to support use the smaller table to build hashmap in shuffled hash join in velox backend.
JoinSelectionOverrides will no longer generate illegal build side, causing EnsureRequirements verification to fail.

How was this patch tested?

Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Copy link

Run Gluten Clickhouse CI

@zml1206 zml1206 marked this pull request as draft May 15, 2024 03:44
Copy link

Run Gluten Clickhouse CI

@zml1206 zml1206 marked this pull request as ready for review May 15, 2024 03:59
@zml1206 zml1206 marked this pull request as draft May 15, 2024 05:37
Copy link

Run Gluten Clickhouse CI

4 similar comments
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@zml1206 zml1206 marked this pull request as ready for review May 15, 2024 22:19
@zml1206
Copy link
Contributor Author

zml1206 commented May 16, 2024

Test failure is unrelated. cc @rui-mo @zhztheplayer

@rui-mo rui-mo requested a review from zhztheplayer May 16, 2024 05:59
@rui-mo
Copy link
Contributor

rui-mo commented May 16, 2024

/Benchmark Velox

@rui-mo
Copy link
Contributor

rui-mo commented May 16, 2024

/Benchmark Velox TPCDS

@zml1206 zml1206 force-pushed the optimize_shuffle_hash_join branch from fe60af1 to 8182c17 Compare May 21, 2024 07:46
Copy link

Run Gluten Clickhouse CI

@zhztheplayer
Copy link
Member

Background history: SPARK-36612 support left outer join build left or right outer join build right in shuffled hash join, it first took effect in spark 3.5.

@zml1206

Then how does this PR work with lower Spark version?

@zml1206
Copy link
Contributor Author

zml1206 commented May 23, 2024

Background history: SPARK-36612 support left outer join build left or right outer join build right in shuffled hash join, it first took effect in spark 3.5.

@zml1206

Then how does this PR work with lower Spark version?

If there is no fallback, all versions of velox bakcend support using small table to build hashmap. If there is a fallback, it will remain consistent with vanilla spark, it is supported after 3.5 and not supported before 3.5. @zhztheplayer

@zhztheplayer
Copy link
Member

Background history: SPARK-36612 support left outer join build left or right outer join build right in shuffled hash join, it first took effect in spark 3.5.

@zml1206
Then how does this PR work with lower Spark version?

If there is no fallback, all versions of velox bakcend support using small table to build hashmap. If there is a fallback, it will remain consistent with vanilla spark, it is supported after 3.5 and not supported before 3.5. @zhztheplayer

Does that mean we don't actually need strategy JoinSelectionOverrides to alter join build side on vanilla query plan ? Can we completely remove that strategy?

zhztheplayer
zhztheplayer previously approved these changes May 24, 2024
Copy link
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@zhztheplayer
Copy link
Member

/Benchmark Velox

@zhztheplayer
Copy link
Member

/Benchmark Velox TPCDS

@WangGuangxin
Copy link
Contributor

Do we still need preferredBuildSide in ShuffledHashJoinExecTransformer? https://github.com/apache/incubator-gluten/blob/main/backends-velox/src/main/scala/org/apache/gluten/execution/ShuffledHashJoinExecTransformer.scala#L78

In fact, I'm quite confused this logic here. In fact, if we can choose the right build side, why prefer new build side based on pattern there?

@rui-mo
Copy link
Contributor

rui-mo commented May 24, 2024

if we can choose the right build side, why prefer new build side based on pattern there?

@WangGuangxin The reasoning behind this is that, in cases where the child is an aggregation or filter, we discovered that Spark's smaller table selection was inconsistent with reality. If we follow Spark's recommendation in this case, we will actually be using the larger table as the build table, which will result in performance regression. But with this new modification, perhaps we can test the performance and see if we still need this hint.

@zml1206
Copy link
Contributor Author

zml1206 commented May 24, 2024

Background history: SPARK-36612 support left outer join build left or right outer join build right in shuffled hash join, it first took effect in spark 3.5.

@zml1206
Then how does this PR work with lower Spark version?

If there is no fallback, all versions of velox bakcend support using small table to build hashmap. If there is a fallback, it will remain consistent with vanilla spark, it is supported after 3.5 and not supported before 3.5. @zhztheplayer

Does that mean we don't actually need strategy JoinSelectionOverrides to alter join build side on vanilla query plan ? Can we completely remove that strategy?

First of all, the judgment of JoinSelectionOverrides and Spark Join Selection are not the same. I am not sure why JoinSelectionOverrides does not have the judgment of size. Secondly, clickhouse backend has special processing.

@zhztheplayer
Copy link
Member

I am not sure why JoinSelectionOverrides does not have the judgment of size.

@rui-mo Do you have some contexts on this difference? If Velox backend doesn't require for the strategy then we may be able to move it to CH module.

@WangGuangxin
Copy link
Contributor

I am not sure why JoinSelectionOverrides does not have the judgment of size.

@rui-mo Do you have some contexts on this difference? If Velox backend doesn't require for the strategy then we may be able to move it to CH module.

IIUC, JoinSelectionOverride does two things:

  1. force shuffled hash join
  2. fallback if there are continue joins ( TPC-DS badcase?)

Both of them can be ( and should be ?) moved to ColumnOverrides.

In fact, if we reuse the JoinSelection rule in Spark instead of override it, it can naturally support choose build side by size, right? cc @zml1206 @rui-mo

@rui-mo
Copy link
Contributor

rui-mo commented May 24, 2024

it can naturally support choose build side by size, right?

@WangGuangxin A deeper issue might be the size estimation could be inaccurate if there is an aggregate or filter, e.g. in some case, the aggregated data has much less rows than its input and becomes the smaller table, but Spark still treats it as the larger table.

@zml1206 zml1206 force-pushed the optimize_shuffle_hash_join branch from 8182c17 to 5671203 Compare May 27, 2024 05:34
Copy link

Run Gluten Clickhouse CI

@zml1206
Copy link
Contributor Author

zml1206 commented May 27, 2024

@zml1206 Would you like to rebase the code? I believe benchmark tool is online now. Thanks!

Of course, done.

@zml1206
Copy link
Contributor Author

zml1206 commented May 27, 2024

/Benchmark Velox

@zml1206
Copy link
Contributor Author

zml1206 commented May 27, 2024

/Benchmark Velox TPCDS

@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_5750_time.csv log/native_master_05_27_2024_56c5a24c1_time.csv difference percentage
q1 34.19 33.83 -0.365 98.93%
q2 23.77 23.77 -0.002 99.99%
q3 38.08 36.69 -1.391 96.35%
q4 25.70 35.09 9.399 136.58%
q5 69.05 68.27 -0.784 98.86%
q6 7.76 5.98 -1.784 77.02%
q7 80.83 78.74 -2.090 97.41%
q8 85.23 83.78 -1.444 98.31%
q9 121.51 119.36 -2.152 98.23%
q10 45.08 45.89 0.809 101.79%
q11 21.00 19.70 -1.302 93.80%
q12 26.79 27.26 0.472 101.76%
q13 39.00 52.73 13.729 135.20%
q14 19.19 17.65 -1.538 91.99%
q15 32.00 31.35 -0.649 97.97%
q16 14.02 14.31 0.284 102.03%
q17 103.42 102.68 -0.732 99.29%
q18 146.00 145.45 -0.555 99.62%
q19 13.59 15.24 1.655 112.18%
q20 29.55 27.93 -1.621 94.51%
q21 264.41 261.25 -3.162 98.80%
q22 13.69 13.57 -0.124 99.10%
total 1253.86 1260.51 6.654 100.53%

zhztheplayer
zhztheplayer previously approved these changes May 27, 2024
@rui-mo
Copy link
Contributor

rui-mo commented May 28, 2024

We notice large regression for q14a/q14b on TPC-DS benchmark. We may need to double-check. Thanks.

query	log/native_5750_time.csv	log/native_master_05_26_2024_time.csv	difference
q1	8.54	9.57	1.033
q2	8.81	8.39	-0.424
q3	2.76	2.91	0.149
q4	47.48	47.51	0.037
q5	5.23	6.27	1.046
q6	1.51	2.55	1.035
q7	4.40	4.19	-0.215
q8	2.37	2.59	0.220
q9	15.51	15.36	-0.150
q10	6.73	7.07	0.345
q11	24.12	23.95	-0.169
q12	1.86	0.98	-0.878
q13	4.04	4.01	-0.028
q14a	385.94	33.16	-352.776
q14b	375.88	30.84	-345.040

Copy link
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We found performance regression for q14a/q14b. Would you take a look?

@zml1206
Copy link
Contributor Author

zml1206 commented May 28, 2024

We found performance regression for q14a/q14b. Would you take a look?

I will take a look later.

Copy link

Run Gluten Clickhouse CI

@zml1206
Copy link
Contributor Author

zml1206 commented May 28, 2024

/Benchmark Velox TPCDS

Copy link

Run Gluten Clickhouse CI

@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====

query log/native_5750_time.csv log/native_master_05_27_2024_efd6f31fb4_time.csv difference percentage
q1 14.80 15.37 0.568 103.83%
q2 15.07 15.78 0.707 104.69%
q3 4.90 4.39 -0.510 89.60%
q4 65.93 64.02 -1.903 97.11%
q5 7.88 8.19 0.301 103.81%
q6 2.55 2.67 0.120 104.71%
q7 5.67 6.21 0.535 109.42%
q8 3.12 3.35 0.234 107.51%
q9 16.32 18.08 1.761 110.79%
q10 9.54 10.51 0.967 110.13%
q11 35.17 35.26 0.091 100.26%
q12 1.46 1.51 0.046 103.17%
q13 6.77 5.60 -1.171 82.71%
q14a 41.28 44.21 2.927 107.09%
q14b 42.14 41.63 -0.510 98.79%
q15 2.48 3.81 1.330 153.63%
q16 39.56 39.72 0.156 100.40%
q17 5.57 5.72 0.146 102.61%
q18 6.25 9.08 2.823 145.14%
q19 3.85 3.49 -0.362 90.59%
q20 1.34 1.45 0.105 107.86%
q21 1.14 1.20 0.060 105.28%
q22 8.34 7.98 -0.364 95.63%
q23a 79.77 81.64 1.879 102.36%
q23b 103.35 99.94 -3.406 96.70%
q24a 84.33 72.11 -12.229 85.50%
q24b 72.45 79.45 6.994 109.65%
q25 7.44 4.56 -2.881 61.28%
q26 2.68 10.31 7.628 384.31%
q27 3.06 2.66 -0.396 87.05%
q28 20.09 22.02 1.935 109.63%
q29 7.85 7.00 -0.850 89.17%
q30 4.09 4.12 0.021 100.51%
q31 6.17 5.95 -0.213 96.55%
q32 1.01 1.24 0.231 122.88%
q33 4.66 4.75 0.086 101.85%
q34 4.82 4.66 -0.160 96.68%
q35 6.30 7.93 1.629 125.87%
q36 3.11 3.18 0.069 102.23%
q37 3.82 3.93 0.114 102.99%
q38 11.76 11.84 0.081 100.69%
q39a 3.20 3.35 0.146 104.56%
q39b 2.91 2.79 -0.113 96.10%
q40 3.93 3.83 -0.099 97.47%
q41 0.57 0.56 -0.010 98.17%
q42 2.22 0.94 -1.275 42.56%
q43 3.51 3.55 0.039 101.12%
q44 7.83 7.08 -0.745 90.48%
q45 3.33 3.51 0.172 105.14%
q46 3.37 3.03 -0.345 89.76%
q47 17.91 14.84 -3.071 82.85%
q48 4.49 4.40 -0.096 97.87%
q49 11.31 7.41 -3.898 65.52%
q50 22.50 24.44 1.940 108.62%
q51 8.69 8.72 0.040 100.46%
q52 1.00 0.99 -0.015 98.55%
q53 1.73 1.79 0.056 103.24%
q54 3.09 3.30 0.214 106.91%
q55 0.96 0.94 -0.016 98.32%
q56 6.92 4.38 -2.532 63.39%
q57 8.54 8.52 -0.025 99.71%
q58 2.56 2.71 0.151 105.88%
q59 14.24 16.41 2.162 115.18%
q60 4.77 5.10 0.338 107.10%
q61 5.49 5.28 -0.216 96.07%
q62 3.94 4.25 0.315 107.99%
q63 1.80 1.85 0.052 102.89%
q64 49.21 49.92 0.712 101.45%
q65 13.65 13.56 -0.085 99.37%
q66 2.98 3.01 0.036 101.20%
q67 354.30 354.16 -0.137 99.96%
q68 4.72 3.50 -1.222 74.13%
q69 6.67 6.78 0.105 101.58%
q70 8.03 8.76 0.728 109.07%
q71 2.26 2.27 0.007 100.30%
q72 187.99 191.86 3.872 102.06%
q73 2.03 2.29 0.258 112.70%
q74 21.27 21.60 0.324 101.52%
q75 24.25 23.74 -0.511 97.89%
q76 7.44 7.03 -0.407 94.53%
q77 1.89 1.78 -0.106 94.40%
q78 37.38 39.90 2.514 106.72%
q79 4.63 3.47 -1.163 74.90%
q80 10.48 10.58 0.100 100.95%
q81 4.50 4.53 0.037 100.82%
q82 6.40 8.65 2.250 135.17%
q83 1.39 1.46 0.066 104.77%
q84 2.75 2.93 0.185 106.73%
q85 6.56 7.13 0.572 108.72%
q86 3.03 3.19 0.164 105.39%
q87 12.25 12.28 0.036 100.29%
q88 19.07 16.77 -2.292 87.98%
q89 2.74 2.65 -0.098 96.43%
q90 3.14 3.40 0.263 108.37%
q91 2.50 2.57 0.077 103.10%
q92 1.10 1.14 0.041 103.76%
q93 29.91 31.14 1.231 104.12%
q94 21.72 21.35 -0.372 98.29%
q9 81.46 81.54 0.082 100.10%
q5 2.36 6.35 3.984 268.55%
q96 11.89 11.98 0.091 100.77%
q97 2.03 1.82 -0.210 89.69%
q98 8.91 9.80 0.892 110.01%
q99 8.91 9.80 0.892 110.01%
total 1881.63 1895.41 13.783 100.73%

@zml1206 zml1206 requested a review from rui-mo May 28, 2024 11:59
@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_5750_time.csv log/native_master_05_27_2024_efd6f31fb_time.csv difference percentage
q1 33.80 34.01 0.219 100.65%
q2 23.88 25.69 1.814 107.60%
q3 37.29 36.59 -0.700 98.12%
q4 32.86 34.28 1.421 104.32%
q5 69.20 70.33 1.124 101.62%
q6 5.80 7.60 1.802 131.08%
q7 80.38 82.15 1.771 102.20%
q8 87.50 88.29 0.790 100.90%
q9 126.25 122.17 -4.083 96.77%
q10 47.02 46.70 -0.317 99.32%
q11 20.46 20.07 -0.390 98.09%
q12 28.99 23.62 -5.369 81.48%
q13 36.71 53.35 16.645 145.35%
q14 20.69 17.13 -3.559 82.80%
q15 30.35 31.65 1.295 104.27%
q16 13.64 13.86 0.222 101.63%
q17 103.32 103.47 0.149 100.14%
q18 146.19 144.12 -2.078 98.58%
q19 17.35 13.72 -3.630 79.07%
q20 27.85 28.10 0.247 100.89%
q21 261.35 260.32 -1.023 99.61%
q22 13.86 12.53 -1.337 90.36%
total 1264.75 1269.76 5.011 100.40%

// condition.
// case LeftOuter | LeftSemi => true
// LeftOuter.
case LeftOuter => true
Copy link
Contributor

@rui-mo rui-mo May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build left is supported by LeftSemi on Velox backend. For a left semi join, I assume we need to build left is the left table is smaller, so shall we add it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous regression of b14a/b14b was due to left semi join build left side. Although build left is supported by LeftSemi on Velox backend, but the performance is not as good as build right, vanilla spark also choose build right.

Copy link
Contributor

@rui-mo rui-mo May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function returns if building left is supported, which I assume indicates the backend's capability and shouldn't be mixed up with the build side selection.

vanilla spark also choose build right.

It is due to some special logic, e.g. preferredBuildSide, Gluten chooses to build left? If so, I think we need to fix there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was planning to remove preferredBuildSide in the next PR because it is unnecessary and not as accurate as AQE. Like vanilla spark, its performance improvement is compared to sort merge join, for left outer and join outer, if both are shuffle hash joins, choosing a small table will not improve performance.
I don’t know the logic of velox’s support for small table build. Shuffle hash join is forced to be used here,
what I'm worried about is that for shuffle hash join of LeftSemi, build left is much slower than build right.

Copy link
Contributor Author

@zml1206 zml1206 May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/apache/spark/blob/branch-3.2/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q14a/explain.txt
Both left and right of LeftSemi contain filter, so isPreferred(left) and isPreferred(right) both are true, so I think it is not caused by preferredBuildSide, but because build left is much slower than build right in LeftSemi.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly. The two sides of the time-consuming join have a large difference on data size, so we need to make sure the smaller table is used as build side. I'm still confirming that.

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a TODO here to describe why we don't add LeftSemi? Maybe also include the issue link.
facebookincubator/velox#9980

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Copy link

Run Gluten Clickhouse CI

Copy link
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@rui-mo rui-mo merged commit f48b9fa into apache:main May 31, 2024
40 checks passed
@zml1206
Copy link
Contributor Author

zml1206 commented May 31, 2024

Thanks all for review.

@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_5750_time.csv log/native_master_05_30_2024_73eb21db45_time.csv difference percentage
q1 34.11 34.62 0.511 101.50%
q2 24.35 23.80 -0.555 97.72%
q3 36.72 37.04 0.315 100.86%
q4 33.09 32.22 -0.872 97.36%
q5 69.13 70.41 1.279 101.85%
q6 5.80 7.25 1.447 124.94%
q7 81.27 81.31 0.045 100.06%
q8 85.09 85.87 0.776 100.91%
q9 122.31 118.14 -4.176 96.59%
q10 48.39 45.74 -2.648 94.53%
q11 20.27 21.64 1.376 106.79%
q12 23.62 22.83 -0.797 96.63%
q13 37.59 52.00 14.408 138.33%
q14 22.65 18.87 -3.788 83.28%
q15 30.21 32.69 2.477 108.20%
q16 13.89 13.77 -0.128 99.08%
q17 102.09 101.66 -0.435 99.57%
q18 144.76 143.98 -0.782 99.46%
q19 16.70 14.76 -1.939 88.38%
q20 27.34 28.58 1.246 104.56%
q21 257.27 265.55 8.283 103.22%
q22 12.02 14.49 2.475 120.60%
total 1248.68 1267.19 18.517 101.48%

@zml1206
Copy link
Contributor Author

zml1206 commented Jun 3, 2024

If the custom strategy can be removed by moving the code to ColumnarOverrides (without more workarounds), Personally I will be inclined to do that since it:

  1. Simplifies code
  2. Creates "more vanilla" plan when the join operators are falling back

Do it in this PR or next? @zhztheplayer

Thank you for willing to take this. Of course it can be done in a separate PR.

I looked it up and it seems like can't move custom strategy to ColumnarOverrides. Custom strategy is used to transform a LogicalPlan into a SparkPlan. Columnar rule implements Spark plan into Gluten plan. Custom strategy runs before Columnar rule. @zhztheplayer

@zhztheplayer
Copy link
Member

If the custom strategy can be removed by moving the code to ColumnarOverrides (without more workarounds), Personally I will be inclined to do that since it:

  1. Simplifies code
  2. Creates "more vanilla" plan when the join operators are falling back

Do it in this PR or next? @zhztheplayer

Thank you for willing to take this. Of course it can be done in a separate PR.

I looked it up and it seems like can't move custom strategy to ColumnarOverrides. Custom strategy is used to transform a LogicalPlan into a SparkPlan. Columnar rule implements Spark plan into Gluten plan. Custom strategy runs before Columnar rule. @zhztheplayer

I think SHJ and SMJ in Spark share the same requiredChildDistribution code. Which means we can just convert one to another in columnar rule then adjust its children to add/remove sorts on top of them. (Ideally, you could double check the relevant code)

In the simplest case the topic could be very similar to the one about hash/sort agg, see this code

@zml1206
Copy link
Contributor Author

zml1206 commented Jun 5, 2024

If the custom strategy can be removed by moving the code to ColumnarOverrides (without more workarounds), Personally I will be inclined to do that since it:

  1. Simplifies code
  2. Creates "more vanilla" plan when the join operators are falling back

Do it in this PR or next? @zhztheplayer

Thank you for willing to take this. Of course it can be done in a separate PR.

I looked it up and it seems like can't move custom strategy to ColumnarOverrides. Custom strategy is used to transform a LogicalPlan into a SparkPlan. Columnar rule implements Spark plan into Gluten plan. Custom strategy runs before Columnar rule. @zhztheplayer

I think SHJ and SMJ in Spark share the same requiredChildDistribution code. Which means we can just convert one to another in columnar rule then adjust its children to add/remove sorts on top of them. (Ideally, you could double check the relevant code)

In the simplest case the topic could be very similar to the one about hash/sort agg, see this code

This is a good way for me, thank you, I'll do it later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants