[GLUTEN-5668][CH] Support mixed conditions in shuffle hash join #5735

Merged
merged 4 commits into from
Jun 3, 2024

lgbo-ustc
Contributor

What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

Fixes: #5668

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

unit tests

#5668

Run Gluten Clickhouse CI

@baibaichen
Contributor

Because the logic for building the right-side table of the BHJ (broadcast hash join) has changed, here are the test results:

| baseline | control group | 5735 |
| --- | --- | --- |
| job_2190 [spark.sql.autoBroadcastJoinThreshold=20MB] | | |
| | job_2202 [spark.sql.autoBroadcastJoinThreshold=100MB] | job_2203 [spark.sql.autoBroadcastJoinThreshold=100MB] |
| | job_2206 [spark.sql.autoBroadcastJoinThreshold=100MB] | job_2207 [spark.sql.autoBroadcastJoinThreshold=100MB] |

The queries most affected by spark.sql.autoBroadcastJoinThreshold are q8, q9, q17, and q18. We see that q8, q9, and q17 all regress by about 6%. The results are as follows:

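| name | job_2207 | job_2203 | job_2206 | job_2202 | job_2206 - job_2207 | job_2202 - job_2203 | job_2190 | job_2190 - job_2202 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| q01 | 3634 | 3682 | 3689 | 3653 | 55 | -29 | 3574 | -79 |
| q02 | 3180 | 3139 | 2904 | 2870 | -276 | -269 | 2313 | -557 |
| q03 | 5264 | 5386 | 5540 | 5399 | 276 | 13 | 5201 | -198 |
| q04 | 3295 | 3338 | 3438 | 3412 | 143 | 74 | 3262 | -150 |
| q05 | 8737 | 9065 | 8799 | 8828 | 62 | -237 | 8595 | -233 |
| q06 | 998 | 1027 | 1060 | 1032 | 62 | 5 | 999 | -33 |
| q07 | 9763 | 9958 | 9686 | 9856 | -77 | -102 | 9592 | -264 |
| q08 | 4717 | 4840 | 4504 | 4539 | -213 | -301 | 8821 | 4282 |
| q09 | 9230 | 9363 | 8824 | 8882 | -406 | -481 | 12956 | 4074 |
| q10 | 5604 | 5713 | 5834 | 5663 | 230 | -50 | 5638 | -25 |
| q11 | 1776 | 1792 | 1618 | 1606 | -158 | -186 | 1620 | 14 |
| q12 | 3932 | 4041 | 4056 | 3991 | 124 | -50 | 3882 | -109 |
| q13 | 3705 | 3671 | 3674 | 3660 | -31 | -11 | 3612 | -48 |
| q14 | 1997 | 2128 | 2091 | 2085 | 94 | -43 | 1987 | -98 |
| q15 | 4071 | 4204 | 3780 | 3753 | -291 | -451 | 3239 | -514 |
| q16 | 5403 | 5661 | 5572 | 5330 | 169 | -331 | 5363 | 33 |
| q17 | 8725 | 9143 | 8662 | 8754 | -63 | -389 | 10985 | 2231 |
| q18 | 11715 | 12066 | 12132 | 12137 | 417 | 71 | 14916 | 2779 |
| q19 | 3788 | 4005 | 3915 | 3985 | 127 | -20 | 3813 | -172 |
| q20 | 3223 | 3239 | 2984 | 2877 | -239 | -362 | 3258 | 381 |
| q21 | 28 | 28 | 28 | 25 | 0 | -3 | 26 | 1 |
| q22 | 1112 | 1157 | 1146 | 1111 | 34 | -46 | 1103 | -8 |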
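
To put the absolute differences in relative terms, the four threshold-sensitive queries can be recomputed as percentages. This is only a sketch: the numbers are copied from the table above, and it assumes that job_2207 and job_2203 are the runs with this PR while job_2206 and job_2202 are their control runs, which is how the job list reads. The function name `regression_pct` is illustrative.

```python
# Timings copied from the table above (units as reported by the benchmark jobs).
# Assumption: job_2207/job_2203 ran with this PR, job_2206/job_2202 are controls.
RUNS = {
    "q08": {"job_2207": 4717, "job_2203": 4840, "job_2206": 4504, "job_2202": 4539},
    "q09": {"job_2207": 9230, "job_2203": 9363, "job_2206": 8824, "job_2202": 8882},
    "q17": {"job_2207": 8725, "job_2203": 9143, "job_2206": 8662, "job_2202": 8754},
    "q18": {"job_2207": 11715, "job_2203": 12066, "job_2206": 12132, "job_2202": 12137},
}


def regression_pct(pr: int, control: int) -> float:
    """Relative slowdown of the PR run against its control run, in percent.

    Positive means the PR run was slower; negative means it was faster.
    """
    return (pr - control) / control * 100.0


for query, t in RUNS.items():
    pair_a = regression_pct(t["job_2207"], t["job_2206"])
    pair_b = regression_pct(t["job_2203"], t["job_2202"])
    print(f"{query}: {pair_a:+.2f}% (2207 vs 2206), {pair_b:+.2f}% (2203 vs 2202)")
```

The two pairs do not agree exactly, so the "about 6%" figure is best read as a rough upper range rather than a precise measurement.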


@liuneng1994
Contributor

Gluten Perf Test Pass, GlutenWithCHStandard tpch-data-sf100, mean-total 118264, min-total 115868.
The latest performance test shows no regression.

@lgbo-ustc
Contributor Author

lgbo-ustc commented May 28, 2024

I analyzed how the time spent in `StorageJoinFromReadBuffer::StorageJoinFromReadBuffer` and `StorageJoinFromReadBuffer::getJoinLocked` changed before and after this change.

  • before
2024-05-28 11:38:49.246 <Error> StorageJoinFromReadBuffer: xxx 0x7f64a0054e18 Restored 2000000 rows in 187530754 ns
2024-05-28 11:38:49.370 <Error> StorageJoinFromReadBuffer: xxx 0x7f643d800018 Restored 1000000 rows in 121716415 ns
2024-05-28 11:38:49.372 <Error> StorageJoinFromReadBuffer: xxx 0x7f643e010418 Restored 91 rows in 222327 ns


2024-05-28 11:38:49.374 <Error> StorageJoinFromReadBuffer: xxx 0x7f64a0054e18 getJoinLocked in 16978 ns
2024-05-28 11:38:49.376 <Error> StorageJoinFromReadBuffer: xxx 0x7f643d800018 getJoinLocked in 20424 ns
2024-05-28 11:38:49.376 <Error> StorageJoinFromReadBuffer: xxx 0x7f643e010418 getJoinLocked in 13135 ns
  • after
2024-05-28 11:29:53.533 <Error> StorageJoinFromReadBuffer: xxx 0x7f5384fd4418 Restored storage join 61036721 ns. rows: 2000000
2024-05-28 11:29:53.603 <Error> StorageJoinFromReadBuffer: xxx 0x7f53815da018 Restored storage join 64989205 ns. rows: 1000000
2024-05-28 11:29:53.605 <Error> StorageJoinFromReadBuffer: xxx 0x7f53815daa18 Restored storage join 46237 ns. rows: 91


2024-05-28 11:29:53.748 <Error> StorageJoinFromReadBuffer: xxxx 0x7f5384fd4418 getJoinLocked 138523572 ns
2024-05-28 11:29:53.814 <Error> StorageJoinFromReadBuffer: xxxx 0x7f53815da018 getJoinLocked 64938654 ns
2024-05-28 11:29:53.815 <Error> StorageJoinFromReadBuffer: xxxx 0x7f53815daa18 getJoinLocked 70355 ns

The combined time of `StorageJoinFromReadBuffer::StorageJoinFromReadBuffer` and `StorageJoinFromReadBuffer::getJoinLocked` does not differ much before and after the change.
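
The sums can be checked directly against the log lines above. A small sketch follows; the nanosecond values are copied from the logs, while the dictionary layout, the row-count keys, and the helper name `total_ns` are illustrative choices, not anything from the Gluten code base.

```python
# (constructor_ns, get_join_locked_ns) keyed by the number of restored rows.
# All values are copied from the "before" and "after" log lines above.
BEFORE = {
    2_000_000: (187_530_754, 16_978),
    1_000_000: (121_716_415, 20_424),
    91: (222_327, 13_135),
}
AFTER = {
    2_000_000: (61_036_721, 138_523_572),
    1_000_000: (64_989_205, 64_938_654),
    91: (46_237, 70_355),
}


def total_ns(timings: dict) -> dict:
    """Sum constructor time and getJoinLocked time for each table size."""
    return {rows: ctor + locked for rows, (ctor, locked) in timings.items()}


before_total = total_ns(BEFORE)
after_total = total_ns(AFTER)
for rows in BEFORE:
    print(
        f"rows={rows}: before {before_total[rows] / 1e6:.2f} ms, "
        f"after {after_total[rows] / 1e6:.2f} ms"
    )
```

For the two large tables the totals stay within roughly 7% of each other, but the share spent inside `getJoinLocked` moves from a few microseconds to tens of milliseconds.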

The open question is whether `StorageJoinFromReadBuffer::StorageJoinFromReadBuffer` can also block multiple threads.

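Update: tracing the calls to `ClickHouseBuildSideRelation::buildHashTable` shows that this function is called only once per broadcast join. In other words, constructing `StorageJoinFromReadBuffer` essentially never blocks multiple threads, whereas `getJoinLocked` does block multiple threads. So when `getJoinLocked` takes a long time, execution becomes slower.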
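
Why the same amount of work hurts more inside a contended lock than in a once-only constructor can be illustrated with a toy model. The sketch below is not Gluten or ClickHouse code: `run_contended`, the thread count, and the hold time are all hypothetical, and it assumes callers contend on a single mutex so that their time under the lock is serialized.

```python
import threading
import time


def run_contended(threads: int, hold_s: float) -> float:
    """Return wall-clock seconds for `threads` workers sharing one lock.

    Each worker holds the lock for `hold_s` seconds, standing in for the
    time a caller spends inside a lock-protected call.
    """
    lock = threading.Lock()

    def get_join_locked() -> None:
        # Hypothetical stand-in for the work done while the lock is held.
        with lock:
            time.sleep(hold_s)

    workers = [threading.Thread(target=get_join_locked) for _ in range(threads)]
    start = time.perf_counter()
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # One caller pays the hold time once; several callers queue behind each other.
    print(f"1 thread : {run_contended(1, 0.02):.3f} s")
    print(f"4 threads: {run_contended(4, 0.02):.3f} s")
```

Under this assumption the wall-clock time grows with the number of waiting threads, while a cost paid once during construction does not.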

@liuneng1994
Contributor

image
spark.sql.autoBroadcastJoinThreshold=100MB

github-actions bot commented Jun 3, 2024

Run Gluten Clickhouse CI

Contributor

@baibaichen baibaichen left a comment

LGTM

@zhanglistar zhanglistar merged commit a76c92e into apache:main Jun 3, 2024
7 checks passed

Successfully merging this pull request may close these issues.

[CH] Support non-equal hash join
4 participants