To add random exchange in NLJ and SPF when right child is DAS #1484

zhangyuqin1998 · 2023-07-24T13:00:24Z

Task Description

When the data volume in the pipeline is small, the GI operator can only divide a small number of granules, resulting in some workers being idle because they cannot get the granules. This solution inserts a random shuffle operator on GI, and threads that grab granules randomly distribute data to various threads in the cluster, enabling all threads to work and achieving load balancing.
#1494

Solution Description

For now, only Nested Loop Join and SubPlan Filter are considered. Since they both drive right tables with the left table and the right table's query is slower, we need to enable as many threads as possible to lead the right table getting rows in parallel. So we add a random shuffle operator on top of the left child to shuffle the data and enable more threads to work.

Passed Regressions

N/A

Upgrade Compatibility

N/A

Other Information

This is the origin plan. There is obvious task skew.

After adding the random shuffle operator, task skew has been eliminated and and the total query time is reduced

### Release Note

zhangyuqin1998 · 2023-07-26T09:15:05Z

src/sql/code_generator/ob_static_engine_cg.cpp

@@ -2792,6 +2792,7 @@ int ObStaticEngineCG::generate_basic_transmit_spec(
    spec.is_wf_hybrid_ = op.is_wf_hybrid();
    spec.sample_type_ = op.get_sample_type();
    spec.repartition_table_id_ = op.get_repartition_table_id();
+    spec.is_related_pair_ = op.is_related_pair();


插入 random shuffle 后，会将原来的 dfo 分割成两个 dfo。那么 nlj 所在的 dfo 就会变成纯计算 dfo，它会被调度到 qc 所在到机器上。qc 的机器很可能和左表数据分布在不同的机器上，这会导致左表的所有数据都需要经过网络才能传输给 nlj 算子（比如左表为单分区表，本来不用网络传输就能把数据送给nlj算子）。为了减少网络传输，我们仍然让 nlj 所在的 dfo 和左表的 dfo 分布在同样的机器上。这样，尽管做了 global random shuffle，但还是有一部分数据可以走 local channel。

注释我加在了 exchange info 的结构体里：
/**
When data volume of left table in NLJ is small, there is a task skew due to small granule count.
Therefore, we insert random shuffle above left table to achieve load balancing. Exchange operator
splits DFO into two, and DFO with NLJ is purely computing and scheduled to QC machine. So, data
in left table will be sent to NLJ operator via network. To minimize network transmission, we
schedule DFO with NLJ on machines where left table data is located.
*/

zhangyuqin1998 · 2023-08-17T11:53:56Z

src/sql/optimizer/ob_log_plan.cpp

@@ -4535,6 +4537,8 @@ int ObLogPlan::compute_join_exchange_info(JoinPath &join_path,
        } else {
          left_exch_info.dist_method_ = ObPQDistributeMethod::HASH;
          right_exch_info.dist_method_ = ObPQDistributeMethod::PARTITION_HASH;
+          // The hash join dfo should rely on the child of the local shuffle side.
+          left_exch_info.is_related_child_ = true;


slave mapping 中也可以用 is_related_child_ 这个标记，表示父 dfo 依赖子 dfo 进行构造

zhangyuqin1998 · 2023-08-17T11:55:56Z

src/sql/engine/px/ob_px_util.cpp

+              LOG_WARN("fail to assign p2p dh map info", K(ret));
+            }
+          }
+        }


这里之前是个bug。父dfo依赖子dfo进行构造时，没有生成p2p id。这会导致slave mapping和join filter一起使用时报4016。在这里生成p2p id之后可以解决这个bug

CLAassistant · 2024-05-10T08:07:01Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

zhangyuqin1998 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

zhangyuqin1998 force-pushed the random_shuffle branch 8 times, most recently from 563a09f to 9f44290 Compare July 26, 2023 07:37

zhangyuqin1998 commented Jul 26, 2023

View reviewed changes

add random exchange in NLJ and SPF when right child is DAS

48b8129

zhangyuqin1998 force-pushed the random_shuffle branch 5 times, most recently from efc3c8b to e933fae Compare August 15, 2023 05:57

fix alloc_by_reference_child_distribution

91f69a9

zhangyuqin1998 force-pushed the random_shuffle branch from e933fae to 91f69a9 Compare August 15, 2023 06:04

add comment

8b47c33

zhangyuqin1998 commented Aug 17, 2023

View reviewed changes

zhangyuqin1998 added 4 commits August 21, 2023 17:43

add test case

24c7d5a

add test case

73efbce

add test case

27cbef0

fix

ecd7cfe

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

To add random exchange in NLJ and SPF when right child is DAS #1484

To add random exchange in NLJ and SPF when right child is DAS #1484

zhangyuqin1998 commented Jul 24, 2023 •

edited

zhangyuqin1998 Jul 26, 2023 •

edited

zhangyuqin1998 Aug 17, 2023

zhangyuqin1998 Aug 17, 2023

CLAassistant commented May 10, 2024

To add random exchange in NLJ and SPF when right child is DAS #1484

Are you sure you want to change the base?

To add random exchange in NLJ and SPF when right child is DAS #1484

Conversation

zhangyuqin1998 commented Jul 24, 2023 • edited

Task Description

Solution Description

Passed Regressions

Upgrade Compatibility

Other Information

zhangyuqin1998 Jul 26, 2023 • edited

Choose a reason for hiding this comment

zhangyuqin1998 Aug 17, 2023

Choose a reason for hiding this comment

zhangyuqin1998 Aug 17, 2023

Choose a reason for hiding this comment

CLAassistant commented May 10, 2024

zhangyuqin1998 commented Jul 24, 2023 •

edited

zhangyuqin1998 Jul 26, 2023 •

edited