Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

To add random exchange in NLJ and SPF when right child is DAS #1484

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

zhangyuqin1998
Copy link

@zhangyuqin1998 zhangyuqin1998 commented Jul 24, 2023

Task Description

When the data volume in the pipeline is small, the GI operator can only divide a small number of granules, resulting in some workers being idle because they cannot get the granules. This solution inserts a random shuffle operator on GI, and threads that grab granules randomly distribute data to various threads in the cluster, enabling all threads to work and achieving load balancing.
#1494

Solution Description

For now, only Nested Loop Join and SubPlan Filter are considered. Since they both drive right tables with the left table and the right table's query is slower, we need to enable as many threads as possible to lead the right table getting rows in parallel. So we add a random shuffle operator on top of the left child to shuffle the data and enable more threads to work.

Passed Regressions

N/A

Upgrade Compatibility

N/A

Other Information

This is the origin plan. There is obvious task skew.
image

After adding the random shuffle operator, task skew has been eliminated and and the total query time is reduced
image


### Release Note

@zhangyuqin1998 zhangyuqin1998 force-pushed the random_shuffle branch 8 times, most recently from 563a09f to 9f44290 Compare July 26, 2023 07:37
@@ -2792,6 +2792,7 @@ int ObStaticEngineCG::generate_basic_transmit_spec(
spec.is_wf_hybrid_ = op.is_wf_hybrid();
spec.sample_type_ = op.get_sample_type();
spec.repartition_table_id_ = op.get_repartition_table_id();
spec.is_related_pair_ = op.is_related_pair();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

插入 random shuffle 后,会将原来的 dfo 分割成两个 dfo。那么 nlj 所在的 dfo 就会变成纯计算 dfo,它会被调度到 qc 所在到机器上。qc 的机器很可能和左表数据分布在不同的机器上,这会导致左表的所有数据都需要经过网络才能传输给 nlj 算子(比如左表为单分区表,本来不用网络传输就能把数据送给nlj算子)。为了减少网络传输,我们仍然让 nlj 所在的 dfo 和左表的 dfo 分布在同样的机器上。这样,尽管做了 global random shuffle,但还是有一部分数据可以走 local channel。

注释我加在了 exchange info 的结构体里:
/**
When data volume of left table in NLJ is small, there is a task skew due to small granule count.
Therefore, we insert random shuffle above left table to achieve load balancing. Exchange operator
splits DFO into two, and DFO with NLJ is purely computing and scheduled to QC machine. So, data
in left table will be sent to NLJ operator via network. To minimize network transmission, we
schedule DFO with NLJ on machines where left table data is located.
*/

@zhangyuqin1998 zhangyuqin1998 force-pushed the random_shuffle branch 5 times, most recently from efc3c8b to e933fae Compare August 15, 2023 05:57
@@ -4535,6 +4537,8 @@ int ObLogPlan::compute_join_exchange_info(JoinPath &join_path,
} else {
left_exch_info.dist_method_ = ObPQDistributeMethod::HASH;
right_exch_info.dist_method_ = ObPQDistributeMethod::PARTITION_HASH;
// The hash join dfo should rely on the child of the local shuffle side.
left_exch_info.is_related_child_ = true;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slave mapping 中也可以用 is_related_child_ 这个标记,表示父 dfo 依赖子 dfo 进行构造

LOG_WARN("fail to assign p2p dh map info", K(ret));
}
}
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里之前是个bug。父dfo依赖子dfo进行构造时,没有生成p2p id。这会导致slave mapping和join filter一起使用时报4016。在这里生成p2p id之后可以解决这个bug

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


zhangyuqin1998 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants