add join limit pushdown for inner join #1506

Open
wants to merge 5 commits into
base: master


XueQinliang

Task Description

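The LIMIT operator restricts the number of rows a SQL query returns and is usually placed at the end of the query.
When a query joins multiple tables (whether each one is a base table, a temporary table, or a subquery), we can, as long as correctness is preserved, push the limit down to the tables beneath the join. This reduces the size of the join's result set and lowers the cost of subsequent computation.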
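As a rough illustration of why pushing the limit below a lossless inner join preserves the result, the following C++ sketch models a toy case. The `Order` and `Customers` types and all function names are hypothetical and not part of this PR. It assumes `customer_id` is a NOT NULL foreign key referencing the customers' primary key, so every order matches exactly one customer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy schema: orders.customer_id is a NOT NULL foreign key
// referencing the primary key of customers.
struct Order { int id; int customer_id; };
using Customers = std::unordered_map<int, std::string>;  // primary key -> name
using JoinedRow = std::pair<int, std::string>;           // order id, customer name

// Inner join: keep every order that finds its customer.
std::vector<JoinedRow> inner_join(const std::vector<Order> &orders,
                                  const Customers &customers) {
  std::vector<JoinedRow> out;
  for (const Order &o : orders) {
    auto it = customers.find(o.customer_id);
    if (it != customers.end()) out.emplace_back(o.id, it->second);
  }
  return out;
}

// Original plan: join everything, then keep the first `limit` rows.
std::vector<JoinedRow> join_then_limit(const std::vector<Order> &orders,
                                       const Customers &customers,
                                       std::size_t limit) {
  std::vector<JoinedRow> out = inner_join(orders, customers);
  if (out.size() > limit) out.resize(limit);
  return out;
}

// Rewritten plan: limit the foreign-key side first, then join.
// Equivalent only because each order matches exactly one customer.
std::vector<JoinedRow> limit_then_join(const std::vector<Order> &orders,
                                       const Customers &customers,
                                       std::size_t limit) {
  const std::size_t n = std::min(limit, orders.size());
  std::vector<Order> limited(orders.begin(),
                             orders.begin() + static_cast<std::ptrdiff_t>(n));
  return inner_join(limited, customers);
}
```

With a join that can drop or duplicate rows, the two functions would generally disagree, which is why the check step has to establish losslessness first.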

Solution Description

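Overall, the change consists of three main steps: check, get, and do_transform.
● In check, a losslessness test for inner joins is added on top of the existing left-join logic. There are two complications. First, with multi-table joins, deciding whether an inner join is lossless requires recursively checking every table joined to it. Second, for flattened inner joins, we need a way to search them for lossless joins, and limit pushdown must not introduce an extra Cartesian product.
● In get, the information recorded during check is used to obtain the conditions and tables whose join can be deferred (the lazy joins). This step also decides whether an inner join needs a constructed IS NOT NULL condition. Retrieval proceeds top-down, level by level, because the result at an upper level affects whether a lower level can be pulled up.
● In do_transform, the main work is to pull up the lazy joins found in get, wrap the remaining tables and conditions in a new view, and then apply the limit to that view. Two things need care: the new view must not change the joins with the original tables, and no Cartesian product may appear after the transformation. For the former, all join conditions on the pushed-down tables are collected and converted into join conditions on the new view. For the latter, the tables inside the view must be connected by joins; as long as they are, tables that could originally be joined will not end up in a Cartesian product.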

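Scenarios covered:

  1. Inner joins between a foreign key and a primary key
  2. Self-joins implicit between two views, or between a view and a single table
  3. Joins on columns that are min/max aggregate columns and that satisfy 1 or 2

Enhancements:

  1. When the column is nullable, an IS NOT NULL condition is constructed manually so that the lossless-join requirement is met
  2. Subqueries containing a limit get a stricter check with added constraints, so limit pushdown can now be applied to them
  3. Non-join conditions on an inner join are moved from ON to WHERE, which removes their effect on the losslessness check
  4. When building inner joins from a Cartesian product, the tables are reordered according to the WHERE conditions to avoid Cartesian products where possible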
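The first enhancement can be sketched as follows. This is a minimal model, not the PR's code: `ChildRow`, `join_then_limit`, and `limit_then_join` are hypothetical names, and the parent table is reduced to a set of primary-key values. It suggests why a nullable foreign key needs an explicit IS NOT NULL filter inside the limited view: rows with a NULL key would otherwise consume limit slots and then be dropped by the inner join.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical child row with a nullable foreign key.
struct ChildRow { int id; bool fk_is_null; int fk; };

// Reference semantics: inner join to the parent keys, keep the first `limit` matches.
std::vector<int> join_then_limit(const std::vector<ChildRow> &rows,
                                 const std::set<int> &parent_keys,
                                 std::size_t limit) {
  std::vector<int> ids;
  for (const ChildRow &r : rows) {
    if (ids.size() == limit) break;
    if (!r.fk_is_null && parent_keys.count(r.fk) > 0) ids.push_back(r.id);
  }
  return ids;
}

// Pushed-down form: limit the child rows first, then join.
// With add_is_not_null set, NULL keys are filtered out before the limit applies.
std::vector<int> limit_then_join(const std::vector<ChildRow> &rows,
                                 const std::set<int> &parent_keys,
                                 std::size_t limit,
                                 bool add_is_not_null) {
  std::vector<ChildRow> view;
  for (const ChildRow &r : rows) {
    if (view.size() == limit) break;
    if (add_is_not_null && r.fk_is_null) continue;
    view.push_back(r);
  }
  std::vector<int> ids;
  for (const ChildRow &r : view) {
    if (!r.fk_is_null && parent_keys.count(r.fk) > 0) ids.push_back(r.id);
  }
  return ids;
}
```

In this model, calling `limit_then_join` without the filter can return fewer rows than the original query whenever a NULL key appears among the first `limit` rows.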

Passed Regressions

Upgrade Compatibility

Other Information

Release Note

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

} else if (!is_valid) {
// do nothing
} else if (OB_FAIL(helper->pushdown_tables_.push_back(inner_join_table))) {
// do nothing

A LOG_WARN is missing here.

if (OB_ISNULL(stmt) || OB_ISNULL(inner_join_table)) {
ret = OB_ERR_UNEXPECTED;
LOG_WARN("unexpected null pointer of stmt or inner join table", K(ret));
} else if (!is_valid) {

There's no need to check is_valid again here, is there?

TableItem* table = inner_join_table;
ObSqlBitSet<> &expr_relation_ids = helper->expr_relation_ids_;
bool is_loseless = true;
while (table->type_ == TableItem::JOINED_TABLE) {

Also check OB_SUCC(ret) here.

trans_happened = true;
}
}
if (check_inner_valid) {

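Can't this new logic be merged with the original logic? By the time the code reaches this point it should only trigger the rewrite; there shouldn't be any further checks.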

ObColumnRefRawExpr *right = NULL;
equaljoins.reuse();
otherconds.reuse();
for (int i = 0; i < N; i++) {

The loop doesn't check OB_SUCC.
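A small sketch of what the missing guard can cost, under stated assumptions: the `OB_SUCC` and `OB_FAIL` macros below are simplified stand-ins for the project's macros, the error value is a placeholder, and `process` is a hypothetical per-item step. Without `OB_SUCC(ret)` in the loop condition, a later successful iteration can overwrite an earlier failure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-ins for illustration; values are placeholders.
const int OB_SUCCESS = 0;
const int OB_ERR_UNEXPECTED = -1;
#define OB_SUCC(stmt) (OB_SUCCESS == (ret = (stmt)))
#define OB_FAIL(stmt) (OB_SUCCESS != (ret = (stmt)))

// Hypothetical per-item step: negative values fail.
int process(int value, int &processed) {
  if (value < 0) return OB_ERR_UNEXPECTED;
  ++processed;
  return OB_SUCCESS;
}

// Shape flagged in the review: the loop keeps running after a failure,
// and the next successful call resets ret.
int unguarded_loop(const std::vector<int> &items, int &processed) {
  int ret = OB_SUCCESS;
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (OB_FAIL(process(items[i], processed))) {
      // a LOG_WARN would go here
    }
  }
  return ret;
}

// Guarded shape: the loop stops at the first failure and ret is preserved.
int guarded_loop(const std::vector<int> &items, int &processed) {
  int ret = OB_SUCCESS;
  for (std::size_t i = 0; OB_SUCC(ret) && i < items.size(); ++i) {
    if (OB_FAIL(process(items[i], processed))) {
      // a LOG_WARN would go here
    }
  }
  return ret;
}
```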

ret = OB_ERR_UNEXPECTED;
LOG_WARN("failed to get condition expr", K(ret));
} else if (OB_FAIL(check_get_two_columns_from_condition(conds.at(i), left, right, is_single_cond_valid, has_agg))) {
ret = OB_ERR_UNEXPECTED;

The error code is overwritten here.
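To make the concern concrete, here is a minimal sketch. It assumes a simplified `OB_FAIL` stand-in that assigns the callee's result to `ret`; the error values are placeholders and `callee` is hypothetical. Assigning `ret` again inside the failure branch discards the code the callee reported.

```cpp
#include <cassert>

// Simplified stand-ins for illustration; values are placeholders.
const int OB_SUCCESS = 0;
const int OB_ERR_UNEXPECTED = -1;
const int OB_ALLOCATE_MEMORY_FAILED = -2;
#define OB_FAIL(stmt) (OB_SUCCESS != (ret = (stmt)))

// Hypothetical callee that fails with a specific error code.
int callee(bool fail) {
  return fail ? OB_ALLOCATE_MEMORY_FAILED : OB_SUCCESS;
}

// Shape flagged in the review: the branch replaces the callee's code.
int overwriting_caller(bool fail) {
  int ret = OB_SUCCESS;
  if (OB_FAIL(callee(fail))) {
    ret = OB_ERR_UNEXPECTED;  // the original cause is lost here
  }
  return ret;
}

// Suggested shape: only log, and return whatever the callee reported.
int preserving_caller(bool fail) {
  int ret = OB_SUCCESS;
  if (OB_FAIL(callee(fail))) {
    // a LOG_WARN with K(ret) would go here; ret is left untouched
  }
  return ret;
}
```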

LOG_WARN("failed to get two columns from conditioin", K(ret));
}
if (is_single_cond_valid) {
equaljoins.push_back(conds.at(i));

Neither this push_back nor the one below checks ret.

} else if (cond->get_param_count() != 2) {
is_valid = false;
} else if (OB_FAIL(get_condition_columnref_children(cond, column_exprs, is_valid, has_agg))) {
ret = OB_ERR_UNEXPECTED;

The error code is overwritten.

is_valid = true;
has_column = true;
column_expr = child_expr;
} else if (OB_FAIL(check_expr_min_max(child_expr, column_expr, is_valid)) && (has_agg=true)) {

According to the code logic, this checks an equality expression that sits before the group by, so no aggregate expression should be visible here.

for (int64_t i = 0; OB_SUCC(ret) && is_valid && i < N; ++i) {
ObRawExpr* child_expr = cond->get_param_expr(i);
ObRawExpr* column_expr = NULL;
if (child_expr->has_flag(IS_COLUMN)) {

Null-check child_expr first.

}

int ObTransformUtils::check_expr_min_max(ObRawExpr *expr, ObRawExpr *&column_ref, bool &is_valid)
{

For this function, it looks like only one of column_ref and is_valid is needed.

LOG_WARN("failed to push back condition", K(ret));
}
}
inner_joined_table->join_conditions_.reuse();

The join_conditions_ at this level should just be this_level_join_conditions + otherconds, right?
Also, build_inner_joined_tables puts otherconds into the stmt's where conditions; don't the otherconds here need to be added as well?

{
int ret = OB_SUCCESS;
for (int i = 0; is_loseless && i < conds.count(); i++) {
ObRelIds &relation_ids = conds.at(i)->get_relation_ids();

Null-check the elements of conds before using them.

int ret = OB_SUCCESS;
for (int i = 0; is_loseless && i < conds.count(); i++) {
ObRelIds &relation_ids = conds.at(i)->get_relation_ids();
if (relation_ids.num_members() > 1) {

Does the current implementation require that no condition references more than one left table?

@@ -484,7 +498,6 @@ int ObTransformUtils::is_columns_unique(const ObIArray<ObRawExpr *> &exprs,
LOG_WARN("failed to get table schema", K(ret),
"index_id", simple_index_infos.at(i).table_id_);
} else if (OB_ISNULL(index_schema)) {
ret = OB_ERR_UNEXPECTED;

Why was this removed?

@@ -635,7 +648,6 @@ int ObTransformUtils::create_new_column_expr(ObTransformerCtx *ctx,
} else if (OB_FAIL(ctx->expr_factory_->create_raw_expr(T_REF_COLUMN, new_column_ref))) {
LOG_WARN("failed to create a new column ref expr", K(ret));
} else if (OB_ISNULL(new_column_ref)) {
ret = OB_ERR_UNEXPECTED;

Why was this removed as well?

} else if (!stmt->is_select_stmt()) {
ret = OB_ERR_UNEXPECTED;
LOG_WARN("unexpect stmt type", K(ret));
} else if (OB_UNLIKELY(select_column_id - OB_APP_MIN_COLUMN_ID < 0)) {

Check for < 0 || >= select_stmt->get_select_item_size() here.
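A sketch of the two-sided bounds check, under assumptions: `APP_MIN_COLUMN_ID` is a placeholder standing in for `OB_APP_MIN_COLUMN_ID`, the select items are modeled as a plain `std::vector<int>`, and `get_select_item` here is a hypothetical free function rather than the statement's member.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-ins for illustration; values are placeholders.
const int OB_SUCCESS = 0;
const int OB_ERR_UNEXPECTED = -1;
const int64_t APP_MIN_COLUMN_ID = 16;  // stands in for OB_APP_MIN_COLUMN_ID

// Maps a column id to a select item, rejecting offsets on either side of the range.
int get_select_item(const std::vector<int> &select_items,
                    int64_t select_column_id,
                    int &item) {
  int ret = OB_SUCCESS;
  const int64_t idx = select_column_id - APP_MIN_COLUMN_ID;
  if (idx < 0 || idx >= static_cast<int64_t>(select_items.size())) {
    ret = OB_ERR_UNEXPECTED;  // out of range below or above
  } else {
    item = select_items[static_cast<std::size_t>(idx)];
  }
  return ret;
}
```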

} else {
select_expr = static_cast<ObSelectStmt*>(stmt)->get_select_item(select_column_id - OB_APP_MIN_COLUMN_ID).expr_;
}
if (OB_ISNULL(select_expr)) {

The error code may be overwritten here.

TableItem *left_table,
TableItem *right_table,
ObIArray<ObRawExpr*> &conds,
ObSEArray<ObRawExpr*, 4> &connected_conds,

Use the base class ObIArray for all the parameters; don't restrict the array type the caller can use.
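The design point can be sketched with stand-in classes. `IArray` and `SEArray` below are hypothetical, heavily simplified models of an abstract array interface and a fixed-local-capacity implementation; they are not the real ObIArray or ObSEArray. A function that takes the base interface accepts any capacity the caller picked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical abstract array interface (stand-in for a base array class).
template <typename T>
class IArray {
public:
  virtual ~IArray() {}
  virtual int push_back(const T &value) = 0;
  virtual std::size_t count() const = 0;
};

// Hypothetical implementation whose type depends on a capacity parameter.
template <typename T, std::size_t LOCAL_CAPACITY>
class SEArray : public IArray<T> {
public:
  SEArray() { data_.reserve(LOCAL_CAPACITY); }
  int push_back(const T &value) override {
    data_.push_back(value);
    return 0;
  }
  std::size_t count() const override { return data_.size(); }
private:
  std::vector<T> data_;
};

// Takes the base interface, so callers may pass SEArray<int, 4>, SEArray<int, 8>, ...
// Returns 0 on success, or the first non-zero push_back result.
int collect_even(const std::vector<int> &input, IArray<int> &out) {
  int ret = 0;
  for (std::size_t i = 0; ret == 0 && i < input.size(); ++i) {
    if (input[i] % 2 == 0) ret = out.push_back(input[i]);
  }
  return ret;
}
```

Had `collect_even` taken `SEArray<int, 4> &`, a caller holding an `SEArray<int, 8>` would need a copy or a different overload.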

return ret;
}

int ObTransformUtils::check_limit_join_loseless(ObTransformerCtx *ctx,

typo, loseless -> lossless

JoinedTable* joined_table_ptr = static_cast<JoinedTable*>(joined_table);
ObSEArray<uint64_t, 8> left_ids;
ObSEArray<uint64_t, 8> right_ids;
if (OB_ISNULL(joined_table_ptr->left_table_) || OB_ISNULL(joined_table_ptr->right_table_)) {

The error code isn't set here.

is_valid = true;
ObColumnRefRawExpr *left;
ObColumnRefRawExpr *right;
for (int i = 0; is_valid && i < N; i++) {

Check OB_SUCC here.

bool right_agg = false;
if (left->get_table_id() == left_table->table_id_ && right->get_table_id() == right_table->table_id_) {
// do nothing
if (OB_NOT_NULL(conds.at(i)->get_param_expr(0)) && conds.at(i)->get_param_expr(0)->has_flag(IS_AGG)) {

Join conditions never contain aggregate expressions, so left_agg and right_agg can't be determined here.

} else if (!is_foreign_primary_join || is_first_table_parent) {
/* do nothing */
OPT_TRACE("is not foreign primary join");
} else if (OB_UNLIKELY(!join_right_table->access_all_part())) {

The partition hint is checked here, but it wasn't checked earlier.

is_foreign_key = true;
}

// check the containment relationship between the tables

This containment check doesn't depend on the foreign key check, so it should be placed above it.

ref_query->is_calc_found_rows() ||
ref_query->has_order_by() ||
ref_query->is_scala_group_by() ||
ref_query->has_distinct()) {
// ignore push down when ref_query has
// distinct/orderby/limit/rownum/scalar group by
// distinct/orderby/limit/rownum/scalar group by
LOG_WARN("invalid check ref query offset limit expr lower equal: don't push down");

Don't log at WARN level just because the rewrite can't be triggered.

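// if the table is a generated_table, the expression obtained is the outer one, not the subquery's
LOG_WARN("failed to get two column expr from condition", K(ret));
} else if (!is_valid) { // if any condition involves only one column or more than two columns, stop checking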
break;

is_valid already controls the loop state, so the break is unnecessary.

3 participants