Enable push-down of constant join conditions in outer-joins #15760
" │ └ Collect[doc.t2 | [cluster_id] | (kind = 'bar')]", | ||
" └ Rename[cluster_id, kind] AS temp", | ||
" └ Collect[doc.t2 | [cluster_id, kind] | true]" | ||
"Eval[id, reference]", |
This is a good example of why this change makes sense. The constant join condition from the left join is extracted and pushed down, so the outer join can be converted to an inner join. The result is a more efficient query plan that uses only hash joins.
+------------------------------------------------------+----------------------------------------------------------------------------------------+
| STEP | QUERY PLAN |
+------------------------------------------------------+----------------------------------------------------------------------------------------+
| Initial logical plan | Eval[id, reference] (rows=unknown) |
| | └ Join[LEFT | ((cluster_id = id) AND (kind = 'bar'))] (rows=unknown) |
| | ├ Join[INNER | ((cluster_id = id) AND (kind = 'bar'))] (rows=unknown) |
| | │ ├ Filter[(reference = 'bazinga')] (rows=0) |
| | │ │ └ Join[INNER | (subscription_id = id)] (rows=unknown) |
| | │ │ ├ Collect[doc.t3 | [id, reference] | true] (rows=unknown) |
| | │ │ └ Collect[doc.t1 | [subscription_id, id] | true] (rows=unknown) |
| | │ └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| | └ Rename[cluster_id, kind] AS temp (rows=unknown) |
| | └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| optimizer_extract_constant_join_conditions_to_filter | Eval[id, reference] (rows=0) |
| | └ Filter[(kind = 'bar')] (rows=0) |
| | └ Join[LEFT | (cluster_id = id)] (rows=unknown) |
| | ├ Join[INNER | ((cluster_id = id) AND (kind = 'bar'))] (rows=unknown) |
| | │ ├ Filter[(reference = 'bazinga')] (rows=0) |
| | │ │ └ Join[INNER | (subscription_id = id)] (rows=unknown) |
| | │ │ ├ Collect[doc.t3 | [id, reference] | true] (rows=unknown) |
| | │ │ └ Collect[doc.t1 | [subscription_id, id] | true] (rows=unknown) |
| | │ └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| | └ Rename[cluster_id, kind] AS temp (rows=unknown) |
| | └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| optimizer_rewrite_filter_on_outer_join_to_inner_join | Eval[id, reference] (rows=unknown) |
| | └ Join[INNER | (cluster_id = id)] (rows=unknown) |
| | ├ Join[INNER | ((cluster_id = id) AND (kind = 'bar'))] (rows=unknown) |
| | │ ├ Filter[(reference = 'bazinga')] (rows=0) |
| | │ │ └ Join[INNER | (subscription_id = id)] (rows=unknown) |
| | │ │ ├ Collect[doc.t3 | [id, reference] | true] (rows=unknown) |
| | │ │ └ Collect[doc.t1 | [subscription_id, id] | true] (rows=unknown) |
| | │ └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| | └ Filter[(kind = 'bar')] (rows=0) |
| | └ Rename[cluster_id, kind] AS temp (rows=unknown) |
| | └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| optimizer_rewrite_join_plan | Eval[id, reference] (rows=unknown) |
| | └ HashJoin[(cluster_id = id)] (rows=unknown) |
| | ├ Join[INNER | ((cluster_id = id) AND (kind = 'bar'))] (rows=unknown) |
| | │ ├ Filter[(reference = 'bazinga')] (rows=0) |
| | │ │ └ Join[INNER | (subscription_id = id)] (rows=unknown) |
| | │ │ ├ Collect[doc.t3 | [id, reference] | true] (rows=unknown) |
| | │ │ └ Collect[doc.t1 | [subscription_id, id] | true] (rows=unknown) |
| | │ └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| | └ Filter[(kind = 'bar')] (rows=0) |
| | └ Rename[cluster_id, kind] AS temp (rows=unknown) |
| | └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| optimizer_extract_constant_join_conditions_to_filter | Eval[id, reference] (rows=unknown) |
| | └ HashJoin[(cluster_id = id)] (rows=unknown) |
| | ├ Filter[(kind = 'bar')] (rows=0) |
| | │ └ Join[INNER | (cluster_id = id)] (rows=unknown) |
| | │ ├ Filter[(reference = 'bazinga')] (rows=0) |
| | │ │ └ Join[INNER | (subscription_id = id)] (rows=unknown) |
| | │ │ ├ Collect[doc.t3 | [id, reference] | true] (rows=unknown) |
| | │ │ └ Collect[doc.t1 | [subscription_id, id] | true] (rows=unknown) |
| | │ └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| | └ Filter[(kind = 'bar')] (rows=0) |
| | └ Rename[cluster_id, kind] AS temp (rows=unknown) |
| | └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| optimizer_move_filter_beneath_join | Eval[id, reference] (rows=unknown) |
| | └ HashJoin[(cluster_id = id)] (rows=unknown) |
| | ├ Join[INNER | (cluster_id = id)] (rows=unknown) |
| | │ ├ Filter[(reference = 'bazinga')] (rows=0) |
| | │ │ └ Join[INNER | (subscription_id = id)] (rows=unknown) |
| | │ │ ├ Collect[doc.t3 | [id, reference] | true] (rows=unknown) |
| | │ │ └ Collect[doc.t1 | [subscription_id, id] | true] (rows=unknown) |
| | │ └ Filter[(kind = 'bar')] (rows=0) |
| | │ └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| | └ Filter[(kind = 'bar')] (rows=0) |
| | └ Rename[cluster_id, kind] AS temp (rows=unknown) |
| | └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| optimizer_rewrite_join_plan | Eval[id, reference] (rows=unknown) |
| | └ HashJoin[(cluster_id = id)] (rows=unknown) |
| | ├ HashJoin[(cluster_id = id)] (rows=unknown) |
| | │ ├ Filter[(reference = 'bazinga')] (rows=0) |
| | │ │ └ Join[INNER | (subscription_id = id)] (rows=unknown) |
| | │ │ ├ Collect[doc.t3 | [id, reference] | true] (rows=unknown) |
| | │ │ └ Collect[doc.t1 | [subscription_id, id] | true] (rows=unknown) |
| | │ └ Filter[(kind = 'bar')] (rows=0) |
| | │ └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| | └ Filter[(kind = 'bar')] (rows=0) |
| | └ Rename[cluster_id, kind] AS temp (rows=unknown) |
| | └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| optimizer_move_filter_beneath_join | Eval[id, reference] (rows=unknown) |
| | └ HashJoin[(cluster_id = id)] (rows=unknown) |
| | ├ HashJoin[(cluster_id = id)] (rows=unknown) |
| | │ ├ Join[INNER | (subscription_id = id)] (rows=unknown) |
| | │ │ ├ Filter[(reference = 'bazinga')] (rows=0) |
| | │ │ │ └ Collect[doc.t3 | [id, reference] | true] (rows=unknown) |
| | │ │ └ Collect[doc.t1 | [subscription_id, id] | true] (rows=unknown) |
| | │ └ Filter[(kind = 'bar')] (rows=0) |
| | │ └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| | └ Filter[(kind = 'bar')] (rows=0) |
| | └ Rename[cluster_id, kind] AS temp (rows=unknown) |
| | └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| optimizer_rewrite_join_plan | Eval[id, reference] (rows=unknown) |
| | └ HashJoin[(cluster_id = id)] (rows=unknown) |
| | ├ HashJoin[(cluster_id = id)] (rows=unknown) |
| | │ ├ HashJoin[(subscription_id = id)] (rows=unknown) |
| | │ │ ├ Filter[(reference = 'bazinga')] (rows=0) |
| | │ │ │ └ Collect[doc.t3 | [id, reference] | true] (rows=unknown) |
| | │ │ └ Collect[doc.t1 | [subscription_id, id] | true] (rows=unknown) |
| | │ └ Filter[(kind = 'bar')] (rows=0) |
| | │ └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| | └ Filter[(kind = 'bar')] (rows=0) |
| | └ Rename[cluster_id, kind] AS temp (rows=unknown) |
| | └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| optimizer_merge_filter_and_collect | Eval[id, reference] (rows=unknown) |
| | └ HashJoin[(cluster_id = id)] (rows=unknown) |
| | ├ HashJoin[(cluster_id = id)] (rows=unknown) |
| | │ ├ HashJoin[(subscription_id = id)] (rows=unknown) |
| | │ │ ├ Collect[doc.t3 | [id, reference] | (reference = 'bazinga')] (rows=unknown) |
| | │ │ └ Collect[doc.t1 | [subscription_id, id] | true] (rows=unknown) |
| | │ └ Filter[(kind = 'bar')] (rows=0) |
| | │ └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| | └ Filter[(kind = 'bar')] (rows=0) |
| | └ Rename[cluster_id, kind] AS temp (rows=unknown) |
| | └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| optimizer_merge_filter_and_collect | Eval[id, reference] (rows=unknown) |
| | └ HashJoin[(cluster_id = id)] (rows=unknown) |
| | ├ HashJoin[(cluster_id = id)] (rows=unknown) |
| | │ ├ HashJoin[(subscription_id = id)] (rows=unknown) |
| | │ │ ├ Collect[doc.t3 | [id, reference] | (reference = 'bazinga')] (rows=unknown) |
| | │ │ └ Collect[doc.t1 | [subscription_id, id] | true] (rows=unknown) |
| | │ └ Collect[doc.t2 | [cluster_id, kind] | (kind = 'bar')] (rows=unknown) |
| | └ Filter[(kind = 'bar')] (rows=0) |
| | └ Rename[cluster_id, kind] AS temp (rows=unknown) |
| | └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| optimizer_move_filter_beneath_rename | Eval[id, reference] (rows=unknown) |
| | └ HashJoin[(cluster_id = id)] (rows=unknown) |
| | ├ HashJoin[(cluster_id = id)] (rows=unknown) |
| | │ ├ HashJoin[(subscription_id = id)] (rows=unknown) |
| | │ │ ├ Collect[doc.t3 | [id, reference] | (reference = 'bazinga')] (rows=unknown) |
| | │ │ └ Collect[doc.t1 | [subscription_id, id] | true] (rows=unknown) |
| | │ └ Collect[doc.t2 | [cluster_id, kind] | (kind = 'bar')] (rows=unknown) |
| | └ Rename[cluster_id, kind] AS temp (rows=0) |
| | └ Filter[(kind = 'bar')] (rows=0) |
| | └ Collect[doc.t2 | [cluster_id, kind] | true] (rows=unknown) |
| optimizer_merge_filter_and_collect | Eval[id, reference] (rows=unknown) |
| | └ HashJoin[(cluster_id = id)] (rows=unknown) |
| | ├ HashJoin[(cluster_id = id)] (rows=unknown) |
| | │ ├ HashJoin[(subscription_id = id)] (rows=unknown) |
| | │ │ ├ Collect[doc.t3 | [id, reference] | (reference = 'bazinga')] (rows=unknown) |
| | │ │ └ Collect[doc.t1 | [subscription_id, id] | true] (rows=unknown) |
| | │ └ Collect[doc.t2 | [cluster_id, kind] | (kind = 'bar')] (rows=unknown) |
| | └ Rename[cluster_id, kind] AS temp (rows=unknown) |
| | └ Collect[doc.t2 | [cluster_id, kind] | (kind = 'bar')] (rows=unknown) |
| Final logical plan | Eval[id, reference] (rows=unknown) |
| | └ HashJoin[(cluster_id = id)] (rows=unknown) |
| | ├ HashJoin[(cluster_id = id)] (rows=unknown) |
| | │ ├ HashJoin[(subscription_id = id)] (rows=unknown) |
| | │ │ ├ Collect[doc.t3 | [id, reference] | (reference = 'bazinga')] (rows=unknown) |
| | │ │ └ Collect[doc.t1 | [subscription_id, id] | true] (rows=unknown) |
| | │ └ Collect[doc.t2 | [cluster_id] | (kind = 'bar')] (rows=unknown) |
| | └ Rename[cluster_id] AS temp (rows=unknown) |
| | └ Collect[doc.t2 | [cluster_id] | (kind = 'bar')] (rows=unknown) |
+------------------------------------------------------+----------------------------------------------------------------------------------------+
EXPLAIN 14 rows in set (0.026 sec)
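The outer-to-inner rewrite in the plan above relies on a general SQL property: a filter above a left join that rejects NULLs from the non-preserved side removes every NULL-extended row, so the left join returns the same rows as an inner join. The following is a minimal sketch of that property only. It assumes SQLite (via Python's `sqlite3`) as a stand-in for CrateDB, and the tables `l` and `r` are hypothetical.

```python
import sqlite3

# Hypothetical tables; SQLite is used only to illustrate standard SQL semantics.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE l (id INTEGER);
    CREATE TABLE r (id INTEGER, kind TEXT);
    INSERT INTO l VALUES (1), (2), (3);
    INSERT INTO r VALUES (1, 'bar'), (2, 'foo');
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# Without a filter, the left join keeps the unmatched row l.id = 3.
left_unfiltered = rows(
    "SELECT l.id, r.id FROM l LEFT JOIN r ON l.id = r.id ORDER BY l.id")

# A filter on the non-preserved side rejects the NULL-extended rows ...
left_filtered = rows(
    "SELECT l.id, r.id FROM l LEFT JOIN r ON l.id = r.id "
    "WHERE r.kind = 'bar' ORDER BY l.id")

# ... so the same query written with an inner join returns the same rows.
inner_filtered = rows(
    "SELECT l.id, r.id FROM l INNER JOIN r ON l.id = r.id "
    "WHERE r.kind = 'bar' ORDER BY l.id")

print(left_unfiltered)
print(left_filtered == inner_filtered)
```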
server/src/main/java/io/crate/planner/optimizer/rule/ExtractConstantJoinConditionsToFilter.java
@mfussenegger I made a second iteration to handle outer joins properly. Outer joins apply constant join conditions only on their non-preserved side; on the preserved side they are ignored.
Therefore, in these cases we can only extract the filter on top of the join. The push-down already took that into account, but we also have to be aware of it when creating the filter.
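To illustrate the difference between the two sides, here is a small sketch using SQLite through Python's `sqlite3` (an assumption: SQLite stands in for CrateDB, and the tables `l` and `r` are hypothetical). It compares a constant condition in the `ON` clause of a left join with the same condition applied beneath the join, once for the non-preserved side and once for the preserved side.

```python
import sqlite3

# Hypothetical tables; SQLite is used only to illustrate standard SQL semantics.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE l (id INTEGER, kind TEXT);
    CREATE TABLE r (id INTEGER, kind TEXT);
    INSERT INTO l VALUES (1, 'bar'), (2, 'foo');
    INSERT INTO r VALUES (1, 'bar'), (2, 'foo');
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# Constant condition on the non-preserved (right) side, kept in the ON clause.
on_right = rows(
    "SELECT l.id, r.id FROM l LEFT JOIN r "
    "ON l.id = r.id AND r.kind = 'bar' ORDER BY l.id")

# The same condition applied beneath the join on the right input.
pushed_right = rows(
    "SELECT l.id, r.id FROM l LEFT JOIN "
    "(SELECT * FROM r WHERE kind = 'bar') r ON l.id = r.id ORDER BY l.id")

# Constant condition on the preserved (left) side, kept in the ON clause:
# it decides which rows find a match, but every left row is still returned.
on_left = rows(
    "SELECT l.id, r.id FROM l LEFT JOIN r "
    "ON l.id = r.id AND l.kind = 'bar' ORDER BY l.id")

# Applying it beneath the join on the left input drops a preserved row.
pushed_left = rows(
    "SELECT l.id, r.id FROM (SELECT * FROM l WHERE kind = 'bar') l "
    "LEFT JOIN r ON l.id = r.id ORDER BY l.id")

print(on_right == pushed_right)  # the two plans agree
print(on_left == pushed_left)    # the two plans differ
```

In this toy data set, pushing the right-side condition beneath the join leaves the result unchanged, while pushing the left-side condition removes the row with `l.id = 2`.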
The test is still flaky, so I will put this on hold.
Summary of the changes / Why this improves CrateDB
This is based on @jeeminso's previous work on joins.
Currently we have the optimizer rule optimizer_move_constant_join_conditions_beneath_join, which pushes constant join conditions beneath an inner join.
We also already have optimizer_move_filter_beneath_join, which pushes filters beneath inner, outer, cross, and nested joins.
This PR changes optimizer_move_constant_join_conditions_beneath_join so that the constant join condition is extracted into a filter on top of the join, which can then be pushed down by optimizer_move_filter_beneath_join.
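As an illustration only, the extract-then-push-down idea can be sketched on a toy plan tree. This is not CrateDB's implementation: the classes `Cond`, `Collect`, `Filter`, and `Join` and both functions are hypothetical, only inner joins are handled, and a conjunct is treated as "constant" here when it refers to fewer than two relations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cond:
    text: str
    tables: frozenset  # relations the conjunct refers to

@dataclass(frozen=True)
class Collect:
    table: str

@dataclass(frozen=True)
class Filter:
    cond: Cond
    source: object

@dataclass(frozen=True)
class Join:
    kind: str     # only "INNER" is handled in this sketch
    conds: tuple  # conjuncts, implicitly AND-ed
    left: object
    right: object

def tables(plan):
    """Collect the relation names produced by a plan subtree."""
    if isinstance(plan, Collect):
        return frozenset({plan.table})
    if isinstance(plan, Filter):
        return tables(plan.source)
    return tables(plan.left) | tables(plan.right)

def extract_constant_conditions(join):
    """Lift conjuncts that touch fewer than two relations into Filters above the join."""
    if join.kind != "INNER":
        return join
    constant = [c for c in join.conds if len(c.tables) < 2]
    remaining = tuple(c for c in join.conds if len(c.tables) >= 2)
    plan = Join(join.kind, remaining, join.left, join.right)
    for c in constant:
        plan = Filter(c, plan)
    return plan

def move_filter_beneath_join(plan):
    """Push a Filter into the join input that provides every relation it refers to."""
    if not (isinstance(plan, Filter) and isinstance(plan.source, Join)):
        return plan
    join = plan.source
    if join.kind != "INNER":
        return plan
    if plan.cond.tables <= tables(join.left):
        return Join(join.kind, join.conds, Filter(plan.cond, join.left), join.right)
    if plan.cond.tables <= tables(join.right):
        return Join(join.kind, join.conds, join.left, Filter(plan.cond, join.right))
    return plan

equi = Cond("t1.id = t2.cluster_id", frozenset({"t1", "t2"}))
const = Cond("t2.kind = 'bar'", frozenset({"t2"}))
join = Join("INNER", (equi, const), Collect("t1"), Collect("t2"))

extracted = extract_constant_conditions(join)  # Filter on top of the join
pushed = move_filter_beneath_join(extracted)   # Filter moved to the t2 input
print(pushed)
```

Splitting the work into two small rules mirrors the description above: the first rule only creates the filter, and the existing push-down rule decides where it may go.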
The benefit of this change is:
Checklist
docs/appendices/release-notes/<x.y.0>.rst for user facing changes
sql_features table for user facing changes
docs/appendices/release-notes/<x.y.0>.rst
(E.g. AdminUI)