New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize unnecessary column copy for HashAgg #8985
base: master
Are you sure you want to change the base?
Optimize unnecessary column copy for HashAgg #8985
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
15ff512
to
60bd8ea
Compare
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
/run-all-tests |
/test all |
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
/test all |
/test all |
/test all |
/test pull-integration-tes |
@guo-shaoge: The specified target(s) for
Use In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test pull-integration-test |
@guo-shaoge: The specified target(s) for
Use In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Signed-off-by: guo-shaoge <shaoge1994@163.com>
@@ -575,6 +588,7 @@ void DAGExpressionAnalyzer::buildAggGroupBy( | |||
/// need double check this assumption when we support agg with collation | |||
aggregation_keys.push_back(name); | |||
agg_key_set.emplace(name); | |||
collators.push_back(nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems L591 and L605 have changed the original logic. Was this intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, there was a bug in the previous code?
aggregated_columns, | ||
false, | ||
context); | ||
auto [first_row_name, first_row_type] = findFirstRow(aggregate_descriptions, name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is first_row_type
necessary here? Are the type
at L600 and first_row_type
always the same? If so, can we just use type
?
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
700902d
to
b62ed01
Compare
Signed-off-by: guo-shaoge <shaoge1994@163.com>
@@ -575,6 +588,7 @@ void DAGExpressionAnalyzer::buildAggGroupBy( | |||
/// need double check this assumption when we support agg with collation | |||
aggregation_keys.push_back(name); | |||
agg_key_set.emplace(name); | |||
collators.push_back(nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's expected
const Context & context, | ||
const Block & before_agg_header, | ||
size_t before_agg_streams_size, | ||
size_t agg_streams_size, | ||
const Names & key_names, | ||
const KeyRefAggFuncMap & key_ref_agg_func, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comments about the meaning of above two map.
// Before: keys: c1 | c2 | c3 | ||
// After: keys: c2 | c1 | c3 | ||
// By doing this, when deserialize group by keys from HashMap to columns, | ||
// we only need to handle c2(convert_key_size == 1) and ignore c1/c3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update comment to clarify that only extract partial groupby keys, and others will be skipped and reference to first_row
result column.
What problem does this PR solve?
Issue Number: close #8891
Problem Summary:
When there are group by key in select item(a.k.a. first_row), tiflash have extra agg func, which cause unnecessary copy from HashMap to Column.
What is changed and how it works?
Basic idea:
Optimization-1 (with collation):
How:
first_row
agg func in the select item.any
agg func. If not, add the 'any' agg func.key_from_agg_func
to indicate that this key is equivalent tofirst_row/any
agg func, which can avoid copying this key from the HashMap in subsequent operations. (checkDAGExpressionAnalyzer::buildAggGroupBy
)first_row/any
(which is rare, but still can happens), will skip copy keys(template argumentskip_serialize_key
is true)Optimization-2(no collation)
The above optimization happens when
c2
has collation, ifc2
has no collation(its type is not string), the process is as follows. Sofirst_row
is deleted.Results:
before:
after:
Check List
Tests
Side effects
Documentation
Release note