New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix analyzer for lambda in aggregation #22539
Conversation
7f4bf51
to
a8dd1dc
Compare
@@ -199,10 +199,14 @@ public static boolean isConstant(Expression expression) | |||
tempExpression = ((Cast) tempExpression).getExpression(); | |||
} | |||
|
|||
if (tempExpression instanceof Literal || tempExpression instanceof ArrayConstructor) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The elements in array constructor should also be Literals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added one more test for this in abstracttestqueries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This util is used for the reduce_agg function, and it will change the behaviour for the reduce_agg function.
For example, currently the query
SELECT id, reduce_agg(value, array[id, value], (a, b) -> a || b, (a, b) -> a || b) FROM ( VALUES (1, 2), (1, 3), (1, 4), (2, 20), (2, 30), (2, 40) ) AS t(id, value) GROUP BY id
will succeed as it considers expression array[id, value]
as constant.
However, after this fix, it will fail with error REDUCE_AGG only supports non-NULL literal as the initial value
as array[id, value]
is not considered as constant now.
cc @kaikalur
@Override | ||
public Expression rewriteLambdaExpression(LambdaExpression node, Boolean context, ExpressionTreeRewriter<Boolean> treeRewriter) | ||
{ | ||
Expression result = super.rewriteLambdaExpression(node, true, treeRewriter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context set to true for lambda expression
@@ -7514,4 +7514,12 @@ public void testGuardConstraintFramework() | |||
assertQuery("select orderkey from (select * from (select * from orders where 1=0)) group by rollup(orderkey)", | |||
"values (null)"); | |||
} | |||
|
|||
@Test | |||
public void testLambdaInAggregation() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two will fail without fix here
a8dd1dc
to
a6294e9
Compare
assertQuery("SELECT id, reduce_agg(value, 0, (a, b) -> a + b+0, (a, b) -> a + b) FROM ( VALUES (1, 2), (1, 3), (1, 4), (2, 20), (2, 30), (2, 40) ) AS t(id, value) GROUP BY id", "values (1, 9), (2, 90)"); | ||
assertQuery("SELECT id, reduce_agg(value, 's', (a, b) -> concat(a, b, 's'), (a, b) -> concat(a, b, 's')) FROM ( VALUES (1, '2'), (1, '3'), (1, '4'), (2, '20'), (2, '30'), (2, '40') ) AS t(id, value) GROUP BY id", | ||
"values (1, 's2s3s4s'), (2, 's20s30s40s')"); | ||
assertQueryFails("SELECT id, reduce_agg(value, array[id, value], (a, b) -> a || b, (a, b) -> a || b) FROM ( VALUES (1, 2), (1, 3), (1, 4), (2, 20), (2, 30), (2, 40) ) AS t(id, value) GROUP BY id", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for the change of isConstant(Expression expression) function. Before change "array[id, value]" is considered constant and this query pass. Now it will throw exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@feilong-liu explained to me that the reason we want this to fail is because semantically it doesn't make sense for the initial state of the lambda function to depend on the value of a column (because which row of the column is it even talking about? It wouldn't even be consistent within a query because on any given worker it would depend on which row it happened to read first)
Description
Fix issue #22558
Fix query compilation error for lambda function in aggregation.
Sample query
It will throw error
(GENERIC_INTERNAL_ERROR) no type found for symbol 'expr_10'
This is because the 0 inside the lambda was rewritten to symbol 'expr_10'. The fix is to skip rewrite for constants within lambda function.
Motivation and Context
Fix bug.
Impact
Fix compilation error for query
Test Plan
Unit tests
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.