Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix analyzer for lambda in aggregation #22539

Merged
merged 1 commit into from Apr 19, 2024

Conversation

feilong-liu
Copy link
Contributor

@feilong-liu feilong-liu commented Apr 16, 2024

Description

Fix issue #22558
Fix query compilation error for lambda function in aggregation.
Sample query

SELECT
    id,
    REDUCE_AGG(value, 0, (a, b) -> a + b + 0, (a, b) -> a + b)
FROM (
    VALUES
        (1, 2),
        (1, 3),
        (1, 4),
        (2, 20),
        (2, 30),
        (2, 40)
) AS t(id, value)
GROUP BY
    id

It will throw error (GENERIC_INTERNAL_ERROR) no type found for symbol 'expr_10'
This is because the 0 inside the lambda was rewritten to symbol 'expr_10'. The fix is to skip rewrite for constants within lambda function.

Motivation and Context

Fix bug.

Impact

Fix compilation error for query

Test Plan

Unit tests

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Fix compilation error for queries with lambda in aggregation function

@feilong-liu feilong-liu requested review from jaystarshot and a team as code owners April 16, 2024 21:13
@feilong-liu feilong-liu marked this pull request as draft April 16, 2024 21:13
@feilong-liu feilong-liu force-pushed the fix_lambda_agg branch 2 times, most recently from 7f4bf51 to a8dd1dc Compare April 17, 2024 16:30
@@ -199,10 +199,14 @@ public static boolean isConstant(Expression expression)
tempExpression = ((Cast) tempExpression).getExpression();
}

if (tempExpression instanceof Literal || tempExpression instanceof ArrayConstructor) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The elements in array constructor should also be Literals

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one more test for this in abstracttestqueries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This util is used for the reduce_agg function, and it will change the behaviour for the reduce_agg function.
For example, currently the query
SELECT id, reduce_agg(value, array[id, value], (a, b) -> a || b, (a, b) -> a || b) FROM ( VALUES (1, 2), (1, 3), (1, 4), (2, 20), (2, 30), (2, 40) ) AS t(id, value) GROUP BY id will succeed as it considers expression array[id, value] as constant.
However, after this fix, it will fail with error REDUCE_AGG only supports non-NULL literal as the initial value as array[id, value] is not considered as constant now.
cc @kaikalur

@Override
public Expression rewriteLambdaExpression(LambdaExpression node, Boolean context, ExpressionTreeRewriter<Boolean> treeRewriter)
{
Expression result = super.rewriteLambdaExpression(node, true, treeRewriter);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context set to true for lambda expression

@@ -7514,4 +7514,12 @@ public void testGuardConstraintFramework()
assertQuery("select orderkey from (select * from (select * from orders where 1=0)) group by rollup(orderkey)",
"values (null)");
}

@Test
public void testLambdaInAggregation()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two will fail without fix here

@feilong-liu feilong-liu marked this pull request as ready for review April 18, 2024 17:24
assertQuery("SELECT id, reduce_agg(value, 0, (a, b) -> a + b+0, (a, b) -> a + b) FROM ( VALUES (1, 2), (1, 3), (1, 4), (2, 20), (2, 30), (2, 40) ) AS t(id, value) GROUP BY id", "values (1, 9), (2, 90)");
assertQuery("SELECT id, reduce_agg(value, 's', (a, b) -> concat(a, b, 's'), (a, b) -> concat(a, b, 's')) FROM ( VALUES (1, '2'), (1, '3'), (1, '4'), (2, '20'), (2, '30'), (2, '40') ) AS t(id, value) GROUP BY id",
"values (1, 's2s3s4s'), (2, 's20s30s40s')");
assertQueryFails("SELECT id, reduce_agg(value, array[id, value], (a, b) -> a || b, (a, b) -> a || b) FROM ( VALUES (1, 2), (1, 3), (1, 4), (2, 20), (2, 30), (2, 40) ) AS t(id, value) GROUP BY id",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the change of isConstant(Expression expression) function. Before change "array[id, value]" is considered constant and this query pass. Now it will throw exception.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@feilong-liu explained to me that the reason we want this to fail is because semantically it doesn't make sense for the initial state of the lambda function to depend on the value of a column (because which row of the column is it even talking about? It wouldn't even be consistent within a query because on any given worker it would depend on which row it happened to read first)

@feilong-liu feilong-liu requested review from rschlussel and removed request for rschlussel April 18, 2024 20:03
@feilong-liu feilong-liu merged commit 71ad56f into prestodb:master Apr 19, 2024
56 checks passed
@feilong-liu feilong-liu deleted the fix_lambda_agg branch April 19, 2024 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants