Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[single stage] HAVING fails if aggregation is not in SELECT part #13000

Closed
cyrilou242 opened this issue Apr 24, 2024 · 2 comments
Closed

[single stage] HAVING fails if aggregation is not in SELECT part #13000

cyrilou242 opened this issue Apr 24, 2024 · 2 comments
Labels

Comments

@cyrilou242
Copy link
Contributor

cyrilou242 commented Apr 24, 2024

Problem

The following query fails:

select 
OS_Version
from CleanLogisticData 
GROUP BY OS_Version 
HAVING MAX("timestamp") > 10800
limit 10

With

Error Code: 450
InternalError:
java.io.IOException: Failed : HTTP error code : 500. Root Cause: Failed to find SELECT expression: OS_Version in the GROUP-BY clause
	at org.apache.pinot.controller.api.resources.PinotQueryResource.sendPostRaw(PinotQueryResource.java:424)
	at org.apache.pinot.controller.api.resources.PinotQueryResource.sendRequestRaw(PinotQueryResource.java:462)
	at org.apache.pinot.controller.api.resources.PinotQueryResource.sendRequestToBroker(PinotQueryResource.java:356)
	at org.apache.pinot.controller.api.resources.PinotQueryResource.getQueryResponse(PinotQueryResource.java:276)

It seems the single stage engine does not accept HAVING clause if the aggregation in the HAVING clause is not in the SELECT clause.

This works (but is not the desired query):

select 
OS_Version,
MAX("timestamp") as unused
from CleanLogisticData 
GROUP BY OS_Version 
HAVING MAX("timestamp") > 10800
limit 10

Expected behaviour

The query should be accepted. It works with the multi-stage engine.

I don't know if it's a regression of if it has always behaved like this in the single stage engine.
I could reproduce this in 1.1.0 and OSS 1.0.0

@Jackie-Jiang
Copy link
Contributor

This is not a regression.
Single stage engine doesn't support GROUP-BY without AGGREGATE, so it will rewrite select OS_Version from CleanLogisticData GROUP BY OS_Version into select distinct OS_Version from CleanLogisticData.
The proper fix should be automatically adding the aggregation in the HAVING clause into the SELECT, but not show it in the final result. I don't think we have the primitive to do this right now. The query rewrite can be added, but this extra column will also be in the final result.
@cyrilou242 Is this a blocker for you?

@cyrilou242
Copy link
Contributor Author

Hey @Jackie-Jiang this is not a blocker for me.
Given it's not a regression I'm fine if it's not fixed and moving to query engine v2 is the path forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants