New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong results for filtered aggregates when run through SQL query, this was working on 25.0.0 but found not working on latest release 29.0.0, so has broken somewhere in between #16178
Comments
was the |
Yes
In environment file I have
druid_sql_planner_useApproximateCountDistinct=false
…________________________________
From: Abhishek Agarwal ***@***.***>
Sent: 20 March 2024 22:13
To: apache/druid ***@***.***>
Cc: stamboli ***@***.***>; Author ***@***.***>
Subject: Re: [apache/druid] Wrong results for filtered aggregates when run through SQL query, this was working on 25.0.0 but found not working on latest release 29.0.0, so has broken somewhere in between (Issue #16178)
was the approximate distinct count turned off when you ran this query?
—
Reply to this email directly, view it on GitHub<#16178 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAXRAAJMXUTCYDF5CSWOGKTYZG4E3AVCNFSM6AAAAABE72CV22VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMJQGAZDSMJVGU>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
can you set |
Though I am surprised how did this query even work in 25.0.0 without you setting |
No luck with this setting too :( |
Looking at your test case I formed query based on it which works. But this query, very specific to druid. The solution we are building need to work with multiple DB this query is not working with MySQL as well as Snowflake, the queries are built dynamically, so a very specific druid SQL needs to be built. Until now CASE based query was worked with other traditional standard DBs used to work with druid as well. |
SampleSaleData.json
SampleSaleData.csv
Affected Version
29.0.0
Description
Upload the csv being attached, druid spec for it is also attached for reference
Simplest query that can reproduce issues is as follows
SELECT
COUNT(DISTINCT (CASE WHEN (("SampleSaleData"."__time" >= '2022-01-12T00:00:00.000Z') AND ("SampleSaleData"."__time" < '2022-01-13T00:00:00.000Z')) THEN "City" END)) AS "P2-DistinctCities",
COUNT(DISTINCT (CASE WHEN (("SampleSaleData"."__time" >= '2022-01-05T00:00:00.000Z') AND ("SampleSaleData"."__time" < '2022-01-06T00:00:00.000Z')) THEN "City" END)) AS "P1-DistinctCities"
FROM
SampleSaleData "SampleSaleData"
This query which used to work earlier and give results as 4, 8 is now returning 0,0
-If this query is run keeping one aggregation at a time, either first one or second one only this still gives correct result, only when both are present it is giving wrong result
The text was updated successfully, but these errors were encountered: