Wrong results for filtered aggregates when run through SQL query, this was working on 25.0.0 but found not working on latest release 29.0.0, so has broken somewhere in between #16178

stamboli · 2024-03-20T16:27:34Z

Affected Version

29.0.0

Description

Steps to reproduce the problem
Upload the attached CSV; the Druid ingestion spec for it is also attached for reference.

The simplest query that reproduces the issue is as follows:

SELECT
COUNT(DISTINCT (CASE WHEN (("SampleSaleData"."__time" >= '2022-01-12T00:00:00.000Z') AND ("SampleSaleData"."__time" < '2022-01-13T00:00:00.000Z')) THEN "City" END)) AS "P2-DistinctCities",
COUNT(DISTINCT (CASE WHEN (("SampleSaleData"."__time" >= '2022-01-05T00:00:00.000Z') AND ("SampleSaleData"."__time" < '2022-01-06T00:00:00.000Z')) THEN "City" END)) AS "P1-DistinctCities"
FROM
SampleSaleData "SampleSaleData"

This query used to work and return 4, 8; it now returns 0, 0.

Any debugging that you have already done
- If the query is run with only one aggregation at a time (either the first or the second), it still returns the correct result. The wrong result appears only when both aggregations are present.
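For reference, the semantics expected from the CASE form can be checked outside Druid. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module and a few made-up rows (not the attached CSV), so the counts differ from the 4, 8 reported above. It shows that each aggregate should return the same value whether it is selected alone or together with the other.

```python
import sqlite3

# Hypothetical toy rows, NOT the attached CSV: (__time as ISO-8601 text, City).
ROWS = [
    ("2022-01-05T08:00:00.000Z", "Pune"),
    ("2022-01-05T09:00:00.000Z", "Mumbai"),
    ("2022-01-05T10:00:00.000Z", "Pune"),
    ("2022-01-12T08:00:00.000Z", "Delhi"),
    ("2022-01-20T08:00:00.000Z", "Chennai"),
]


def window_count(lo, hi, alias):
    # CASE yields NULL outside the window; COUNT(DISTINCT ...) ignores NULLs.
    return (
        f'COUNT(DISTINCT (CASE WHEN (("__time" >= \'{lo}\') '
        f'AND ("__time" < \'{hi}\')) THEN "City" END)) AS "{alias}"'
    )


P2 = window_count("2022-01-12T00:00:00.000Z", "2022-01-13T00:00:00.000Z",
                  "P2-DistinctCities")
P1 = window_count("2022-01-05T00:00:00.000Z", "2022-01-06T00:00:00.000Z",
                  "P1-DistinctCities")

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE SampleSaleData ("__time" TEXT, "City" TEXT)')
conn.executemany("INSERT INTO SampleSaleData VALUES (?, ?)", ROWS)


def run(*aggregates):
    # Build one SELECT with the given aggregate expressions and return its row.
    sql = f'SELECT {", ".join(aggregates)} FROM SampleSaleData "SampleSaleData"'
    return conn.execute(sql).fetchone()


single_p2 = run(P2)   # only the first aggregate
single_p1 = run(P1)   # only the second aggregate
both = run(P2, P1)    # both aggregates in one query

# The combined query should agree with the two single-aggregate queries.
print(single_p2, single_p1, both)
```

With these toy rows, the combined query agrees with the two single-aggregate queries, which is the behavior the Druid query above is expected to have.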

abhishekagarwal87 · 2024-03-20T16:43:35Z

Was the approximate distinct count turned off when you ran this query?

stamboli · 2024-03-20T16:45:57Z

Yes. In the environment file I have druid_sql_planner_useApproximateCountDistinct=false.


stamboli · 2024-03-20T17:32:56Z

environment.txt

abhishekagarwal87 · 2024-03-21T04:21:47Z

Can you set druid.sql.planner.useGroupingSetForExactDistinct to true and see if that fixes the issue? This bug might be the same as what's being discussed here - apache/calcite#3735 (comment)
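One way to try this per query, without changing the cluster configuration, is to pass the setting in the SQL query context. The sketch below only builds the JSON body for Druid's SQL HTTP endpoint (`/druid/v2/sql`). It assumes the context keys `useApproximateCountDistinct` and `useGroupingSetForExactDistinct` mirror the runtime properties of the same name, which should be checked against the documentation for the Druid version in use. The helper name `build_sql_request` is hypothetical.

```python
import json

DRUID_SQL_PATH = "/druid/v2/sql"  # Druid's SQL HTTP endpoint


def build_sql_request(query, exact_distinct=True, grouping_sets=True):
    """Return the JSON body for a Druid SQL request with planner overrides.

    Hypothetical helper: the context keys are assumed to mirror the
    druid.sql.planner.* runtime properties discussed in this thread.
    """
    context = {
        # False requests exact COUNT(DISTINCT) instead of the approximate one.
        "useApproximateCountDistinct": not exact_distinct,
        # True asks the planner to rewrite exact distinct counts with grouping sets.
        "useGroupingSetForExactDistinct": grouping_sets,
    }
    return json.dumps({"query": query, "context": context})


body = build_sql_request('SELECT COUNT(DISTINCT "City") FROM SampleSaleData')
print(body)
```

The resulting body would be sent as a POST with `Content-Type: application/json` to the router or broker; the host and port depend on the deployment and are not shown here.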

abhishekagarwal87 · 2024-03-21T04:23:18Z

Though I am surprised that this query even worked in 25.0.0 without you setting druid.sql.planner.useGroupingSetForExactDistinct. It would have failed outright.

stamboli · 2024-03-21T09:42:26Z

No luck with this setting either :(
Surprisingly, as explained above, a single aggregation at a time works even without this flag.

stamboli · 2024-03-21T10:09:35Z

Looking at your test case, I formed a query based on it, and it works:
SELECT
COUNT(DISTINCT "City") FILTER (WHERE ("SampleSaleData"."__time" >= '2022-01-12T00:00:00.000Z') AND ("SampleSaleData"."__time" < '2022-01-13T00:00:00.000Z')) AS "P2-DistinctCities",
COUNT(DISTINCT "City") FILTER (WHERE ("SampleSaleData"."__time" >= '2022-01-05T00:00:00.000Z') AND ("SampleSaleData"."__time" < '2022-01-06T00:00:00.000Z')) AS "P1-DistinctCities"
FROM
SampleSaleData "SampleSaleData"

But this query is very specific to Druid. The solution we are building needs to work with multiple databases, and this query does not work with MySQL or Snowflake. The queries are built dynamically, so supporting this form would mean generating Druid-specific SQL. Until now, the CASE-based query that works with other standard databases also worked with Druid.
So overall, this failure is specifically related to CASE expressions with multiple such aggregations.

stamboli added the Uncategorized problem report label Mar 20, 2024
