[Native] Query with a correlated subquery in an EXISTS clause in the WHERE condition returns incorrect results #22585

Open
mknegi opened this issue Apr 22, 2024 · 7 comments

mknegi commented Apr 22, 2024

A query with a correlated subquery inside an EXISTS clause in the WHERE condition returns incorrect results. Here is the testcase that fails due to this problem:

Testcase: presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueries.java::testCorrelatedExistsSubqueries()
Run under derived class: TestTpchDistributedQueries

Expected Behavior

java.lang.AssertionError: For query:
SELECT count(*) FROM orders o WHERE EXISTS(SELECT 1 FROM orders i WHERE o.orderkey < i.orderkey AND i.orderkey % 1000 = 0)
not equal
Actual rows (1 of 1 extra rows shown, 1 rows in total):
[31762]
Expected rows (1 of 1 missing rows shown, 1 rows in total):
[14999]

Given how the query is constructed, the correlated subquery should have reported count(*) as at most 14999: the orders table has only 15000 records in total, and at least the row with the largest orderkey cannot satisfy o.orderkey < i.orderkey. Reporting so many extra rows shows that the semantics of the correlated subquery are not being honored here, which should generally be a defect.
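For intuition, the EXISTS predicate here is true exactly when the outer orderkey is smaller than the largest orderkey divisible by 1000, so an illustrative decorrelated rewrite (a sketch, not the engine's plan) should return the same count and can never exceed the table's row count:

-- Sketch: decorrelated equivalent of the failing query.
-- EXISTS(... o.orderkey < i.orderkey AND i.orderkey % 1000 = 0)
-- holds iff o.orderkey is below the largest orderkey divisible by 1000.
SELECT count(*)
FROM orders o
WHERE o.orderkey < (SELECT max(i.orderkey) FROM orders i WHERE i.orderkey % 1000 = 0);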

I ran SELECT * FROM orders instead of SELECT count(*) in the testcase, and there are many duplicate rows in the query output. This likely explains why the row count was so much higher than 14999; a query to confirm the duplication is sketched after the output below.

java.lang.AssertionError: For query: 
 SELECT * FROM orders o WHERE EXISTS(SELECT 1 FROM orders i WHERE o.orderkey < i.orderkey AND i.orderkey % 1000 = 0)
not equal
Actual rows (100 of 2062 extra rows shown, 17061 rows in total):
    [6535, 244, F, 207170.87, 1992-06-13, 1-URGENT, Clerk#000000650, 0, s. ironic, even theodolites haggle ]
    [6535, 244, F, 207170.87, 1992-06-13, 1-URGENT, Clerk#000000650, 0, s. ironic, even theodolites haggle ]
    [6560, 1204, F, 318034.04, 1995-01-05, 5-LOW, Clerk#000000328, 0, ven packages. deposits wake furiously enticingly regular theodolites; un]
    [6560, 1204, F, 318034.04, 1995-01-05, 5-LOW, Clerk#000000328, 0, ven packages. deposits wake furiously enticingly regular theodolites; un]
    [6561, 1402, F, 200779.81, 1994-05-19, 2-HIGH, Clerk#000000813, 0,  have to are furiously furiously even deposits. carefu]
    [6561, 1402, F, 200779.81, 1994-05-19, 2-HIGH, Clerk#000000813, 0,  have to are furiously furiously even deposits. carefu]
    [6562, 643, F, 221812.3, 1994-12-21, 1-URGENT, Clerk#000000616, 0, ely ironic, permanent deposits. blithely busy grouches wak]
    [6562, 643, F, 221812.3, 1994-12-21, 1-URGENT, Clerk#000000616, 0, ely ironic, permanent deposits. blithely busy grouches wak]
    [6563, 1055, F, 94449.31, 1992-11-06, 5-LOW, Clerk#000000666, 0, he fluffily final requests s]
    [6563, 1055, F, 94449.31, 1992-11-06, 5-LOW, Clerk#000000666, 0, he fluffily final requests s]
    [6564, 565, F, 150871.17, 1995-01-22, 5-LOW, Clerk#000000777, 0, nic forges! quickly unusual packages haggle. blithely reg]
    [6564, 565, F, 150871.17, 1995-01-22, 5-LOW, Clerk#000000777, 0, nic forges! quickly unusual packages haggle. blithely reg]
    [6565, 277, O, 262344.82, 1997-02-25, 4-NOT SPECIFIED, Clerk#000000657, 0, final dependencies. blithely express pinto ]
    [6565, 277, O, 262344.82, 1997-02-25, 4-NOT SPECIFIED, Clerk#000000657, 0, final dependencies. blithely express pinto ]
    [6566, 1369, P, 175594.82, 1995-05-22, 1-URGENT, Clerk#000000218, 0, al notornis. blithely express pac]
    [6566, 1369, P, 175594.82, 1995-05-22, 1-URGENT, Clerk#000000218, 0, al notornis. blithely express pac]
    [6567, 1294, O, 53318.58, 1997-07-22, 2-HIGH, Clerk#000000402, 0,  deposits haggle. slyly fluffy dependencies sleep carefully packages.]
    [6567, 1294, O, 53318.58, 1997-07-22, 2-HIGH, Clerk#000000402, 0,  deposits haggle. slyly fluffy dependencies sleep carefully packages.]
    [6592, 221, F, 240055.49, 1994-12-02, 5-LOW, Clerk#000000213, 0, he slyly final somas. slyly final accounts above ]
    [6592, 221, F, 240055.49, 1994-12-02, 5-LOW, Clerk#000000213, 0, he slyly final somas. slyly final accounts above ]
    [6593, 293, O, 123753.91, 1997-05-16, 2-HIGH, Clerk#000000678, 0, ular deposits cajole furiously. doggedly regular accounts c]
    [6593, 293, O, 123753.91, 1997-05-16, 2-HIGH, Clerk#000000678, 0, ular deposits cajole furiously. doggedly regular accounts c]
    [6594, 1288, O, 205591.26, 1996-08-23, 1-URGENT, Clerk#000000712, 0, final deposits above the stealthily even packages ]
    [6594, 1288, O, 205591.26, 1996-08-23, 1-URGENT, Clerk#000000712, 0, final deposits above the stealthily even packages ]
    [6595, 881, F, 24750.02, 1992-10-03, 1-URGENT, Clerk#000000220, 0, onic, even dependencies sleep ]
    [6595, 881, F, 24750.02, 1992-10-03, 1-URGENT, Clerk#000000220, 0, onic, even dependencies sleep ]
    [6596, 1120, O, 169485.95, 1995-07-22, 5-LOW, Clerk#000000106, 0, rough the blithely even pinto beans. quickl]
    [6596, 1120, O, 169485.95, 1995-07-22, 5-LOW, Clerk#000000106, 0, rough the blithely even pinto beans. quickl]
    [6597, 1189, O, 100902.3, 1998-06-27, 5-LOW, Clerk#000000392, 0,  furiously about the slyly spec]
    [6597, 1189, O, 100902.3, 1998-06-27, 5-LOW, Clerk#000000392, 0,  furiously about the slyly spec]
    [6598, 109, F, 66566.87, 1992-05-08, 3-MEDIUM, Clerk#000000691, 0, ts. quickly ironic p]
    [6598, 109, F, 66566.87, 1992-05-08, 3-MEDIUM, Clerk#000000691, 0, ts. quickly ironic p]
    [6599, 599, O, 214143.21, 1998-04-17, 5-LOW, Clerk#000000365, 0,  x-ray fluffily according to the fu]
    [6599, 599, O, 214143.21, 1998-04-17, 5-LOW, Clerk#000000365, 0,  x-ray fluffily according to the fu]
    [6624, 1130, O, 44018.37, 1997-08-18, 1-URGENT, Clerk#000000553, 0, e fluffily according to the ideas]
    [6624, 1130, O, 44018.37, 1997-08-18, 1-URGENT, Clerk#000000553, 0, e fluffily according to the ideas]
    [6625, 1345, F, 15961.57, 1993-08-16, 2-HIGH, Clerk#000000228, 0, rious theodolites against ]
    [6625, 1345, F, 15961.57, 1993-08-16, 2-HIGH, Clerk#000000228, 0, rious theodolites against ]
    [6626, 575, F, 104789.96, 1994-05-07, 1-URGENT, Clerk#000000213, 0, ans. carefully even dolphins ]
    [6626, 575, F, 104789.96, 1994-05-07, 1-URGENT, Clerk#000000213, 0, ans. carefully even dolphins ]
    [6627, 967, P, 86219.59, 1995-03-26, 5-LOW, Clerk#000000603, 0,  the instructions. blithely eve]
    [6627, 967, P, 86219.59, 1995-03-26, 5-LOW, Clerk#000000603, 0,  the instructions. blithely eve]
    [6628, 448, F, 137905.54, 1994-04-28, 3-MEDIUM, Clerk#000000644, 0, uests detect. blithel]
    [6628, 448, F, 137905.54, 1994-04-28, 3-MEDIUM, Clerk#000000644, 0, uests detect. blithel]
    [6629, 1094, F, 198981.29, 1994-01-26, 1-URGENT, Clerk#000000153, 0, s deposits. blithely special Tiresias thrash ]
    [6629, 1094, F, 198981.29, 1994-01-26, 1-URGENT, Clerk#000000153, 0, s deposits. blithely special Tiresias thrash ]
    [6630, 800, O, 324687.23, 1996-12-05, 2-HIGH, Clerk#000000006, 0, carefully even asymptotes. furiously bold platelets are along]
    [6630, 800, O, 324687.23, 1996-12-05, 2-HIGH, Clerk#000000006, 0, carefully even asymptotes. furiously bold platelets are along]
    [6631, 481, O, 214850.36, 1995-09-06, 3-MEDIUM, Clerk#000000819, 0, ithely. even frets around the regula]
    [6631, 481, O, 214850.36, 1995-09-06, 3-MEDIUM, Clerk#000000819, 0, ithely. even frets around the regula]
    [6656, 754, F, 229247.93, 1994-05-16, 1-URGENT, Clerk#000000497, 0,  multipliers grow furiously. blithely express courts integr]
    [6656, 754, F, 229247.93, 1994-05-16, 1-URGENT, Clerk#000000497, 0,  multipliers grow furiously. blithely express courts integr]
    [6657, 463, F, 41401.44, 1992-01-03, 4-NOT SPECIFIED, Clerk#000000820, 0, ously among the blithely pending dependencies. blithely pending theo]
    [6657, 463, F, 41401.44, 1992-01-03, 4-NOT SPECIFIED, Clerk#000000820, 0, ously among the blithely pending dependencies. blithely pending theo]
    [6658, 1486, O, 42751.67, 1996-06-29, 5-LOW, Clerk#000000453, 0, uests sleep furiously regular pinto beans. carefully even foxes boost fur]
    [6658, 1486, O, 42751.67, 1996-06-29, 5-LOW, Clerk#000000453, 0, uests sleep furiously regular pinto beans. carefully even foxes boost fur]
    [6659, 164, O, 78331.25, 1998-01-12, 4-NOT SPECIFIED, Clerk#000000467, 0, e. carefully unusual deposits use s]
    [6659, 164, O, 78331.25, 1998-01-12, 4-NOT SPECIFIED, Clerk#000000467, 0, e. carefully unusual deposits use s]
    [6660, 1477, O, 126082.62, 1998-05-06, 5-LOW, Clerk#000000336, 0, ckly regular packages. sometimes ironic platelets are at the care]
    [6660, 1477, O, 126082.62, 1998-05-06, 5-LOW, Clerk#000000336, 0, ckly regular packages. sometimes ironic platelets are at the care]
    [6661, 31, F, 60572.38, 1993-04-17, 5-LOW, Clerk#000000669, 0, e regularly ironic packages. pending, regular instructions haggle]
    [6661, 31, F, 60572.38, 1993-04-17, 5-LOW, Clerk#000000669, 0, e regularly ironic packages. pending, regular instructions haggle]
    [6662, 214, O, 136940.91, 1995-11-23, 4-NOT SPECIFIED, Clerk#000000319, 0, nts nag furiously fluffily ]
    [6662, 214, O, 136940.91, 1995-11-23, 4-NOT SPECIFIED, Clerk#000000319, 0, nts nag furiously fluffily ]
    [6663, 577, F, 129944.6, 1995-02-03, 2-HIGH, Clerk#000000141, 0, posits wake slyly slyly i]
    [6663, 577, F, 129944.6, 1995-02-03, 2-HIGH, Clerk#000000141, 0, posits wake slyly slyly i]
    [6688, 1187, O, 123509.63, 1997-02-24, 1-URGENT, Clerk#000000987, 0, ly after the fluffily regular accounts. blithely ironic deposits along the ]
    [6688, 1187, O, 123509.63, 1997-02-24, 1-URGENT, Clerk#000000987, 0, ly after the fluffily regular accounts. blithely ironic deposits along the ]
    [6689, 757, O, 62749.57, 1997-06-02, 1-URGENT, Clerk#000000571, 0,  ideas wake fluffily along the unusual, si]
    [6689, 757, O, 62749.57, 1997-06-02, 1-URGENT, Clerk#000000571, 0,  ideas wake fluffily along the unusual, si]
    [6690, 169, O, 194895.17, 1996-06-26, 2-HIGH, Clerk#000000648, 0, eans sleep fluffily about the even requests. carefully unusual foxes eat]
    [6690, 169, O, 194895.17, 1996-06-26, 2-HIGH, Clerk#000000648, 0, eans sleep fluffily about the even requests. carefully unusual foxes eat]
    [6691, 1435, F, 252217.93, 1994-06-10, 5-LOW, Clerk#000000191, 0, d grouches haggle fluffily around the furiou]
    [6691, 1435, F, 252217.93, 1994-06-10, 5-LOW, Clerk#000000191, 0, d grouches haggle fluffily around the furiou]
    [6692, 248, O, 281826.95, 1996-09-23, 3-MEDIUM, Clerk#000000573, 0, pendencies wake carefully final braids. ironic packages are en]
    [6692, 248, O, 281826.95, 1996-09-23, 3-MEDIUM, Clerk#000000573, 0, pendencies wake carefully final braids. ironic packages are en]
    [6693, 73, F, 257921.91, 1992-05-25, 1-URGENT, Clerk#000000232, 0, lithely unusual orbits. furiously pending pinto beans use quickl]
    [6693, 73, F, 257921.91, 1992-05-25, 1-URGENT, Clerk#000000232, 0, lithely unusual orbits. furiously pending pinto beans use quickl]
    [6694, 604, F, 94148.46, 1992-09-01, 2-HIGH, Clerk#000000954, 0, platelets haggle carefully. regular instructions snooze carefully after the e]
    [6694, 604, F, 94148.46, 1992-09-01, 2-HIGH, Clerk#000000954, 0, platelets haggle carefully. regular instructions snooze carefully after the e]
    [6695, 170, F, 305558.38, 1992-06-28, 5-LOW, Clerk#000000877, 0, oost slyly. express pinto beans along the carefully iron]
    [6695, 170, F, 305558.38, 1992-06-28, 5-LOW, Clerk#000000877, 0, oost slyly. express pinto beans along the carefully iron]
    [6720, 1021, F, 229026.45, 1993-09-10, 4-NOT SPECIFIED, Clerk#000000753, 0,  instructions affix. final ideas nag slyly furiously ironic accounts. depos]
    [6720, 1021, F, 229026.45, 1993-09-10, 4-NOT SPECIFIED, Clerk#000000753, 0,  instructions affix. final ideas nag slyly furiously ironic accounts. depos]
    [6721, 994, O, 141550.73, 1996-11-29, 4-NOT SPECIFIED, Clerk#000000272, 0, ove the blithely final ]
    [6721, 994, O, 141550.73, 1996-11-29, 4-NOT SPECIFIED, Clerk#000000272, 0, ove the blithely final ]
    [6722, 514, O, 306885.73, 1996-06-26, 2-HIGH, Clerk#000000323, 0, lly express courts wake sl]
    [6722, 514, O, 306885.73, 1996-06-26, 2-HIGH, Clerk#000000323, 0, lly express courts wake sl]
    [6723, 343, O, 260879.33, 1998-06-02, 1-URGENT, Clerk#000000984, 0, lyly carefully final deposits. slyly regular plate]
    [6723, 343, O, 260879.33, 1998-06-02, 1-URGENT, Clerk#000000984, 0, lyly carefully final deposits. slyly regular plate]
    [6724, 934, O, 112167.53, 1995-10-25, 3-MEDIUM, Clerk#000000009, 0, o beans. furiously bold asymptotes wake slyly alongside of the slyly final ex]
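Since orderkey is unique in orders, the duplication can be confirmed with a grouping query over the same failing predicate (a sketch):

-- Any group with copies > 1 is a row the engine emitted more than once.
SELECT orderkey, count(*) AS copies
FROM (
  SELECT o.orderkey
  FROM orders o
  WHERE EXISTS(SELECT 1 FROM orders i WHERE o.orderkey < i.orderkey AND i.orderkey % 1000 = 0)
) t
GROUP BY orderkey
HAVING count(*) > 1;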
mknegi (Author) commented Apr 25, 2024

I tried running the query manually in a single-coordinator, multiple-worker environment and it did not reproduce there, but the problem reproduces in the test environment every time.

karteekmurthys (Contributor) commented:

I am able to repro the issue locally:

presto:tpch> SELECT count(*) FROM orders o WHERE EXISTS(SELECT 1 FROM orders i WHERE o.orderkey < i.orderkey AND i.orderkey % 1000 = 0);
 _col0 
-------
 56994  <--- expected: 14999
(1 row)
presto:tpch> explain SELECT count(*) FROM orders o WHERE EXISTS(SELECT 1 FROM orders i WHERE o.orderkey < i.orderkey AND i.orderkey % 1000 = 0);
                                                                                                                                                                                      Query Plan                                           >
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------->
 - Output[PlanNodeId 23][_col0] => [count:bigint]                                                                                                                                                                                          >
         _col0 := count (1:16)                                                                                                                                                                                                             >
     - Aggregate(FINAL)[PlanNodeId 18] => [count:bigint]                                                                                                                                                                                   >
             count := "presto.default.count"((count_22)) (1:16)                                                                                                                                                                            >
         - LocalExchange[PlanNodeId 819][SINGLE] () => [count_22:bigint]                                                                                                                                                                   >
             - RemoteStreamingExchange[PlanNodeId 825][GATHER] => [count_22:bigint]                                                                                                                                                        >
                 - Aggregate(PARTIAL)[PlanNodeId 823] => [count_22:bigint]                                                                                                                                                                 >
                         count_22 := "presto.default.count"(*) (1:16)                                                                                                                                                                      >
                     - FilterProject[PlanNodeId 384,264][filterPredicate = (count_21) > (BIGINT'0'), projectLocality = LOCAL] => []                                                                                                        >
                             Estimates: {source: CostBasedSourceInfo, rows: ? (?), cpu: ?, memory: ?, network: 11.00}/{source: CostBasedSourceInfo, rows: ? (?), cpu: ?, memory: ?, network: 11.00}                                        >
                         - Aggregate(STREAMING)[orderkey, unique][PlanNodeId 263] => [orderkey:bigint, unique:bigint, count_21:bigint]                                                                                                     >
                                 Estimates: {source: CostBasedSourceInfo, rows: ? (?), cpu: ?, memory: ?, network: 11.00}                                                                                                                  >
                                 count_21 := "presto.default.count"((non_null))                                                                                                                                                            >
                             - LeftJoin[PlanNodeId 262][(orderkey) < (orderkey_0)] => [orderkey:bigint, unique:bigint, non_null:boolean]                                                                                                   >
                                     Estimates: {source: CostBasedSourceInfo, rows: ? (?), cpu: ?, memory: 11.00, network: 11.00}                                                                                                          >
                                     Distribution: REPLICATED                                                                                                                                                                              >
                                     SortExpression[orderkey_0]                                                                                                                                                                            >
                                 - AssignUniqueId[PlanNodeId 261] => [orderkey:bigint, unique:bigint]                                                                                                                                      >
                                         Estimates: {source: CostBasedSourceInfo, rows: 15,000 (131.84kB), cpu: 270,000.00, memory: 0.00, network: 0.00}                                                                                   >
                                     - TableScan[PlanNodeId 0][TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=orders, analyzePartitionValues=Optional.empty}', layout='Optional[tpch.orders{}>
                                             Estimates: {source: CostBasedSourceInfo, rows: 15,000 (131.84kB), cpu: 135,000.00, memory: 0.00, network: 0.00}                                                                               >
                                             LAYOUT: tpch.orders{}                                                                                                                                                                         >
                                             orderkey := orderkey:bigint:0:REGULAR (1:30)                                                                                                                                                  >
                                 - LocalExchange[PlanNodeId 798][SINGLE] () => [orderkey_0:bigint, non_null:boolean]                                                                                                                       >
                                         Estimates: {source: CostBasedSourceInfo, rows: 1 (9B), cpu: 20.00, memory: 0.00, network: 11.00}                                                                                                  >
                                     - RemoteStreamingExchange[PlanNodeId 740][REPLICATE] => [orderkey_0:bigint, non_null:boolean]                                                                                                         >
                                             Estimates: {source: CostBasedSourceInfo, rows: 1 (9B), cpu: 20.00, memory: 0.00, network: 11.00}                                                                                              >
                                         - ScanProject[PlanNodeId 1,260][table = TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=orders, analyzePartitionValues=Optional.empty}', layout='Opti>
                                                 Estimates: {source: CostBasedSourceInfo, rows: 1 (9B), cpu: 9.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 1 (9B), cpu: 20.00, memory: 0.00, network: 0.00}       >
                                                 non_null := BOOLEAN'true'                                                                                                                                                                 >
                                                 LAYOUT: tpch.orders{filter=((orderkey) % (BIGINT'1000')) = (BIGINT'0')}                                                                                                                   >
                                                 orderkey_0 := orderkey:bigint:0:REGULAR (1:66)                                                                                                                                            >
                                                                                                                                                                                                                                           >
(1 row)
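Reading the plan: the EXISTS is decorrelated into AssignUniqueId over the outer side, a replicated non-equi LeftJoin on orderkey < orderkey_0, a STREAMING count grouped by (orderkey, unique), and a filter on count > 0. Roughly, it computes something like the following sketch (the unique_id column is a hypothetical stand-in for AssignUniqueId; this is not the engine's literal rewrite):

-- Sketch of what the plan above computes.
SELECT count(*)
FROM (
  SELECT count(i.non_null) AS matches
  FROM (SELECT orderkey, row_number() OVER () AS unique_id FROM orders) o
  LEFT JOIN (SELECT orderkey, true AS non_null FROM orders WHERE orderkey % 1000 = 0) i
    ON o.orderkey < i.orderkey
  GROUP BY o.unique_id, o.orderkey
) t
WHERE t.matches > 0;

The STREAMING aggregation relies on all join output for a given (orderkey, unique) group arriving contiguously; if a group's rows were split across drivers, the group would be counted more than once, which would be consistent with both the duplicated rows and the inflated count.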

karteekmurthys (Contributor) commented:

Just an update:

I am able to reproduce the issue. The Velox StreamingAggregation operator seems to be flaky in a multi-threaded setting. When I reduced the number of drivers to one per task, the issue went away. Here is the expected number of rows output from the StreamingAggregation operator:
[screenshot: expected row counts from the StreamingAggregation operator]

Here is what we actually get in Prestissimo:

[screenshot: actual row counts from the StreamingAggregation operator in Prestissimo]
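For anyone reproducing this without the screenshots, the per-operator row counts can also be inspected textually with EXPLAIN ANALYZE (assuming it is available on the native cluster in question):

EXPLAIN ANALYZE
SELECT count(*) FROM orders o
WHERE EXISTS(SELECT 1 FROM orders i WHERE o.orderkey < i.orderkey AND i.orderkey % 1000 = 0);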

mknegi (Author) commented May 6, 2024

> Just an update:
>
> I am able to reproduce the issue. The Velox StreamingAggregation operator seems to be flaky in a multi-threaded setting. When I reduced the number of drivers to one per task, the issue went away. Here is the expected number of rows output from the StreamingAggregation operator: [screenshot]
>
> Here is what we actually get in Prestissimo: [screenshot]

This is what I noticed too. The query was failing only in the test environment where we use multiple workers.

tdcmeehan (Contributor) commented:

@karteekmurthys @mknegi @aditi-pandit seems like we should create an issue in Velox?

karteekmurthys (Contributor) commented:

@tdcmeehan I agree with that. I need to come up with a minimal repro so I can open this issue in Velox.

aditi-pandit (Contributor) commented:

> Just an update:
>
> I am able to reproduce the issue. The Velox StreamingAggregation operator seems to be flaky in a multi-threaded setting. When I reduced the number of drivers to one per task, the issue went away. Here is the expected number of rows output from the StreamingAggregation operator: [screenshot]
>
> Here is what we actually get in Prestissimo: [screenshot]
>
> This is what I noticed too. The query was failing only in the test environment where we use multiple workers.

@mknegi: These are two separate observations. Karteek is describing multiple drivers per task, whereas your case has multiple workers in the cluster. Can you try multiple workers with both one driver per task and multiple drivers per task? If the issue shows up only in the multiple-worker/multiple-driver case, then we have a cleaner isolation.
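If it helps with that matrix, one way to pin tasks to a single driver from the client is the standard Presto session property below (a sketch; it assumes the coordinator forwards task_concurrency to the native workers in this setup):

-- Assumed: task_concurrency is honored by the native workers.
SET SESSION task_concurrency = 1;

Re-running the failing count(*) query under both settings, with the worker count held constant, should distinguish the two cases.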
