Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of correlated NOT EXISTS queries #21859

Open
sopel39 opened this issue May 8, 2024 · 0 comments
Open

Improve performance of correlated NOT EXISTS queries #21859

sopel39 opened this issue May 8, 2024 · 0 comments

Comments

@sopel39
Copy link
Member

sopel39 commented May 8, 2024

Correlated NOT EXISTS queries like

SELECT * FROM table_a a WHERE NOT EXISTS (SELECT * FROM table_b b WHERE a.x = b.y AND a.i < b.j)

are rewritten to:

Filter(not exists)
|
Projection(exists := COALESCE(subquery_or, false))
|
Aggregation([a.*], subquery_or := boolean_or(subquery_true))
|
LeftJoin[a.x = b.y AND a.i < b.j]
|
---- Probe
|
---- Project[b.*, subquery_true := TRUE] -- Build

The LeftJoin in the plan above will produce row for every match. However, only single match is required to determine that exists == true for particular probe row.

To improve this we could add a new JoinNode flag, e.g: boolean singleMatch which would stop enumerating matches after a single hit.

Perhaps we could refactor or remove JoinNode#maySkipOutputDuplicates, which doesn't help here because join operator cannot stop enumerating matches when maySkipOutputDuplicates==true since subquery_true is not part of equi-clauses (see

// Implementation of hash join operator may only take advantage of output duplicates insensitive joins when:
)

Alternatively, we could determine that subquery_true is constant expression, which would allow us to skip output duplicates when maySkipOutputDuplicates==true. That might require having "traits" as part of Reference (cc @martint )

cc @Dith3r

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

1 participant