Federated JOINs with LIMITs are significantly slower than their single-type counterparts #999

Open
sardination opened this issue Mar 20, 2024 · 0 comments
Labels
bug Something isn't working pg_lakehouse Issue related to `pg_lakehouse/` priority-2-medium Medium priority issue

Comments

@sardination
Contributor

Bug Description
Look at this comment: #918 (review)

How To Reproduce
On a large dataset, run a large JOIN followed by a LIMIT. The heap-only and parquet-only versions of the query complete in a reasonable time, but the federated heap-parquet version takes significantly longer.

Proposed Fix
My suspicion is that, in the single-type case, some optimization around the LIMIT stops further evaluation once the LIMIT is reached. With federated joins, the individual query against each table type is fully evaluated before the JOIN and LIMIT are performed. The fix should optimize either the handling of the LIMIT itself or (perhaps more likely) the conversion of the heap results into a RecordBatchStream.
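
To illustrate the suspicion above, here is a toy Python sketch, not pg_lakehouse code. It assumes a hypothetical `CountingSource` standing in for a heap scan and compares materializing every batch before applying the LIMIT with pulling batches lazily and stopping once the LIMIT is satisfied. It does not model the JOIN itself, and all names in it are made up for the example.

```python
from itertools import islice
from typing import Iterator, List

BATCH_SIZE = 1000  # hypothetical batch size, chosen only for illustration


class CountingSource:
    """Toy stand-in for a table scan that counts the rows it actually reads."""

    def __init__(self, n_rows: int) -> None:
        self.n_rows = n_rows
        self.rows_read = 0

    def batches(self) -> Iterator[List[int]]:
        # Yield rows one batch at a time, like a stream of record batches.
        for start in range(0, self.n_rows, BATCH_SIZE):
            batch = list(range(start, min(start + BATCH_SIZE, self.n_rows)))
            self.rows_read += len(batch)
            yield batch


def eager_limit(source: CountingSource, limit: int) -> List[int]:
    # Suspected current behavior: read every batch, then apply the LIMIT.
    all_rows = [row for batch in source.batches() for row in batch]
    return all_rows[:limit]


def streaming_limit(source: CountingSource, limit: int) -> List[int]:
    # Lazy alternative: stop pulling batches once `limit` rows are produced.
    rows = (row for batch in source.batches() for row in batch)
    return list(islice(rows, limit))


eager_source = CountingSource(100_000)
lazy_source = CountingSource(100_000)
eager_rows = eager_limit(eager_source, 10)
lazy_rows = streaming_limit(lazy_source, 10)
print(eager_source.rows_read, lazy_source.rows_read)
```

Both functions return the same ten rows, but the eager version reads the whole source while the streaming version reads only the first batch. If the heap-to-RecordBatchStream conversion behaves like `eager_limit`, that would be consistent with the slowdown described here.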

@philippemnoel philippemnoel added the bug Something isn't working label Mar 21, 2024
@philippemnoel philippemnoel added pg_lakehouse Issue related to `pg_lakehouse/` and removed pg_analytics labels May 9, 2024
2 participants