Federated JOINs with LIMITs are significantly slower than their single-type counterparts #999
Labels
bug (something isn't working)
pg_lakehouse (issue related to `pg_lakehouse/`)
priority-2-medium (medium priority issue)
Bug Description
See the review comment on #918.
How To Reproduce
On a large dataset, run a large JOIN followed by a LIMIT. A heap-only or Parquet-only join completes in a reasonable time, but a federated heap-Parquet join takes significantly longer.
Proposed Fix
My suspicion is that, in the single-type case, an optimization tied to the LIMIT stops further evaluation once the LIMIT is reached. With federated joins, the query against each table type is fully evaluated before the JOIN and LIMIT are applied. The fix is probably an optimization either in how the LIMIT itself is handled or (perhaps more likely) in how the heap results are converted into a RecordBatchStream.
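To make the suspected difference concrete, here is a toy Python model of the two evaluation strategies. It is only a sketch of the hypothesis above, not `pg_lakehouse` or DataFusion code: `scan`, `hash_join`, and `run` are hypothetical names, the "heap" and "Parquet" sides are plain in-memory data, and the row counts are arbitrary. It counts how many heap rows are read when the heap scan is materialized before the JOIN and LIMIT versus when it is streamed.

```python
from itertools import islice


def scan(rows, counter):
    """Yield rows one at a time, counting how many were actually read."""
    for row in rows:
        counter["read"] += 1
        yield row


def hash_join(left, right_index):
    """Stream left rows and emit matches from a prebuilt index on the right side."""
    for key, left_value in left:
        for right_value in right_index.get(key, ()):
            yield (key, left_value, right_value)


def run(limit, n, eager):
    """Join n 'heap' rows against n 'Parquet' rows and return the first `limit` matches.

    eager=True models the suspected federated behavior: the heap scan is
    fully evaluated before the JOIN and LIMIT are applied.
    eager=False models a streaming conversion that stops once LIMIT is met.
    """
    counter = {"read": 0}
    heap_rows = ((i, f"heap-{i}") for i in range(n))      # stand-in for a heap table
    parquet_index = {i: [f"pq-{i}"] for i in range(n)}     # stand-in for a Parquet table
    source = scan(heap_rows, counter)
    if eager:
        source = list(source)  # materialize the entire heap scan up front
    result = list(islice(hash_join(source, parquet_index), limit))
    return result, counter["read"]


eager_rows, eager_read = run(limit=10, n=100_000, eager=True)
lazy_rows, lazy_read = run(limit=10, n=100_000, eager=False)
print(f"eager: {eager_read} heap rows read, lazy: {lazy_read} heap rows read")
```

Both strategies return the same rows; only the amount of work differs. If the real heap-to-RecordBatchStream conversion behaves like the eager branch, that alone could explain the slowdown, though this would need to be confirmed by profiling the actual query.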