optimizer: push any filter down to storage #794

Open
skyzh opened this issue Aug 11, 2023 · 3 comments

Comments

@skyzh
Member

skyzh commented Aug 11, 2023

The current storage iterator supports pushing down any filter. We may add an optimizer rule for that (I am working on it myself...)

@wangrunji0408
Member

wangrunji0408 commented Aug 11, 2023

What's the difference between filtering on an arbitrary expression in the executor and filtering in the storage layer? Is it really helpful to push it down? I can only see that range filters on the primary key are more efficient. That's why I didn't add rules to push any filter down. 🤪

@skyzh
Member Author

skyzh commented Aug 11, 2023

Filters with high selectivity can also be very efficient if they are pushed down to the storage side. Consider two examples:

  1. The current storage engine supports conditional reads. That is, for a query like select * from t where a > 10 and b < 10, even if a and b are not primary keys, we first read a block of column a and column b. If no row in that block matches the condition, we skip reading the remaining columns and move the cursor forward.
  2. Storage (may) support filtering by metadata, where we store the min/max for each 1MB storage block. Therefore, with a highly selective filter like b = 10 on any column, we should be able to skip a lot of blocks.
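
To make the two cases concrete, here is a minimal Rust sketch. It is not the project's storage code: `Block`, `BlockStats`, `scan_conditional`, and `prune_by_stats` are hypothetical names invented for illustration, columns are assumed to hold `i64` values, and the predicate `a > 10 and b < 10` is hard-coded.

```rust
/// Hypothetical per-block metadata: min and max of one column within a block.
#[derive(Debug, Clone, Copy)]
pub struct BlockStats {
    pub min: i64,
    pub max: i64,
}

impl BlockStats {
    /// Returns false only when no row in the block can satisfy `col = value`.
    pub fn may_contain_eq(&self, value: i64) -> bool {
        self.min <= value && value <= self.max
    }
}

/// Hypothetical columnar block: two filter columns plus one "remaining" column.
pub struct Block {
    pub a: Vec<i64>,
    pub b: Vec<i64>,
    pub payload: Vec<String>,
}

pub struct ScanResult {
    pub rows: Vec<(i64, i64, String)>,
    /// How many blocks needed their remaining (payload) column to be read.
    pub payload_blocks_read: usize,
}

/// Example 1: conditional read for `a > 10 and b < 10`.
pub fn scan_conditional(blocks: &[Block]) -> ScanResult {
    let mut rows = Vec::new();
    let mut payload_blocks_read = 0;
    for block in blocks {
        // Read only the filter columns first and compute a selection vector.
        let selected: Vec<bool> = block
            .a
            .iter()
            .zip(&block.b)
            .map(|(&a, &b)| a > 10 && b < 10)
            .collect();
        // No row matches: skip the remaining columns and move to the next block.
        if !selected.iter().any(|&keep| keep) {
            continue;
        }
        // At least one row matches: now the remaining column is worth reading.
        payload_blocks_read += 1;
        for (i, &keep) in selected.iter().enumerate() {
            if keep {
                rows.push((block.a[i], block.b[i], block.payload[i].clone()));
            }
        }
    }
    ScanResult { rows, payload_blocks_read }
}

/// Example 2: indices of the blocks that still have to be read for `b = value`,
/// decided from min/max metadata alone.
pub fn prune_by_stats(stats: &[BlockStats], value: i64) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, s)| s.may_contain_eq(value))
        .map(|(i, _)| i)
        .collect()
}
```

In both cases the saving comes from I/O that never happens, which an executor-side filter cannot achieve because it only sees rows after every column has been read.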

@wangrunji0408
Member

wangrunji0408 commented Aug 12, 2023

I see. Thanks for the explanation. Then we should revert some changes from #786, where I removed the code for filtering in RowsetIterator. 🤪

github-merge-queue bot pushed a commit that referenced this issue Apr 12, 2024
Related to #794 and #834. Add some new rules and change scan's default
filter to true (this makes it match the "filter-scan" rule).

Signed-off-by: kysshsy <kysshsy@gmail.com>
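
As a rough illustration of why a default filter of true helps, here is a toy Rust sketch of a rule that merges a filter into the scan beneath it. The `Expr` and `Plan` types and the `push_filter_into_scan` function are hypothetical and are not the project's actual plan representation or rule syntax. The assumption is that every scan carries a filter slot, initially true, so a single rule can match any filter-over-scan shape.

```rust
/// Hypothetical predicate type; `Pred` holds an opaque predicate as text.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    True,
    Pred(String),
    And(Box<Expr>, Box<Expr>),
}

/// Hypothetical plan: every scan carries a filter, which defaults to `True`.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Scan { table: String, filter: Expr },
    Filter { predicate: Expr, input: Box<Plan> },
}

/// Conjunction that drops a redundant `True` operand.
fn and(lhs: Expr, rhs: Expr) -> Expr {
    match (lhs, rhs) {
        (Expr::True, e) | (e, Expr::True) => e,
        (l, r) => Expr::And(Box::new(l), Box::new(r)),
    }
}

/// Rewrites `Filter(p, Scan { filter: f })` into `Scan { filter: f AND p }`.
pub fn push_filter_into_scan(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => match push_filter_into_scan(*input) {
            // The scan absorbs the predicate, so storage can evaluate it.
            Plan::Scan { table, filter } => Plan::Scan {
                table,
                filter: and(filter, predicate),
            },
            // Anything else keeps the filter in the executor.
            other => Plan::Filter {
                predicate,
                input: Box::new(other),
            },
        },
        scan => scan,
    }
}
```

Because the scan always has a filter slot, the rule needs no special case for a scan that has not been filtered yet.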