optimizer: minimal rule of range scan #645

skyzh · 2022-05-07T02:33:29Z

After #644 gets implemented, our storage layer would have range scan capability!

To make SQL leverage this property, we can have a range filter scan rule, which does the following:

It runs after filter scan rule.
It finds a TableScan node with filter.
If the filter condition is in the form of cond1 and cond2 and cond3, ..., and there is one condition in the form of pk <op> constant, then we change the TableScan to scan with pk sorted, and apply the begin / end sort key to the scan interface.

We can eventually move forward to #601. But to keep everything simple, we can have this minimal rule.

The text was updated successfully, but these errors were encountered:

skyzh · 2022-05-12T10:22:19Z

@shmiwy will handle this 🤣

yyin-dev · 2022-09-28T12:44:04Z

This looks like a reasonable exercise for an optimizer newbie... If no one is working on it right now, I can take a look?

yyin-dev · 2022-09-28T15:21:19Z

I understand what a range scan is: specifying the start and end point of a scan, and (hopefully) uses an index to avoid a full table scan.

However, I am not sure what's a filter scan. Reading FilterScanRule, it seems to convert a LogicalFilter to a LogicalTableScan, which I don't see the reason. Can you give some explanation on the filter scan rule, or provide pointers to reference I can read? (I Googled "SQL filter scan" but did not find anything very relevant). Thanks.

skyzh · 2022-09-28T16:21:11Z

Filters can be pushed down to the storage layer. We can put the predicate inside TableScanExecutor, so that the storage layer can filter more blocks instead of fetching them to the compute layer.

FilterScanRule merges filter + table scan to a table scan with filter.

yyin-dev · 2022-09-28T22:06:24Z

Here's my understanding of the problem we are trying to solve, and a few questions:

High level idea: we want to build a "range scan rule" such that, for a table scan with predicates on pks, we use the range scan ability provided by the storage layer to redece the # of blocks we read into memory (and thus improve performance). Sample query: SELECT * FROM table WHERE pk > 42 AND non_pk < 42.
The logical plan for the above sample query will be a LogicalFilter, and the existing FilterScanRule will converts it into a LogicalTableScan with conditions pk > 42 AND non_pk < 42. However, this requires fetching all pages into memory to perform the filtering. The "range scan rule" (that we are going to implement) will see that we have a LogicalTableScan with conditions on the primary key, and rewrite it to a LogicalTableScan on sorted pks (pk > 42) and remaining conditions (non_pk < 42).

My questions:

Is above understanding correct?
According to what I described, it seems that both the existing "filter scan rule" and the new "range scan rule" should be performed by the optimizer. However, the current HeuristicOptimizer stops once it successfully applies a rule. So for the sample query, the optimizer stops after applying FilterScanRule.

Does that mean we won't create a new RangeScanRule, but will just add more logic to FilterScanRule?

impl HeuristicOptimizer {
    pub fn optimize(&self, mut root: PlanRef) -> PlanRef {
        for rule in &self.rules {
            if let Ok(applied) = rule.apply(root.clone()) {
                root = applied;
                break;
            }
        }
    ...

To provide begin_sort_key and end_sort_key arguments to Transaction::scan, we need to add those fields to LogicalTableScan, right?

skyzh · 2022-09-28T23:18:42Z

Is above understanding correct?

Yes. Note that non-pk can also be pushed down to the storage layer. Our column store supports filter scan (on any column) and range filter scan on pk.

According to what I described, it seems that both the existing "filter scan rule" and the new "range scan rule" should be performed by the optimizer. However, the current HeuristicOptimizer stops once it successfully applies a rule. So for the sample query, the optimizer stops after applying FilterScanRule.

Yes, you will need to modify the rule. It would be okay to create a new rule, or modify the existing FilterScanRule. Both will work.

To provide begin_sort_key and end_sort_key arguments to Transaction::scan, we need to add those fields to LogicalTableScan, right?

Yes.

yyin-dev · 2022-09-28T23:38:47Z

Thanks for replying!

According to what I described, it seems that both the existing "filter scan rule" and the new "range scan rule" should be performed by the optimizer. However, the current HeuristicOptimizer stops once it successfully applies a rule. So for the sample query, the optimizer stops after applying FilterScanRule.

Yes, you will need to modify the rule. It would be okay to create a new rule, or modify the existing FilterScanRule. Both will work.

Suppose we create a new rule, call it RangeScanRule. If we put it after FilterScanRule, it won't be applied, because the heuristic optimizer stops once it applies FilterScanRule; If we put it before FilterScanRule, the optimizer applies it instead of the FilterScanRule, but there will be some duplicate code between the two rules. So I guess the best way is to extend the existing FilterScanRule (and probably give it a new name like PredicatePushdown)?

impl Optimizer {
    pub fn optimize(&mut self, mut plan: PlanRef) -> PlanRef {
        ...
        let mut rules: Vec<Box<(dyn rules::Rule + 'static)>> = vec![
            Box::new(FilterAggRule {}),
            Box::new(FilterJoinRule {}),
            Box::new(LimitOrderRule {}),
        ];
        if self.enable_filter_scan {
            rules.push(Box::new(FilterScanRule {}));
        }
        let hep_optimizer = HeuristicOptimizer { rules };
        plan = hep_optimizer.optimize(plan);
    ...
}

impl HeuristicOptimizer {
    pub fn optimize(&self, mut root: PlanRef) -> PlanRef {
        for rule in &self.rules {
            if let Ok(applied) = rule.apply(root.clone()) {
                root = applied;
                break;
            }
        }

skyzh · 2022-09-28T23:56:47Z

heuristic optimizer stops once it applies FilterScanRule

No. The optimizer ~~will~~ should apply ALL rules one by one in the list.

skyzh · 2022-09-28T23:58:06Z

Seems like a bug. Need fix 🤣

yyin-dev · 2022-09-29T00:02:01Z

Seems like a bug. Need fix 🤣

The optimizer should iterate through all rules, and try all of them sequentially, right? If that's the case, it should be an easy fix (and if we have enough tests for the optimizer, it should be relatively easy to verify that this fix won't break anything)?

Just verified that if we remove the break statement (and the optimizer will try apply all rules), all unit tests still pass.

skyzh · 2022-09-29T00:57:54Z

Yes. Welcome to submit a PR. Also cc @st1page 🤣

st1page · 2022-09-29T03:46:07Z

remove the break statement

+1

we put it after FilterScanRule, it won't be applied, because the heuristic optimizer stops once it applies FilterScanRule; If we put it before FilterScanRule, the optimizer applies it instead of the FilterScanRule, but there will be some duplicate code between the two rules.

BTW, if the order of the rules can affect the optimizing result, we can split them into multi optimize phases. In other words, construct multiple HeuristicOptimizer struct with different rule set.

yyin-dev · 2022-10-24T18:15:18Z

I am finally picking up this again (after a long time)...

Currently, Transaction::scan looks like this:

fn scan<'a>(
        &'a self,
        begin_sort_key: &'a [DataValue],
        end_sort_key: &'a [DataValue],
        col_idx: &'a [StorageColumnRef],
        is_sorted: bool,
        reversed: bool,
        expr: Option<BoundExpr>,
    ) -> Self::ScanResultFuture<'a> {

Since begin_sort_key and end_sort_key are & [DataValue], there's no way to tell which columns they are referring to. Should scan takes another argument, e.g. sort_key_cols of type & [StorageColumnRef] or [int], to pass this information?

The implementation of range scan currently supports only one key, and always uses the first column: https://github.com/risinglightdb/risinglight/pull/644/files#r867324032 (which seems wrong).

skyzh mentioned this issue May 7, 2022

optimizer: add property sorted to plan nodes #601

Open

xxchan assigned yyin-dev Sep 28, 2022

yyin-dev mentioned this issue Sep 30, 2022

bug: compaction panic #696

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

optimizer: minimal rule of range scan #645

optimizer: minimal rule of range scan #645

skyzh commented May 7, 2022

skyzh commented May 12, 2022

yyin-dev commented Sep 28, 2022

yyin-dev commented Sep 28, 2022

skyzh commented Sep 28, 2022

yyin-dev commented Sep 28, 2022

skyzh commented Sep 28, 2022

yyin-dev commented Sep 28, 2022 •

edited

skyzh commented Sep 28, 2022 •

edited

skyzh commented Sep 28, 2022

yyin-dev commented Sep 29, 2022 •

edited

skyzh commented Sep 29, 2022

st1page commented Sep 29, 2022

yyin-dev commented Oct 24, 2022

optimizer: minimal rule of range scan #645

optimizer: minimal rule of range scan #645

Comments

skyzh commented May 7, 2022

skyzh commented May 12, 2022

yyin-dev commented Sep 28, 2022

yyin-dev commented Sep 28, 2022

skyzh commented Sep 28, 2022

yyin-dev commented Sep 28, 2022

skyzh commented Sep 28, 2022

yyin-dev commented Sep 28, 2022 • edited

skyzh commented Sep 28, 2022 • edited

skyzh commented Sep 28, 2022

yyin-dev commented Sep 29, 2022 • edited

skyzh commented Sep 29, 2022

st1page commented Sep 29, 2022

yyin-dev commented Oct 24, 2022

yyin-dev commented Sep 28, 2022 •

edited

skyzh commented Sep 28, 2022 •

edited

yyin-dev commented Sep 29, 2022 •

edited