Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-6363] Introduce a rule to derive more filters from inner join … #3760

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

frostruan
Copy link

Copy link

sonarcloud bot commented Apr 13, 2024

Copy link
Contributor

@jamesstarr jamesstarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a more complete solution that handled things like using predicate pull ups and handling the null producing joins. If a partial solution is accepted, then it might discourage a more complete solution down the line due to the legacy rule providing more base for expanding the solution.

final RexNode newCondition =
deriveEquivalenceCondition(simplify, rexBuilder, originalCondition);

if (arePredicatesEquivalent(rexBuilder, simplify, originalCondition, newCondition)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not sufficient to prevent infinite loops once generalized to pull up predicates.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind give me an example ? Can't imagine situation where an infinite loop would occur now. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here

super(config);
}

@Override public void onMatch(RelOptRuleCall call) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use the RelMetaDataQuery for extracting filters from below. Please look at RelMetadataQuery::getPulledUpPredicates usage in ReduceExpressionsRule.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out. I'll check and learn.

return;
}

final Filter newFilter = filter.copy(filter.getTraitSet(), filter.getInput(), newCondition);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it preferable to generate:

FILTER
  JOIN
    FILTER
      ....
    FILTER
      ....

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a new Filter is generated, FilterJoinRule will push down the predicate.

In fact, I agree with you and the current implementation is a compromise. In terms of the order of rule application, the rules in calcite are order-insensitive, which allows the rules to remain completely independent without any dependencies. But if the predicate is pushed down in this rule, then I think this rule will do too many things. I want to keep the rule as simple and atomic as possible, focusing on one thing, so Just generating a new Filter ends here.

If in the future, we can allow rules to declare their own application order (such as applying before/after other rules), and then perform topological sorting on the rules, then it may be enough to just generate a new Filter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had a better extraction method, then we would see an infinite loop. This is my concern with such a partial solution, it will be difficult to build on top of.

SELECT * FROM t1, t2
WHERE t1.c1 = t2.c1
  AND ((t1.c1, t2.c1) IN ((1, 2), (3, 4), (5, 6), (7, 8), (9, 10))
    OR (t1.c1, t1.c2) IN ((3, 4), (5, 6), (7, 8))

Given the above example ideally you would get something like the following tree:

  FILTER (t1.c1, t2.c1) IN ((1, 2), (3, 4), (5, 6), (7, 8), (9, 10)) OR (t1.c1, t1.c2) IN ((3, 4), (5, 6), (7, 8))
    JOIN on t1.c1 = t2.c1
      FILTER t1.c1 IN (1, 3, 5, 7, 9)
        SCAN t1
      SCAN t2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I see here reminds me of HIVE-25758 and similar bugs around JoinPushTransitivePredicatesRule, for which you can find more details in this slide deck from a Calcite meetup from last year: https://www.slideshare.net/slideshow/debugging-planning-issues-using-calcites-builtin-loggers/256567632 (slides 45-53).

As @jamesstarr suggests, once you can pull up more predicates, the burden of converging falls onto RexSimplify, if it fails at simplifying the "redundant" part, the predicate will always be identified as a new predicate that you will push from one side of the join to the other, pushed down and merged with existing predicates, which will again be identified as new predicates and you go into a loop.

I agree that having extra power here could be dangerous, but as long as it's not part of the core rules and it stays optional, it would be good to have it IMO.

import java.util.stream.Collectors;

/**
* Planner rule that derives more equivalent predicates from inner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can transitively generate predicates for the null generating sides or left and right joins.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Let me try this.

call.transformTo(newFilter);

// after derivation, the original filter can be pruned
call.getPlanner().prune(filter);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure we do not want to do this. If their is a large tree, then this could cause problems.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, this might be too radical.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this break anything if removed? Why would it be to radical?

*/

@Value.Enclosing
public class JoinDeriveEquivalenceFilterRule
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is misleading since it is not just equivalency, no?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is indeed a bit misleading. I originally wanted to express the derivation of rules based on equivalent rewriting. I will rename it.

* representation lexicographic order, which allows constant to always be on the
* right side of the expression. See {@link RexNormalize#reorderOperands} for details.
*/
public static RexNode canonizeNode(RexBuilder rexBuilder, RexNode expression) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it preferable to have helper functions for interacting with the existing nodes then creating a copy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I'll fix this.

@frostruan
Copy link
Author

REALLY appreciate your review. @jamesstarr It's very helpful. I'll address these as soon as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants