Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HBO support for CTE materialization #22606

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

feilong-liu
Copy link
Contributor

Description

The three nodes, Sequence node, cte producer node, and cte consumer node are not handled in Canonicalization for HBO. It will make HBO not working for the query. This PR adds plan canonicalization for it.

Motivation and Context

To have HBO work for CTE materialized query

Impact

HBO work for CTE materialized query

Test Plan

Unit test

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Enable HBO for CTE materialized query

public Optional<PlanNode> visitSequence(SequenceNode node, Context context)
{
node.getCteProducers().forEach(x -> x.accept(this, context));
return node.getPrimarySource().accept(this, context);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sequence node, just return the canonicalized primary source

@Override
public Optional<PlanNode> visitCteProducer(CteProducerNode node, Context context)
{
return node.getSource().accept(this, context);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For cte producer, canonicalize its source

@Override
public Optional<PlanNode> visitCteConsumer(CteConsumerNode node, Context context)
{
return node.getOriginalSource().accept(this, context);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Canonicalize original source for cte consumer

@Override
public Optional<PlanNode> visitSequence(SequenceNode node, Context context)
{
node.getCteProducers().forEach(x -> x.accept(this, context));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to visit producers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's to trigger canonicalization of plan nodes under the cte producer, which will also need HBO to optimize the subquery.

Copy link
Member

@jaystarshot jaystarshot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just note that the stats equivalent node of the cteProducer and cteConsumer is the actual source too, I think this stats equivalent node is also passed to the table writers and table scan nodes in the physical optimizer and so tracking should also work but the flow is not entirely clear.

@@ -830,6 +833,25 @@ public Optional<PlanNode> visitAggregation(AggregationNode node, Context context
return Optional.of(canonicalPlan);
}

@Override
public Optional<PlanNode> visitSequence(SequenceNode node, Context context)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test case where the sequence node is also canonicalized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The end to end test in testHistoryBasedStatsCalculatorCTE should suffice to make sure that it works.

.setSystemProperty(CTE_PARTITIONING_PROVIDER_CATALOG, "hive")
.build();
// CBO Statistics
assertPlan(cteMaterialization, sql, anyTree(node(ProjectNode.class, anyTree(any())).withOutputRowCount(Double.NaN)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@feilong-liu will this also work if the cte materialization was off in the first execution and then on in the second?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the opposite case, if cte materialization was on for the first run and then off again. I am a little confused and not sure it will work as expected in all these cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants