[Enhancement] Stale statistics lead to OOM in multi-table join scenarios #34656

Closed
3 tasks done
yx-keith opened this issue May 10, 2024 · 0 comments

yx-keith commented May 10, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

Description

In a multi-table join, the result of each intermediate join is used as the input of the subsequent joins.
Statistics are collected periodically. If data is updated between two collections, the statistics stay stale until the next collection runs. A multi-table join query executed in that window may get a poor plan from the optimizer, because the statistics it relies on no longer reflect the data.
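
To make the stale-statistics window concrete, here is a minimal Python sketch of how an equality predicate might be estimated from a statistics snapshot. It is not the project's estimator: `ColumnStats`, `estimate_eq_rows`, the sample counts, and the convention of clamping an unseen value to 1 row are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnStats:
    """Hypothetical snapshot of per-value row counts taken at the last collection."""
    row_count: int
    value_counts: dict = field(default_factory=dict)


def estimate_eq_rows(stats: ColumnStats, value: str) -> int:
    """Estimate rows matching `column = value` using only the snapshot.

    Assumption: a value missing from the snapshot is clamped to 1 row.
    """
    return max(1, stats.value_counts.get(value, 0))


# Snapshot taken before any '成都' rows existed (made-up numbers).
stale = ColumnStats(row_count=1_000_000,
                    value_counts={"Beijing": 600_000, "Shanghai": 400_000})

# Rows inserted after the snapshot; the optimizer cannot see them yet.
actual_rows = 5_000_000

estimated = estimate_eq_rows(stale, "成都")
underestimate_factor = actual_rows // estimated

print(f"estimated={estimated}, actual={actual_rows}, off by {underestimate_factor}x")
```

Until the next collection runs, every plan decision that depends on this estimate treats the filtered table as nearly empty.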

For example:
select * from example_tbl t1 join example_tbl02 t2 on t1.city=t2.city and t1.city="成都" join example_tbl03 t3 on t1.city=t3.city;

This is the plan:

(image: query plan)

In this case, the last statistics collection found no rows containing '成都', but rows containing '成都' were inserted before the next collection. The stale statistics lead to a poor execution plan:
example_tbl02 is broadcast to the other nodes for the join, even though it actually has a very large number of rows containing '成都', which may cause an OOM during broadcast distribution.
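
The effect on the distribution choice can be sketched as follows. This is an illustrative Python model, not the optimizer's actual cost function: `BROADCAST_ROW_LIMIT`, `choose_distribution`, the row width, and the node count are hypothetical, and the shuffle case assumes rows are spread evenly across nodes.

```python
# Hypothetical threshold: build sides estimated at or below this size are broadcast.
BROADCAST_ROW_LIMIT = 100_000


def choose_distribution(estimated_build_rows: int) -> str:
    """Pick a join distribution from the estimated (not the actual) build-side size."""
    return "BROADCAST" if estimated_build_rows <= BROADCAST_ROW_LIMIT else "SHUFFLE"


def per_node_bytes(strategy: str, actual_rows: int, row_bytes: int, num_nodes: int) -> int:
    """Memory each node needs to hold the build side under the given strategy."""
    total = actual_rows * row_bytes
    if strategy == "BROADCAST":
        return total               # every node receives a full copy
    return total // num_nodes      # assumption: even hash distribution


# Made-up numbers for the filtered example_tbl02 build side.
stale_estimate = 1                 # statistics still say "almost no rows"
actual_rows = 50_000_000           # what the table really contains
row_bytes, num_nodes = 100, 10

plan_with_stale_stats = choose_distribution(stale_estimate)
plan_with_fresh_stats = choose_distribution(actual_rows)

broadcast_cost = per_node_bytes(plan_with_stale_stats, actual_rows, row_bytes, num_nodes)
shuffle_cost = per_node_bytes(plan_with_fresh_stats, actual_rows, row_bytes, num_nodes)

print(plan_with_stale_stats, broadcast_cost)
print(plan_with_fresh_stats, shuffle_cost)
```

Under these assumptions the stale estimate selects a broadcast that needs the full build side in memory on every node, while an accurate estimate would select a shuffle that needs only a fraction of it per node.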

Solution

654ead9

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct
