Parquetquery join performance improvement #3604
Merged
What this PR does:
This is a redo of the join/leftjoin iterator core loops. The new loop uses the first iterator as the "driving" iterator and tries to find a match for its current row among all the others. Instead of peeking at the tip of every iterator, it starts the next pass as soon as any of iterators 2..N fails to match iterator 1. There is also a dynamic re-sort of iterators at runtime: if one of iterators 2..N is able to filter further into the file than iterator 1, it is swapped to the top and becomes the new iterator 1.
This gives a nice performance improvement on mixed traceql queries and on metrics queries. The 20% speedup on mixed queries is interesting. To dig in a bit: the new loop was responsible for ~15%, and the dynamic re-sort for another 5%.
BenchmarkBackendBlockTraceQL
BenchmarkBackendBlockQueryRange
Which issue(s) this PR fixes:
Fixes n/a
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]