-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multi index leaf requests #4962
Conversation
1f69f3b
to
156620e
Compare
965bd78
to
afb99b2
Compare
PR quickwit-oss/quickwit#4962 fixes an issue where the AggregationLimits are not passed correctly. Since the AggregationLimits are shared properly we run into contention issues. This PR includes some straightforward improvement to reduce contention, by only calling if the memory changed and avoiding the second read. We probably need some sharding with multiple counters or local caching before updating the global after some threshold.
let leaf_responses = try_join_all(leaf_request_tasks).await?; | ||
let merge_collector = make_merge_collector(&search_request, &aggregation_limits)?; | ||
let mut incremental_merge_collector = IncrementalCollector::new(merge_collector); | ||
for result in leaf_responses { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we merge as they come here with a FuturesUnordered
instead of using try_join_all
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I planned this for later when moving the lower loop of splits per index to the top level
for result in leaf_responses { | ||
match result { | ||
Ok(result) => { | ||
incremental_merge_collector.add_result(result)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah add_result is light anyway... maybe it does not matter that much as long as we don't have any lazy optimization.
@@ -976,6 +1080,8 @@ async fn leaf_search_single_split_wrapper( | |||
}), | |||
} | |||
if let Some(last_hit) = locked_incremental_merge_collector.peek_worst_hit() { | |||
// TODO: we could use a RWLock instead and read the value instead of updateing it | |||
// unconditionally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it matter? we only update it once per split don't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw some contention on incremental_merge_collector
when pushing results, which is also once per split.
It's more an issue when we start to share the threshold between indices search (which we don't currently)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see inline comments
PR quickwit-oss/quickwit#4962 fixes an issue where the AggregationLimits are not passed correctly. Since the AggregationLimits are shared properly we run into contention issues. This PR includes some straightforward improvement to reduce contention, by only calling if the memory changed and avoiding the second read. We probably need some sharding with multiple counters or local caching before updating the global after some threshold.
afb99b2
to
0c1b96b
Compare
execute one request per node, instead one request per index. A leaf can receive now request over multiple indices. Especially in cases with search requests on many indices this saves a lot of requests and therefore memory. It also allows better control of the memory consumption of a requests on a node, if it is not split up over multiple requests.
LeafSearchResponse includes serialized aggregations to send them between nodes. This is used also on a leaf, which causes the results to be serialized and then deserialized for merging again _per_ split. This PR introduces IntermediateLeafResult to be used instead. It also adds a field `aggregation_type` to LeafSearchResponse to be able to convert self contained between IntermediateLeafResult and LeafSearchResponse.
This reverts commit c968e4d.
0c1b96b
to
28bf15a
Compare
execute one leaf search request per node, instead one leaf search request per index.
A leaf search can span now multiple indices.
Especially in cases with search requests on many indices this may save a
lot of request overhead to other nodes. It also allows better control of
the aggregation memory limit of a request to a node, if it is not split up
over multiple requests, as well as finer control over concurrency.
The new
LeafSearchRequest
contains deduplication, so that if a request spans 1000 indices with the same doc_mapper, we send the doc_mapper only once.TODO:
This version has no special optimizations for multi index early exit.
e.g. CanSplitDoBetter should be altered to handle multi indices.