-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support vectorsets at shard level #2129
Support vectorsets at shard level #2129
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmark
Benchmark suite | Current: e185a99 | Previous: 6c53c37 | Ratio |
---|---|---|---|
nucliadb/search/tests/unit/search/test_fetch.py::test_highligh_error |
13200.175746941139 iter/sec (stddev: 3.1607053980994163e-7 ) |
13198.084460244272 iter/sec (stddev: 3.4157717375989134e-7 ) |
1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
b2f0d50
to
9b86004
Compare
cdce481
to
1796ac9
Compare
let task = move || { | ||
run_with_telemetry(info_span!(parent: &span, "Add a vectorset"), move || { | ||
let shard = obtain_shard(shards, shard_id.clone())?; | ||
shard.create_vectors_index(NewVectorsIndex { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about including the dimension in the create vectorset request. Are we not doing that for any particular reason?
After today's talk we would also need to eventually add more info like the datatype, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I probably missed the conversation. Anyway, changing this would imply changing the new shard request too. Do we want to mix it in this PR?
result | ||
}; | ||
let mut vector_tasks = vec![]; | ||
for (_, vector_writer) in indexes.vectors_indexes.iter_mut() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this write the same vector to all vectorsets?
I guess it's still a placeholder until we change the SetResource message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, maybe that's a good opportunity to clean protobuffers and pass a custom struct instead of the whole Resource
to nucliadb_vectors
merged: 0, | ||
left: 0, | ||
}); | ||
// TODO: return metrics by vectorset, not only the deafult one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only running a merge on the default index, unless I missed something. I'd change this TODO to indicate that not only it returns default metrics, but also that it's only merging the default index. Or even better, actually merge all indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I know. I didn't know if I wanted to change merge protos too
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2129 +/- ##
==========================================
- Coverage 75.02% 74.92% -0.11%
==========================================
Files 80 80
Lines 5866 5894 +28
==========================================
+ Hits 4401 4416 +15
- Misses 1465 1478 +13
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
for (name, vectors_index) in indexes.vectors_indexes.iter() { | ||
let runner = vectors_index.prepare_merge(context.parameters); | ||
if let Ok(Some(mut runner)) = runner { | ||
let result = runner.run(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runner.run()
must be outside of any locks because it's the slow part and we don't want to block other operations in the index meanwhile.
So this needs to be 3 blocks:
{
indexes = read_rw_lock()
for each index { prepare_merge() }
}
for each index { runner.run() }
{
indexes = write_rw_lock()
for each index { record_merge() }
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
New, open, set and remove resource, GC and reload
Co-authored-by: Javier Torres <javier@javiertorres.eu>
220d663
to
e185a99
Compare
Description
Describe the proposed changes made in this PR.
How was this PR tested?
Describe how you tested this PR.