Support vectorsets at shard level #2129

Merged

Conversation

jotare
Contributor

@jotare jotare commented May 7, 2024

Description

Describe the proposed changes made in this PR.

How was this PR tested?

Describe how you tested this PR.

@jotare jotare requested a review from a team May 7, 2024 15:05
@jotare jotare marked this pull request as draft May 7, 2024 15:05
Contributor

@github-actions github-actions bot left a comment

Benchmark

| Benchmark suite | Current: e185a99 | Previous: 6c53c37 | Ratio |
|---|---|---|---|
| nucliadb/search/tests/unit/search/test_fetch.py::test_highligh_error | 13200.175746941139 iter/sec (stddev: 3.1607053980994163e-7) | 13198.084460244272 iter/sec (stddev: 3.4157717375989134e-7) | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@jotare jotare force-pushed the joanantoniriera4168/sc-10087/support-vectorsets-at-shard-level branch from b2f0d50 to 9b86004 Compare May 8, 2024 16:47
@jotare jotare marked this pull request as ready for review May 9, 2024 10:22
@jotare jotare force-pushed the joanantoniriera4168/sc-10087/support-vectorsets-at-shard-level branch from cdce481 to 1796ac9 Compare May 9, 2024 11:07
let task = move || {
    run_with_telemetry(info_span!(parent: &span, "Add a vectorset"), move || {
        let shard = obtain_shard(shards, shard_id.clone())?;
        shard.create_vectors_index(NewVectorsIndex {
Contributor

We talked about including the dimension in the create vectorset request. Are we not doing that for any particular reason?

After today's talk we would also need to eventually add more info like the datatype, etc.

Contributor Author

No, I probably missed the conversation. Anyway, changing this would imply changing the new shard request too. Do we want to mix it in this PR?
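
For illustration only, a minimal sketch of what carrying the dimension in the create-vectorset request could look like. `NewVectorsetRequest`, its fields, and `check_dimension` are hypothetical names invented for this sketch; they are not taken from the PR or from the nucliadb protos.

```rust
// Hypothetical request type: every name here is an assumption made for
// illustration, not the actual nucliadb message.
#[derive(Debug, Clone, PartialEq)]
struct NewVectorsetRequest {
    shard_id: String,
    vectorset_id: String,
    // Optional so that a request without a dimension is still expressible.
    // Further fields (for example a datatype) could be added later.
    dimension: Option<usize>,
}

// Reject vectors whose length does not match the declared dimension.
// When no dimension was declared, any length is accepted.
fn check_dimension(request: &NewVectorsetRequest, vector: &[f32]) -> Result<(), String> {
    match request.dimension {
        Some(expected) if expected != vector.len() => Err(format!(
            "vectorset {} expects dimension {}, got {}",
            request.vectorset_id,
            expected,
            vector.len()
        )),
        _ => Ok(()),
    }
}
```

Whether the dimension belongs in this request or in the new shard request is the open question in this thread; the sketch only shows the shape of the check.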

nucliadb_node/src/shards/shard_writer.rs (outdated, resolved)
    result
};
let mut vector_tasks = vec![];
for (_, vector_writer) in indexes.vectors_indexes.iter_mut() {
Contributor

Won't this write the same vector to all vectorsets?

I guess it's still a placeholder until we change the SetResource message?

Contributor Author

Yes, maybe that's a good opportunity to clean up the protobufs and pass a custom struct instead of the whole Resource to nucliadb_vectors.
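
As a rough sketch of the idea discussed in this thread, the writer loop could receive one small struct per vectorset and route each one to the matching index, instead of handing the same data to every index. `VectorsetUpdate`, `VectorWriter`, and `write_vectors` are hypothetical stand-ins; the real types in nucliadb_vectors and the SetResource message may look quite different.

```rust
use std::collections::HashMap;

// Hypothetical per-vectorset payload, replacing "the whole Resource".
struct VectorsetUpdate {
    vectors: Vec<Vec<f32>>,
}

// Hypothetical stand-in for a vectors index writer.
#[derive(Default)]
struct VectorWriter {
    stored: Vec<Vec<f32>>,
}

impl VectorWriter {
    fn set_vectors(&mut self, update: &VectorsetUpdate) {
        self.stored.extend(update.vectors.iter().cloned());
    }
}

// Route each update to the index with the same vectorset name.
// Returns the names of updates that had no matching index.
fn write_vectors(
    writers: &mut HashMap<String, VectorWriter>,
    updates: &HashMap<String, VectorsetUpdate>,
) -> Vec<String> {
    let mut unknown = Vec::new();
    for (name, update) in updates {
        match writers.get_mut(name) {
            Some(writer) => writer.set_vectors(update),
            None => unknown.push(name.clone()),
        }
    }
    unknown
}
```

Iterating over the updates rather than over the indexes means an index only receives vectors that were addressed to it.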

    merged: 0,
    left: 0,
});
// TODO: return metrics by vectorset, not only the default one
Contributor

Unless I missed something, this only runs a merge on the default index. I'd change this TODO to say not only that it returns the default index's metrics, but also that it merges only the default index. Or, even better, actually merge all indexes.

Contributor Author

Yes, I know. I wasn't sure whether I wanted to change the merge protos too.

codecov bot commented May 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.92%. Comparing base (169a3e9) to head (e6107a1).
Report is 3 commits behind head on main.

❗ Current head e6107a1 differs from pull request most recent head e185a99. Consider uploading reports for the commit e185a99 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2129      +/-   ##
==========================================
- Coverage   75.02%   74.92%   -0.11%     
==========================================
  Files          80       80              
  Lines        5866     5894      +28     
==========================================
+ Hits         4401     4416      +15     
- Misses       1465     1478      +13     
| Flag | Coverage Δ |
|---|---|
| ingest | 70.30% <ø> (-0.14%) ⬇️ |
| utils | 81.53% <ø> (ø) |

@jotare jotare requested a review from javitonino May 14, 2024 09:09
for (name, vectors_index) in indexes.vectors_indexes.iter() {
    let runner = vectors_index.prepare_merge(context.parameters);
    if let Ok(Some(mut runner)) = runner {
        let result = runner.run();
Contributor

runner.run() must be outside of any locks because it's the slow part and we don't want to block other operations in the index meanwhile.

So this needs to be 3 blocks:

{
  indexes = read_rw_lock()
  for each index { prepare_merge() }
}
for each index { runner.run() }
{
  indexes = write_rw_lock()
  for each index { record_merge()  }
}
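
The three blocks above could be sketched in Rust roughly as follows. This is a simplified illustration under stated assumptions, not the code that was merged: `VectorsIndex`, `MergeRunner`, `MergeResult`, and `merge_all` are hypothetical stand-ins, and only the method names `prepare_merge`, `run`, and `record_merge` come from this thread (their real signatures may differ).

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical stand-ins for the real merge types.
struct MergeRunner {
    segments: Vec<u32>,
}

struct MergeResult {
    merged: usize,
}

impl MergeRunner {
    // The slow part: it takes ownership of its inputs, so it needs no lock.
    fn run(self) -> MergeResult {
        MergeResult { merged: self.segments.len() }
    }
}

struct VectorsIndex {
    pending: Vec<u32>,
    merged_total: usize,
}

impl VectorsIndex {
    // Cheap: decides whether there is anything to merge.
    fn prepare_merge(&self) -> Option<MergeRunner> {
        if self.pending.len() < 2 {
            None
        } else {
            Some(MergeRunner { segments: self.pending.clone() })
        }
    }

    // Cheap: records the outcome of a finished merge.
    fn record_merge(&mut self, result: &MergeResult) {
        self.pending.clear();
        self.merged_total += result.merged;
    }
}

// Returns the number of merged segments per vectorset name.
fn merge_all(indexes: &RwLock<HashMap<String, VectorsIndex>>) -> HashMap<String, usize> {
    // Block 1: prepare every merge under the read lock.
    let mut runners = Vec::new();
    {
        let guard = indexes.read().unwrap();
        for (name, index) in guard.iter() {
            if let Some(runner) = index.prepare_merge() {
                runners.push((name.clone(), runner));
            }
        }
    } // read lock released here

    // Block 2: run the slow merges with no lock held.
    let results: Vec<(String, MergeResult)> = runners
        .into_iter()
        .map(|(name, runner)| (name, runner.run()))
        .collect();

    // Block 3: record the results under the write lock.
    let mut metrics = HashMap::new();
    {
        let mut guard = indexes.write().unwrap();
        for (name, result) in &results {
            if let Some(index) = guard.get_mut(name) {
                index.record_merge(result);
                metrics.insert(name.clone(), result.merged);
            }
        }
    }
    metrics
}
```

Because each runner owns a copy of what it needs, other operations can take the lock while the merges run; the write lock is held only for the short bookkeeping step at the end. A real implementation would also have to handle an index that changed between blocks 1 and 3, which this sketch ignores.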

Contributor Author

Fixed

@jotare jotare force-pushed the joanantoniriera4168/sc-10087/support-vectorsets-at-shard-level branch from 220d663 to e185a99 Compare May 14, 2024 10:33
@jotare jotare merged commit 55fd11d into main May 14, 2024
107 checks passed
@jotare jotare deleted the joanantoniriera4168/sc-10087/support-vectorsets-at-shard-level branch May 14, 2024 11:08
2 participants