Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade IVFFlatScanner and exhaustive_L2sqr_seq #3074

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alexanderguzhva
Copy link
Contributor

@alexanderguzhva alexanderguzhva commented Sep 25, 2023

Speeds up IVFFlatScanner by helping the branch predictor and computing distances for 4 elements per loop.
Same upgrade for exhaustive_inner_product_seq() and exhaustive_L2sqr_seq() calls.

@alexanderguzhva
Copy link
Contributor Author

@mdouze Matthijs, could you please check whether this test failure is critical. It seems that it is about epsilons on certain architectures. On the other hand, it is seems that it is the code for dedup-ing.

Copy link
Contributor

@mdouze mdouze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how right away but it would be good if the code could be less repetitive. The loop over iCounter is very similar in 3 locations. This makes maintenance harder.

// support for a particular kind of IDSelector classes. This
// may be useful if the lion's share of samples are filtered out.

struct EmptySelectorHelper {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please call it IDSelectorAll
Indeed when you delete elements it makes the index empty but IDSelector is also used for searching a subset of ids.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Let me think how to get rid of this redundancy while keeping the code clear

@alexanderguzhva
Copy link
Contributor Author

@mdouze , I've upgraded the code, please take a look, whether it is simple enough or whether it needs to simplified further

@mdouze
Copy link
Contributor

mdouze commented Oct 2, 2023

OK the current version is complex but less repetitive. I'll import to fbcode.

Weird that the errors still appear. I thought I relaxed these tests. Did you rebase on latest Faiss?

@facebook-github-bot
Copy link
Contributor

@mdouze has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mdouze
Copy link
Contributor

mdouze commented Oct 3, 2023

Do you have numbers (even very rough) that show where this modification speeds up search?

@alexanderguzhva
Copy link
Contributor Author

@mdouze , my mistake, let me rebase and run the test

@alexanderguzhva
Copy link
Contributor Author

alexanderguzhva commented Oct 3, 2023

I've used benchs/bench_ivf_selector.cpp, fixed its errors (missing #include <faiss/impl/FaissAssert.h>) and made sure that cmake version compiles the code for AVX2, and used IVF1024,Flat index (instead of a default IVF1024,SQ8).

Two cases: d=64 and d=768. I'm not sure whether the results are super reliable for d=64 (I'm running one on a VM), but it seems that it became a bit slower, while for d=768 the speedup is visible and reproducible.

I think that I either need to add some threshold for d (say, starting from d=512, otherwise the new version is plain slower), or make it optional, or check the code further. So, please don't land this diff yet.

d=64, Baseline:

index_key=IVF1024,Flat
Read /tmp/bench_ivf_selector_IVF1024,Flat.faissindex
parallel_mode=0
search time, no selector: 89.960 ms
search time with nullptr selector: 247.900 ms
search time with selector: 244.995 ms
search time with null selector + manual parallel: 246.943 ms
parallel_mode=3
search time, no selector: 85.567 ms
search time with nullptr selector: 85.795 ms
search time with selector: 85.635 ms
search time with null selector + manual parallel: 87.590 ms
set single thread
parallel_mode=0
search time, no selector: 259.159 ms
search time with nullptr selector: 226.071 ms
search time with selector: 221.616 ms
search time with null selector + manual parallel: 221.348 ms

d=64, Candidate:

index_key=IVF1024,Flat
Read /tmp/bench_ivf_selector_IVF1024,Flat.faissindex
parallel_mode=0
search time, no selector: 93.027 ms
search time with nullptr selector: 252.398 ms
search time with selector: 247.649 ms
search time with null selector + manual parallel: 248.685 ms
parallel_mode=3
search time, no selector: 88.478 ms
search time with nullptr selector: 87.638 ms
search time with selector: 89.603 ms
search time with null selector + manual parallel: 91.405 ms
set single thread
parallel_mode=0
search time, no selector: 254.079 ms
search time with nullptr selector: 218.183 ms
search time with selector: 229.076 ms
search time with null selector + manual parallel: 217.755 ms

d=768, Baseline:

index_key=IVF1024,Flat
Read /tmp/0bench_ivf_selector_IVF1024,Flat.faissindex
parallel_mode=0
search time, no selector: 938.073 ms
search time with nullptr selector: 2196.695 ms
search time with selector: 2214.705 ms
search time with null selector + manual parallel: 2249.250 ms
parallel_mode=3
search time, no selector: 928.345 ms
search time with nullptr selector: 930.376 ms
search time with selector: 937.659 ms
search time with null selector + manual parallel: 934.082 ms
set single thread
parallel_mode=0
search time, no selector: 2281.896 ms
search time with nullptr selector: 2211.583 ms
search time with selector: 2216.971 ms
search time with null selector + manual parallel: 2221.527 ms

d=768, Candidate:

index_key=IVF1024,Flat
Read /tmp/0bench_ivf_selector_IVF1024,Flat.faissindex
parallel_mode=0
search time, no selector: 893.283 ms
search time with nullptr selector: 1847.295 ms
search time with selector: 1873.944 ms
search time with null selector + manual parallel: 1932.494 ms
parallel_mode=3
search time, no selector: 902.230 ms
search time with nullptr selector: 889.027 ms
search time with selector: 891.625 ms
search time with null selector + manual parallel: 901.839 ms
set single thread
parallel_mode=0
search time, no selector: 1885.094 ms
search time with nullptr selector: 1861.240 ms
search time with selector: 1846.442 ms
search time with null selector + manual parallel: 1849.544 ms

@alexanderguzhva
Copy link
Contributor Author

@mdouze , additionally, I will modify fvec_L2sqr_by_idx() and fvec_inner_products_by_idx() in the same way, because it helps IndexRefineFlat

@alexanderguzhva
Copy link
Contributor Author

I'm still playing with it

@alexanderguzhva alexanderguzhva force-pushed the upgrade_IVFFlatScanner branch 2 times, most recently from 5884d8c to d046e65 Compare October 9, 2023 18:27
Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
@mdouze
Copy link
Contributor

mdouze commented Nov 28, 2023

What is the current status of this PR?

@alexanderguzhva
Copy link
Contributor Author

@mdouze Technically, it works and I can rebase it on the master branch. But this diff improves the performance for large dim datasets, but may degrade the performance for smaller dim datasets. I've enabled it in Zilliz version of faiss, because it mostly operates with dim like 512+. But I'm not sure for the general purpose case.

Is the benchmarking set that Gergely works on ready to be tried?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants