AVX512 for PQFastScan (#3276)
Summary:
AVX-512 implementation for PQFastScan for QBS.
In local benchmarks on a 4th-gen Xeon, QPS is up to 10% higher, mostly in the single-query case. As far as I remember, production cases would show larger improvements.

* Baseline `benchs/bench_ivf_fastscan_single_query.py` (sift1M): https://gist.github.com/alexanderguzhva/c9cde2cb5e9c7675f429623e6faa9fbf
* Candidate `benchs/bench_ivf_fastscan_single_query.py` (sift1M): https://gist.github.com/alexanderguzhva/4e8530073a108f73771d38e55bc45b17
* Baseline `benchs/bench_ivf_fastscan.py` (sift1M): https://gist.github.com/alexanderguzhva/9eb03ed60354d7e76cfa25e676f983ac
* Candidate `benchs/bench_ivf_fastscan.py` (sift1M): https://gist.github.com/alexanderguzhva/3cbfeba1364dd445a2bb52455966979e

mdouze, should I modify `pq4_fast_scan_search_1.cpp` as well? It is somewhat cumbersome to dig through the various possible sub-implementations.

Pull Request resolved: #3276

Reviewed By: junjieqi

Differential Revision: D54943632

Pulled By: mdouze

fbshipit-source-id: 3d70066e9779039559b1734c2be99bf439058246
alexanderguzhva authored and facebook-github-bot committed Mar 29, 2024
1 parent d685413 commit d99f07e
Showing 5 changed files with 784 additions and 2 deletions.
34 changes: 34 additions & 0 deletions faiss/impl/LookupTableScaler.h
@@ -38,6 +38,23 @@ struct DummyScaler {
        return simd16uint16(0);
    }

#ifdef __AVX512F__
    inline simd64uint8 lookup(const simd64uint8&, const simd64uint8&) const {
        FAISS_THROW_MSG("DummyScaler::lookup should not be called.");
        return simd64uint8(0);
    }

    inline simd32uint16 scale_lo(const simd64uint8&) const {
        FAISS_THROW_MSG("DummyScaler::scale_lo should not be called.");
        return simd32uint16(0);
    }

    inline simd32uint16 scale_hi(const simd64uint8&) const {
        FAISS_THROW_MSG("DummyScaler::scale_hi should not be called.");
        return simd32uint16(0);
    }
#endif

    template <class dist_t>
    inline dist_t scale_one(const dist_t&) const {
        FAISS_THROW_MSG("DummyScaler::scale_one should not be called.");
@@ -67,6 +84,23 @@ struct NormTableScaler {
        return (simd16uint16(res) >> 8) * scale_simd;
    }

#ifdef __AVX512F__
    inline simd64uint8 lookup(const simd64uint8& lut, const simd64uint8& c)
            const {
        return lut.lookup_4_lanes(c);
    }

    inline simd32uint16 scale_lo(const simd64uint8& res) const {
        auto scale_simd_wide = simd32uint16(scale_simd, scale_simd);
        return simd32uint16(res) * scale_simd_wide;
    }

    inline simd32uint16 scale_hi(const simd64uint8& res) const {
        auto scale_simd_wide = simd32uint16(scale_simd, scale_simd);
        return (simd32uint16(res) >> 8) * scale_simd_wide;
    }
#endif

    // for non-SIMD implem 2, 3, 4
    template <class dist_t>
    inline dist_t scale_one(const dist_t& x) const {
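
Reader's note: the three AVX-512 members added to `NormTableScaler` can be hard to follow without the SIMD wrapper definitions at hand, so here is a minimal scalar sketch of what they appear to compute. It is not Faiss code. The names `Bytes64`, `Words32`, `as_words`, and the `*_model` functions are hypothetical. The sketch assumes that `lookup_4_lanes` follows x86 byte-shuffle semantics within each 128-bit lane, that constructing `simd32uint16` from a `simd64uint8` is a bit-level reinterpretation, and that `*` keeps the low 16 bits of each product.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-ins for the 512-bit SIMD types (not the Faiss API).
using Bytes64 = std::array<uint8_t, 64>;
using Words32 = std::array<uint16_t, 32>;

// Model of a per-lane byte shuffle: each index byte selects one of the
// 16 table bytes in its own 128-bit lane. An index with the top bit set
// produces zero, as the x86 byte-shuffle instructions do.
Bytes64 lookup_4_lanes_model(const Bytes64& lut, const Bytes64& idx) {
    Bytes64 out{};
    for (int lane = 0; lane < 4; ++lane) {
        for (int j = 0; j < 16; ++j) {
            const uint8_t c = idx[16 * lane + j];
            out[16 * lane + j] = (c & 0x80) ? 0 : lut[16 * lane + (c & 0x0F)];
        }
    }
    return out;
}

// Reinterpret 64 bytes as 32 little-endian 16-bit words.
Words32 as_words(const Bytes64& b) {
    Words32 w{};
    for (int i = 0; i < 32; ++i) {
        w[i] = static_cast<uint16_t>(b[2 * i] | (b[2 * i + 1] << 8));
    }
    return w;
}

// Model of scale_lo: multiply each full 16-bit word by the scale,
// keeping the low 16 bits of the product.
Words32 scale_lo_model(const Bytes64& res, uint16_t scale) {
    Words32 w = as_words(res);
    for (auto& x : w) {
        x = static_cast<uint16_t>(static_cast<uint32_t>(x) * scale);
    }
    return w;
}

// Model of scale_hi: shift each word right by 8 so that only the
// odd-indexed byte remains, then multiply by the scale.
Words32 scale_hi_model(const Bytes64& res, uint16_t scale) {
    Words32 w = as_words(res);
    for (auto& x : w) {
        x = static_cast<uint16_t>(static_cast<uint32_t>(x >> 8) * scale);
    }
    return w;
}
```

Under these assumptions, `scale_hi` isolates the odd-indexed bytes before scaling, while `scale_lo` scales the whole 16-bit word, so the odd byte's contribution is still present in the upper bits of its result. How the calling kernel accounts for that is outside this diff.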