Unroll loop in lookup_2_lanes (#3364)
Summary:
The current loop runs j from 0 to 31 and contains an if statement that performs one assignment for j < 16 and a different one for j >= 16. Unrolling the loop so that the j < 16 and j >= 16 assignments are done together in the same iteration eliminates the if (j < 16) test and halves the number of loop iterations.

The j < 16 and j >= 16 halves are then each unrolled to a depth of 2.

This change results in approximately a 55% reduction in the execution time for the bench_ivf_fastscan.py workload on Power 10 when compiled with CMAKE_INSTALL_CONFIG_NAME=Release.

Removing the if (j < 16) statement and unrolling the loop removes branch stall cycles and register dependencies that delay instruction issue. As a result, the unrolled code is able to issue instructions earlier, reducing the total number of cycles required to execute the function.
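
The transformation described above can be sketched in scalar C++ as follows. This is a simplified illustration, not the code in simdlib_ppc64.h: the `Vec32` type and both function names are hypothetical, and the lookup is assumed to use only the low 4 bits of each index byte to select an entry from the 16-byte lane that the output byte belongs to.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 32-byte SIMD register viewed as two 16-byte lanes.
using Vec32 = std::array<std::uint8_t, 32>;

// Original shape: 32 iterations, with a branch on j selecting the lane.
Vec32 lookup_with_branch(const Vec32& table, const Vec32& idx) {
    Vec32 out{};
    for (std::size_t j = 0; j < 32; j++) {
        std::uint8_t i = idx[j] & 15; // assumption: low 4 bits select the entry
        if (j < 16) {
            out[j] = table[i];        // lower lane
        } else {
            out[j] = table[16 + i];   // upper lane
        }
    }
    return out;
}

// Unrolled shape: each iteration handles both lanes (no branch on j),
// and each lane is further unrolled to a depth of 2, giving 8 iterations.
Vec32 lookup_unrolled(const Vec32& table, const Vec32& idx) {
    Vec32 out{};
    for (std::size_t j = 0; j < 16; j += 2) {
        // lower lane, two elements
        out[j]      = table[idx[j] & 15];
        out[j + 1]  = table[idx[j + 1] & 15];
        // upper lane, two elements
        out[j + 16] = table[16 + (idx[j + 16] & 15)];
        out[j + 17] = table[16 + (idx[j + 17] & 15)];
    }
    return out;
}
```

The four assignments in the unrolled body are independent of one another, which is what gives the processor room to issue them without waiting on a branch outcome or on a previous result.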

Pull Request resolved: #3364

Reviewed By: kuarora

Differential Revision: D56455690

Pulled By: mdouze

fbshipit-source-id: 490a17a40d9d4439b1a8ea22e991e706d68fb2fa
Carl Love authored and facebook-github-bot committed Apr 24, 2024
1 parent 5893ab7 commit b2e91f6
Showing 2 changed files with 1,088 additions and 0 deletions.
4 changes: 4 additions & 0 deletions faiss/utils/simdlib.h
@@ -27,6 +27,10 @@

#include <faiss/utils/simdlib_neon.h>

#elif defined(__PPC64__)

#include <faiss/utils/simdlib_ppc64.h>

#else

// emulated = all operations are implemented as scalars