Enable Intel®-AMX/oneDNN to accelerate IndexFlatIP search #3266

guangzegu · 2024-02-27T15:01:38Z

Description

Intel® AMX (Intel® Advanced Matrix Extensions) is an AI acceleration engine built into every core of 4th and 5th Gen Intel® Xeon® Scalable processors. It is a set of programming extensions designed to speed up matrix operations. The Intel oneAPI Deep Neural Network Library (oneDNN) is an open-source performance library designed to accelerate deep learning frameworks on Intel architectures. oneDNN can use the matrix extensions provided by AMX to speed up these frameworks, especially for computation-intensive matrix operations.

With oneDNN/AMX, IndexFlatIP search is 1.7X to 5X faster than with the default inner_product, in scenarios with 1 query, dimensions ranging from 64 to 1024, and 1,000,000 vectors.

With oneDNN/AMX, IndexFlatIP search is up to 4X faster than with the BLAS inner_product, in scenarios with 1,000 queries, dimensions ranging from 64 to 1024, and 1,000,000 vectors.
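For readers unfamiliar with what is being accelerated, the following is a minimal pure-Python sketch of an exhaustive inner-product search. It is only an illustration of the computation, not the Faiss implementation; the names `inner_product` and `flat_ip_search` are hypothetical. With many queries, the scoring step amounts to a matrix product between the queries and the database, which is the kind of operation the description above says BLAS or oneDNN/AMX can speed up.

```python
import heapq
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def flat_ip_search(
    database: Sequence[Sequence[float]],
    queries: Sequence[Sequence[float]],
    k: int,
) -> List[List[Tuple[float, int]]]:
    """For each query, return the k (score, id) pairs with the largest inner product.

    Hypothetical reference helper: every database vector is scored against
    every query, which is what "exhaustive" means here.
    """
    results = []
    for q in queries:
        scored = ((inner_product(q, v), i) for i, v in enumerate(database))
        # nlargest keeps only the k best candidates, sorted best first.
        results.append(heapq.nlargest(k, scored))
    return results


if __name__ == "__main__":
    db = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(flat_ip_search(db, [[2.0, 1.0]], k=2))
```

The cost grows with (number of queries) x (number of vectors) x (dimension), which is why the benchmark scenarios above vary exactly those three quantities.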

How to use

When invoking CMake, add the following option:

-DFAISS_ENABLE_DNNL=OFF Enable support for oneDNN to accelerate IndexFlatIP search (possible values are ON and OFF)

To use Intel®-AMX/oneDNN to accelerate IndexFlatIP search, set FAISS_ENABLE_DNNL to ON and run on a 4th or 5th Gen Intel® Xeon® Scalable processor. The exhaustive_inner_product_seq method will then be accelerated.

Co-authored-by: @xtangxtang xi.tang@intel.com

facebook-github-bot · 2024-02-27T15:01:44Z

Hi @guangzegu!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

alexanderguzhva · 2024-02-29T00:56:11Z

@guangzegu this patch is at an extremely early stage.

There needs to be a description in the readme.txt file of how to set up oneAPI properly. For example, I needed to install dnnl, mkl and tbb, and then run source setvars.sh from the oneAPI root directory. Imagine that someone sets this up on a fresh machine or in a docker container.
It needs to be mentioned how to set DNNL_LIB in the cmake arguments.
A unit test needs to be added that activates the execution path that you've added. Basically, exhaustive search for IP has many if-then-else internal conditions and execution branches for various use cases (topk=1, topk=many, many query samples, few query samples, etc). The effect of your patch needs to be measured in milliseconds.
I tried to invoke the needed path, and whenever I invoke your code on an AWS M7i machine (Intel Xeon 4th gen), the test throws the exception "could not create a primitive descriptor for an inner product forward propagation primitive". It is completely unclear what goes wrong. The amx_bf16 capability is enabled, as seen in cat /proc/cpuinfo.

Thanks

@mdouze Is Intel Xeon 4th gen available for CI?
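The `cat /proc/cpuinfo` check mentioned in this comment can be scripted. The sketch below is a hypothetical helper (the name `amx_flags` is not part of Faiss or oneDNN) that scans cpuinfo-style text for the Linux flag names `amx_tile`, `amx_bf16` and `amx_int8`; only `amx_bf16` is named in the thread, the other two are included on the assumption that they are also of interest. As the report above shows, seeing the flag does not by itself guarantee that oneDNN can create the primitive.

```python
from typing import Set

# Linux CPU flag names for AMX; only amx_bf16 is mentioned in the thread.
AMX_FLAGS = ("amx_tile", "amx_bf16", "amx_int8")


def amx_flags(cpuinfo_text: str) -> Set[str]:
    """Return the AMX-related flags found in /proc/cpuinfo-style text."""
    found: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found.update(f for f in value.split() if f in AMX_FLAGS)
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            print(sorted(amx_flags(fh.read())))
    except OSError:
        # /proc/cpuinfo only exists on Linux.
        print("/proc/cpuinfo not available on this platform")
```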

guangzegu · 2024-03-05T14:16:13Z

@alexanderguzhva Thank you very much for your comments.

I will add a description in the readme.txt file on configuring oneDNN to enable this feature. Indeed, the addition of unit tests needs to be carefully considered.
I haven't run into the error "could not create a primitive descriptor for an inner product forward propagation primitive" in my environment. I didn't set the environment variables using oneAPI; I simply installed oneDNN additionally under the community version. You can try referring to this link: https://oneapi-src.github.io/oneDNN/dev_guide_build.html. The version is v3.3+.

guangzegu · 2024-03-26T14:25:12Z

@alexanderguzhva

I suppose the unit tests for this PR can be covered by faiss/tests/test_index.py.
The new commits have added some installation descriptions and have also enhanced the performance.
You might try again with the latest changes. If there are any issues, please feel free to contact me.
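The reviewer's earlier request, coverage of the different top-k and query-count branches with the effect measured in milliseconds, could be approached with a small timing grid like the sketch below. This is only an assumption about how such a measurement might be structured, not code from this PR or from faiss/tests/test_index.py: `search` is a hypothetical callable taking `(nq, k)`, which in practice would wrap a search call on a prepared index, and the default grid values are illustrative.

```python
import time
from typing import Callable, Dict, Sequence, Tuple


def time_search_ms(
    search: Callable[[int, int], object], nq: int, k: int, repeats: int = 3
) -> float:
    """Best-of-N wall-clock time, in milliseconds, of search(nq, k)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        search(nq, k)
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best


def benchmark_grid(
    search: Callable[[int, int], object],
    nqs: Sequence[int] = (1, 1000),
    ks: Sequence[int] = (1, 10),
) -> Dict[Tuple[int, int], float]:
    """Time every (number of queries, top-k) combination once per grid cell."""
    return {(nq, k): time_search_ms(search, nq, k) for nq in nqs for k in ks}


if __name__ == "__main__":
    # Stand-in workload; replace with a real search call in practice.
    def dummy(nq: int, k: int) -> int:
        return sum(range(nq * k))

    for (nq, k), ms in benchmark_grid(dummy).items():
        print(f"nq={nq:5d} k={k:3d} {ms:.3f} ms")
```

Running the same grid with the oneDNN path enabled and disabled would give a per-branch comparison in milliseconds.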

alexanderguzhva · 2024-04-06T00:57:41Z

@guangzegu Thanks, I'll take a look

Enable Intel®-AMX/oneDNN to accelerate IndexFlat search

7db8ce6

formatted distances.cpp and onednn_utils.h

b35a0f2

guangzegu force-pushed the main branch from 2de4b10 to b35a0f2 Compare March 11, 2024 02:36

facebook-github-bot added the CLA Signed label Mar 13, 2024

Add descriptions of Intel®-AMX/oneDNN optimization to INSTALL.md

781f178

guangzegu marked this pull request as ready for review March 19, 2024 07:38

Add oneDNN/AMX optimization for distance calculation using Blas for IndexFlatIP

2f3fdf9
