Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sve targets #2886

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

vorj
Copy link
Contributor

@vorj vorj commented May 31, 2023

related: #2884

This PR contains below changes:

  • Add new optlevel sve
    • ARM SVE is extension of ARMv8, so it should be treated similar to AVX2 IMO
  • Add targets for ARM SVE, faiss_sve and swigfaiss_sve
    • These targets will be built when you give -DFAISS_OPT_LEVEL=sve at build time
    • Design decision: Don't fix SVE register length.
      • The python package of faiss is "fat binary" (for example, the package for avx2 contains _swigfaiss_avx2.so and _swigfaiss.so)
      • SVE is scalable instruction set (= doesn't fix vector length), but actually we can specify the vector length at compile time.
        • with -msve-vector-length= option
        • When this option is specified, the binary can't work correctly on the CPU which has other vector length rather than specified at compile time
      • When we use fixed vector length, SVE-supported faiss python package will contain 7 shared libraries like _swigfaiss.so , _swigfaiss_sve.so , _swigfaiss_sve128.so , _swigfaiss_sve256.so , _swigfaiss_sve512.so , _swigfaiss_sve1024.so , and _swigfaiss_sve2048.so . The package size will be exploded.
      • For these reason, I don't specify the vector length at compile time and faiss_sve detects the vector length at run time.
  • Add a mechanism of detecting ARM SVE on runtime environment and importing swigfaiss_sve dynamically
    • Currently it only supports Linux, but there is no SVE environment with non-Linux OS now, as far as I know

NOTE: I plan to make one more PR about add some SVE implementation after this PR merged. This PR only contains adding sve target.

@mdouze
Copy link
Contributor

mdouze commented May 31, 2023

Please don't add a faiss/python/swigfaiss_sve.swig file.

@vorj
Copy link
Contributor Author

vorj commented May 31, 2023

Oh, sorry. I missed but that has been copied at this line. I removed the file and added the path on .gitignore .

@vorj vorj force-pushed the support-arm_sve branch 3 times, most recently from 091b0f7 to 5a3d8ea Compare June 2, 2023 23:46
@vorj vorj force-pushed the support-arm_sve branch 3 times, most recently from 96f35db to d7c27ba Compare June 13, 2023 04:11
@vorj
Copy link
Contributor Author

vorj commented Jun 20, 2023

environment: line 9: /opt/conda/lib/jvm/languages/python/bin/conda: No such file or directory

🤨

@vorj
Copy link
Contributor Author

vorj commented Jun 20, 2023

Ah, #2917, OK.

@vorj
Copy link
Contributor Author

vorj commented Jun 20, 2023

@mdouze How about the current status of this PR?

@mdouze
Copy link
Contributor

mdouze commented Jun 21, 2023

So the diff only changes the compilation flags, it does not add VSE specific SIMD implementations, right?
Do you have hardware to try it on and maybe measure performance improvements?

@vorj
Copy link
Contributor Author

vorj commented Jun 21, 2023

So the diff only changes the compilation flags, it does not add VSE specific SIMD implementations, right?
Do you have hardware to try it on and maybe measure performance improvements?

In this PR faiss uses SVE only with auto vectorized functions like fvec_L2sqr.
This PR still has little performance improvements, but I aim this as to add faiss_sve target at first.

@vorj
Copy link
Contributor Author

vorj commented Jun 22, 2023

As I wrote before,

I plan to make one more PR about add some SVE implementation after this PR merged.

It will include SVE implmemtations of code_distance , exhaustive_L2sqr_blas_cmax , and so on.

@vorj
Copy link
Contributor Author

vorj commented Jun 28, 2023

@mdouze IMO the PRs should be separated, but I'm willing to include the commits of performance improvement in this PR if you want it. How would you like it?

@mdouze
Copy link
Contributor

mdouze commented Jun 29, 2023

Sorry for being a bit slow to react.
I think that it's fine to land this packaging PR first, let us check the implications in terms of library size.

@vorj
Copy link
Contributor Author

vorj commented Jun 30, 2023

@mdouze OK. When you will want my action like:

  • need me to make a decision,
  • need to change some codes, or
  • want to know my opinion,

please feel free to send me some comments. Anyway, I will wait the checking for a while. Thanks.

@naveentatikonda
Copy link
Contributor

naveentatikonda commented Sep 21, 2023

@mdouze and @vorj is there any update on adding SVE support and do you guys still have plans to add it? I saw some discussion on the other PR and there was no activity since a while. Basically, we were looking for some optimization to Scalar Quantization(specifically SQfp16) on ARM like AVX2 on x86.

Also, please let us know if you need any help to run tests for SVE support. We have bandwidth and resources to run tests. Thanks!

@vorj
Copy link
Contributor Author

vorj commented Sep 22, 2023

@naveentatikonda I am just a contributor not employed by Meta, so actually I don't know the plans on this (official faiss) repository. However, as I told above, I have further patches to improve performance more, and I will create PR when this merged.

@naveentatikonda
Copy link
Contributor

@mdouze and @vorj is there any update on adding SVE support and do you guys still have plans to add it? I saw some discussion on the other PR and there was no activity since a while. Basically, we were looking for some optimization to Scalar Quantization(specifically SQfp16) on ARM like AVX2 on x86.

Also, please let us know if you need any help to run tests for SVE support. We have bandwidth and resources to run tests. Thanks!

@mdouze Did you get a chance to look into my question?

@mdouze
Copy link
Contributor

mdouze commented Sep 26, 2023

OK so I think a way to move forward is to accept this PR but not cover it with CI.
Then optimized code for SVE can be contributed. At some point we will probably either:

  • add SVE to the CI or
  • remove SVE support if it turns out it is not used too much.

Is there a doc somewhere that shows what current and future ARM implementaitons support SVE ?

Thanks

@mdouze
Copy link
Contributor

mdouze commented Sep 26, 2023

Would you mind rebasing on the latest Faiss so that I can import it to the internal Faiss version?
Thanks

@alexanderguzhva
Copy link
Contributor

I can assist and review the code, if needed

@vorj
Copy link
Contributor Author

vorj commented Sep 27, 2023

@mdouze

Is there a doc somewhere that shows what current and future ARM implementaitons support SVE ?

At least, current and future CPUs implemented ARMv9 will support SVE, because SVE2 is in the basic instruction set of ARMv9. Cortex-A510, Cortex-X2, Neoverse N2, Neoverse V2 are supporting ARMv9. However, I don't know that concrete implementations (real CPUs) will has ARMv9 or SVE, as this is decided by manufacturers.

@naveentatikonda
Copy link
Contributor

@naveentatikonda I am just a contributor not employed by Meta, so actually I don't know the plans on this (official faiss) repository. However, as I told above, I have further patches to improve performance more, and I will create PR when this merged.

@vorj Do you also have plans to add sve support to ScalarQuantization after this PR is merged?

@vorj
Copy link
Contributor Author

vorj commented Sep 28, 2023

@naveentatikonda

Do you also have plans to add sve support to ScalarQuantization after this PR is merged?

Currently I don't have the SVE version of ScalarQuantization, so you will be able to contribute it. However, I will speed it up that the unoptimized codes I will find on some times to spare. If I will find no SVE ScalarQuantization codes at my faiss-optimizing time, I will do that.

@vorj vorj force-pushed the support-arm_sve branch 2 times, most recently from 4832e9a to 1ab6f01 Compare October 3, 2023 12:05
@vorj
Copy link
Contributor Author

vorj commented Oct 6, 2023

@mdouze

Would you mind rebasing on the latest Faiss so that I can import it to the internal Faiss version?

I did it. Would you review this?

@cjnolet
Copy link
Contributor

cjnolet commented Oct 10, 2023

Just want to add a note here that this change is also very important to Nvidia RAPIDS libraries, as we're gearing up to have more libraries optimized for the Grace architecture.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants