
[ROCm] initial port #3126

Open · wants to merge 49 commits into base: main
Conversation

jeffdaily

Prior to building the ROCm port, run the hipify script to create the `gpu-rocm` directory, then configure with CMake:

```shell
./faiss/gpu/hipify.sh
cmake -B build . -DFAISS_ENABLE_GPU=ON -DBUILD_TESTING=ON -DFAISS_ENABLE_C_API=OFF -DFAISS_ENABLE_PYTHON=OFF
```
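After configuring, the build and tests can be driven with standard CMake/CTest commands (a sketch only; that the C++ tests are registered with CTest follows the upstream Faiss build and is an assumption here):

```shell
# Build the library and tests in parallel (requires the configure step above).
cmake --build build -j
# Run the registered C++ unit tests.
ctest --test-dir build --output-on-failure
```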

All unit tests are passing for AMD GPUs featuring warp size 32 (gfx1100) and warp size 64 (gfx90a).

Though FAISS was written for warpSize == 32, it was possible to adapt the existing code to warpSize == 64 by using `if constexpr` in many places to separate the 32 vs. 64 logic at compile time. In some cases this results in empty kernel implementations, but this was necessary to avoid missing symbols at link time; the code that launches these empty kernels performs a runtime check of the warpSize so that the empty kernels are never launched. The kWarpSize constexpr should only be used in device code. In host code, the current device's warpSize must be queried at runtime when calculating grid and block sizes.

jeffdaily and others added 30 commits February 7, 2023 22:42
- set(CMAKE_CXX_STANDARD 17)
- add_definitions(-DUSE_ROCM)
- hipify.sh runs hipify-perl per file in parallel
- hipify.sh only replaces the hipified file if it changes
  (avoids unnecessary rebuilds)
- USE_ROCM section of faiss/gpu/utils/DeviceDefs.cuh
- USE_ROCM section of faiss/gpu/utils/PtxUtils.cuh
Also put runtime errors in setBitfield for safeguard.
This should be handled by hipify-perl.
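The "only replace the hipified file if it changes" behavior mentioned in the commit messages can be sketched as follows (illustrative only; the file names and the use of `sed` as a stand-in for `hipify-perl` are assumptions):

```shell
# Demonstrate the replace-if-changed pattern: write the translated file to a
# temp path and only move it into place if it differs from the existing
# output, so unchanged files keep their timestamps and do not trigger rebuilds.
tmpdir=$(mktemp -d)
src="$tmpdir/Kernel.cu"
out="$tmpdir/Kernel.hip"
echo 'cudaMalloc(&p, n);' > "$src"

# Stand-in for hipify-perl: translate a CUDA API name to its HIP equivalent.
sed 's/cudaMalloc/hipMalloc/' "$src" > "$out.tmp"

if ! cmp -s "$out.tmp" "$out" 2>/dev/null; then
    mv "$out.tmp" "$out"     # output changed (or did not exist): replace it
else
    rm "$out.tmp"            # output identical: discard, keep old timestamp
fi

result=$(cat "$out")
echo "$result"
rm -rf "$tmpdir"
```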
@mdouze
Contributor

mdouze commented Nov 6, 2023

It is excellent news that GPU Faiss is ported to AMD devices!!

We need to validate the diff:

  • no regressions on NV GPUs
  • test coverage in C++
  • test coverage in Python
  • ideally we would like to run it in CircleCI.

In the meantime, could you give a high-level overview of what's supported from Faiss on AMD GPUs: which index types are supported, any GPU vs. CPU benchmarks (even rough), install requirements...

Thanks!

@mdouze mdouze added the GPU label Nov 6, 2023
@alexanderguzhva
Contributor

@jeffdaily Did you have a chance to run any benchmarks, or was the goal to ensure that Faiss just works? Thanks.

@jeffdaily
Author

> @jeffdaily Did you have a chance to run any benchmarks, or was the goal to ensure that Faiss just works? Thanks.

This initial port was functional only. All unit tests pass on CUDA/V100 (as of ee8aea9) and on AMD Radeon PRO W7900 (warp size 32) and AMD Instinct MI250X/MI250 (warp size 64).

Do you have instructions for some benchmarks I could run?

@jeffdaily
Author

The recent clang-format changes to fix the ci/circleci:Format job cause a lot of whitespace changes that I was trying to avoid.

@algoriddle
Contributor

@jeffdaily Thank you for this PR, it's great to see Faiss running on AMD devices!

Can you clarify your intentions/objective re Faiss w/ ROCm?

There are various degrees of results you could aim for:

  1. A proof-of-concept that it's technically possible for Faiss to support ROCm - this PR
  2. A supported "from source" version of Faiss w/ ROCm that compiles with the generic instructions + all tests pass + no other platform is affected; AMD intends to support this code going forward
  3. As above + there's a contbuild that verifies that all tests pass - w/ clarity on who maintains this contbuild
  4. As above + a conda package is built for Faiss w/ ROCm - w/ clarity on who maintains the package specification and the build environment

Depending on your objectives and the level of investment that you intend to make, it would be worth discussing the best way to collaborate with AMD. (As an example, we have been working closely with Nvidia on their integration of RAFT, with reasonable clarity on the points above.)

@jeffdaily
Author

jeffdaily commented Nov 21, 2023

@algoriddle I think the claim is at minimum (1), but closer to (2). This PR does not add support for the Python bindings, and it does not add any CI jobs. It does provide ROCm-specific build instructions, and all tests were personally verified to pass on AMD Radeon™ Pro W7900, AMD Instinct™ MI250X, and NVIDIA V100; hopefully that covers "no other platform is affected". We could add a CI job to this PR that performs a ROCm build of faiss, if desired, but for any testing we would need to identify a publicly available resource with AMD GPUs. AMD is supporting customers that wanted this faiss support, so I would expect additional support going forward.

@wickedfoo
Contributor

I will review this later today or shortly after the holiday weekend.

We are super excited to have AMD GPUs supported for Faiss, but our main concern is that this ends up becoming an orphaned feature which clutters up the code and is liable to rot over time without both AMD (and Meta) pushing this all the way through to @algoriddle's (4) on his list. Personally, I don't think this is worth accepting unless we have a plan to push this through to first-class support, though Matthijs or Gergely might be OK with (2) or (3).

To what degree would AMD be willing to help us get to (4) in the list above, beyond this PR?
