
[ROCm] initial port #3126

Open · wants to merge 49 commits into base: main
Conversation

jeffdaily

Prior to building the ROCm port, run the hipify script to create the `gpu-rocm` directory, then configure with CMake:

```shell
./faiss/gpu/hipify.sh
cmake -B build . -DFAISS_ENABLE_GPU=ON -DBUILD_TESTING=ON -DFAISS_ENABLE_C_API=OFF -DFAISS_ENABLE_PYTHON=OFF
```
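After configuring, the build and tests can be driven with standard CMake/CTest commands (a sketch only; that the C++ tests are registered with CTest follows the upstream Faiss build and is an assumption here):

```shell
# Build the library and tests in parallel (requires the configure step above).
cmake --build build -j
# Run the registered C++ unit tests.
ctest --test-dir build --output-on-failure
```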

All unit tests are passing for AMD GPUs featuring warp size 32 (gfx1100) and warp size 64 (gfx90a).

Though FAISS was written for warpSize == 32, it was possible to adapt the existing code to warpSize == 64 by using `if constexpr` in many places to separate the 32 vs. 64 logic at compile time. In some cases this results in empty kernel implementations, but this was necessary to avoid missing symbols at link time; the code that launches these empty kernels performs a runtime check of the warpSize so that the empty kernels are never launched. The kWarpSize constexpr should only be used in device code. In host code, the current device's warpSize must be queried at runtime when calculating grid and block sizes.

jeffdaily and others added 30 commits February 7, 2023 22:42
- set(CMAKE_CXX_STANDARD 17)
- add_definitions(-DUSE_ROCM)
- hipify.sh runs hipify-perl per file in parallel
- hipify.sh only replaces the hipified file if it changes
  (avoids unnecessary rebuilds)
- USE_ROCM section of faiss/gpu/utils/DeviceDefs.cuh
- USE_ROCM section of faiss/gpu/utils/PtxUtils.cuh
Also put runtime errors in setBitfield for safeguard.
This should be handled by hipify-perl.
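The "only replace the hipified file if it changes" behavior mentioned in the commit messages can be sketched as follows (illustrative only; the file names and the use of `sed` as a stand-in for `hipify-perl` are assumptions):

```shell
# Demonstrate the replace-if-changed pattern: write the translated file to a
# temp path and only move it into place if it differs from the existing
# output, so unchanged files keep their timestamps and do not trigger rebuilds.
tmpdir=$(mktemp -d)
src="$tmpdir/Kernel.cu"
out="$tmpdir/Kernel.hip"
echo 'cudaMalloc(&p, n);' > "$src"

# Stand-in for hipify-perl: translate a CUDA API name to its HIP equivalent.
sed 's/cudaMalloc/hipMalloc/' "$src" > "$out.tmp"

if ! cmp -s "$out.tmp" "$out" 2>/dev/null; then
    mv "$out.tmp" "$out"     # output changed (or did not exist): replace it
else
    rm "$out.tmp"            # output identical: discard, keep old timestamp
fi

result=$(cat "$out")
echo "$result"
rm -rf "$tmpdir"
```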
@mdouze
Contributor

mdouze commented Nov 6, 2023

It is excellent news that GPU Faiss is ported to AMD devices!!

We need to validate the diff:

  • no regressions on NV GPUs
  • test coverage in C++
  • test coverage in Python
  • ideally we would like to run it in CircleCI.

In the meantime, could you give a high-level overview of what's supported from Faiss on AMD GPUs: which index types are supported, any GPU vs. CPU benchmarks (even rough), install requirements...

Thanks!

@mdouze mdouze added the GPU label Nov 6, 2023
@alexanderguzhva
Contributor

@jeffdaily Did you have a chance to run any benchmarks, or was the goal to ensure that Faiss just works? Thanks.

@jeffdaily
Author

> @jeffdaily Did you have a chance to run any benchmarks, or was the goal to ensure that Faiss just works? Thanks.

This initial port was functional only. All unit tests pass on CUDA/V100 (as of ee8aea9) and on AMD Radeon PRO W7900 (warp size 32) and AMD Instinct MI250X/MI250 (warp size 64).

Do you have instructions for some benchmarks I could run?

@jeffdaily
Author

The recent clang-format changes to fix the ci/circleci:Format job cause a lot of whitespace changes that I was trying to avoid.

@algoriddle
Contributor

@jeffdaily Thank you for this PR, it's great to see Faiss running on AMD devices!

Can you clarify your intentions/objective re Faiss w/ ROCm?

There are various degrees of results you could aim for:

  1. A proof-of-concept that it's technically possible for Faiss to support ROCm - this PR
  2. A supported "from source" version of Faiss w/ ROCm that compiles with the generic instructions + all tests pass + no other platform is affected; AMD intends to support this code going forward
  3. As above + there's a contbuild that verifies that all tests pass - w/ clarity on who maintains this contbuild
  4. As above + a conda package is built for Faiss w/ ROCm - w/ clarity on who maintains the package specification and the build environment

Depending on your objectives and the level of investment that you intend to make, it would be worth discussing the best way to collaborate with AMD. (As an example, we have been working closely with Nvidia on their integration of RAFT, with reasonable clarity on the points above.)

@jeffdaily
Author

jeffdaily commented Nov 21, 2023

@algoriddle I think the claim is at minimum (1), but closer to (2). This PR does not add support for the Python bindings, and it does not add any CI jobs. It does provide ROCm-specific build instructions, and all tests were personally verified to pass on AMD Radeon™ Pro W7900, AMD Instinct™ MI250X, and NVIDIA V100; hopefully that covers "no other platform is affected". We could add a CI job to this PR that performs a ROCm build of faiss, if desired, but for any testing we would need to identify a publicly available resource with AMD GPUs. AMD is supporting customers that wanted this faiss support, so I would expect additional support going forward.

@wickedfoo
Contributor

I will review this later today or shortly after the holiday weekend.

We are super excited to have AMD GPUs supported for Faiss, but our main concern is that this ends up becoming an orphaned feature which clutters up the code and is liable to rot over time without both AMD (and Meta) pushing this all the way through to @algoriddle's (4) on his list. Personally, I don't think this is worth accepting unless we have a plan to push this through to first-class support, though Matthijs or Gergely might be OK with (2) or (3).

To what degree would AMD be willing to help us get to (4) in the list above, beyond this PR?
