Introducing hipification of CuPy for ROCm compatibility with CUDA #8316
Conversation
This pull request is now in conflicts. Could you fix it @AdrianAbeyta? 🙏
Force-pushed from 2e98be4 to aa8eb67
compile_time_env['CUPY_USE_GEN_HIP_CODE'] = 1
compile_time_env['CUPY_DONT_USE_GEN_HIP_CODE'] = 0
Do we need both of them? Doesn't compile_time_env['CUPY_USE_GEN_HIP_CODE'] = 0 mean compile_time_env['CUPY_DONT_USE_GEN_HIP_CODE'] = 1?
Yes, we currently need both flags for ROCm: CUPY_USE_GEN_HIP_CODE and CUPY_DONT_USE_GEN_HIP_CODE. They serve different purposes at different stages of the process.

Compile time: During the compilation of HIP files, CUPY_USE_GEN_HIP_CODE indicates that HIP code generation is enabled. Hipify then completes a custom mapping at compile time from CUPY_USE_GEN_HIP_CODE to CUPY_DONT_USE_GEN_HIP_CODE (see implementation here).

Runtime: At runtime, we use the new custom mapping with CUPY_DONT_USE_GEN_HIP_CODE. This avoids circular import errors caused by Cython constraints at runtime.

Thus, CUPY_USE_GEN_HIP_CODE=0 at compile time does not automatically imply CUPY_DONT_USE_GEN_HIP_CODE=1 at runtime. They are set independently to handle the two phases (compilation vs. runtime) without causing conflicts or errors.
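As a rough illustration of the compile-time remapping described above, here is a hedged sketch. The helper name and structure are assumptions, not this pull request's actual code; only the two flag names come from the discussion.

```python
# Hypothetical sketch of the compile-time flag rewrite discussed above.
# The mapping table and function are illustrative assumptions.

FLAG_MAP = {
    "CUPY_USE_GEN_HIP_CODE": "CUPY_DONT_USE_GEN_HIP_CODE",
}

def remap_compile_time_flags(source: str) -> str:
    """While hipifying a source file, rewrite the compile-time flag into
    the runtime flag so the runtime path never references the
    compile-time symbol (sidestepping the circular-import issue)."""
    for old, new in FLAG_MAP.items():
        source = source.replace(old, new)
    return source
```

Note that the runtime flag name is not itself touched by the rewrite, which is what lets the two phases stay independent.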
.github/workflows/pretest.yml (Outdated)

@@ -77,6 +77,7 @@ jobs:
  - name: Build
    run: |
      pip install git+https://github.com/ROCmSoftwarePlatform/hipify_torch.git
Do we need hipify_torch even on CUDA builds?
No, we do not need hipify_torch on CUDA builds. I updated pretest.yml to reflect this. Thanks!
.github/workflows/pretest.yml (Outdated)

@@ -87,6 +88,7 @@ jobs:
        python -Werror::DeprecationWarning -m compileall -f -q cupy cupyx examples tests docs
        pushd docs
        pip install -r requirements.txt
        pip install -r rocm-requirements.txt
Also, do we need hipify_torch on doc build? I suppose doc building uses stub files and is independent of ROCm stuff.
We also do not need hipify_torch for the doc build since it uses stub files as you mentioned. Updated the pretest.yml to reflect this.
It seems this is because the ROCm Docker image has grown larger than before (rocm-5.0). I checked the sizes of containers that pulled ROCm images (see the SIZE column).
Also, according to this document, we have only 14 GB of disk space on GitHub-hosted runners.
I'm not sure how we should run
Also, I will update #7964 itself to make it ready to merge, which may require our members to confirm our policy on having self-hosted runners on a public repository.
Thank you for the contribution and all the help improving ROCm support in CuPy! Regarding this pull request, I'm a bit concerned about the potential instability it may introduce and the negative impact it could have on both the user and developer experience:

CuPy's ROCm support is already mature, and in my current understanding, introducing hipify-torch does not appear to provide significant value compared to the drawbacks mentioned above.
Thanks for pointing this out. As ROCm versions have evolved, the base ROCm images have grown in size. To account for this, I added a cleanup step that removes unnecessary items taking up runner disk space. See the modification here.
Thank you for your insights, @kmaehashi. While hipify-torch adds an extra installation step, it aligns us with industry practice and reuses PyTorch's proven approach for ROCm users. Our current ROCm support is mature, but it does not fully utilize newer ROCm capabilities. Adopting hipify-torch keeps CUDA and ROCm aligned, ensures functional parity, and minimizes changes on the CUDA side. We hope it will streamline development and keep ROCm support from falling behind CUDA. Regarding the user/dev experience feedback, after some thought I believe we could improve the experience by adding the following:

I appreciate the feedback and am open to other suggestions on how to make the user/dev experience more positive with this change.
Thanks, @takagi. Please let me know if there's anything I can do to help. I understand that some teams using self-hosted runners on public repositories have a setup where a maintainer must approve a pull request before CI workflows can run. This is done with the "Approve and Run" button for added security; see approving-workflow-runs-from-public-forks for more information.
Thanks, @AdrianAbeyta. Our team discussed the "Approve and Run" button and would prefer not to use it because, even for workflows on a GitHub-hosted runner (e.g., linting and document building), it requires manual approval on every commit. We tried it before and found it too cumbersome. Instead, we are considering another solution: we plan to have two repositories, A (public, used for development) and A-fork (private, used for testing, with the self-hosted runner registered). A test comment on a pull request in A will trigger pushing the commit from A to A-fork, then start the CI workflow on A-fork.
Overview
This pull request introduces the hipify utility to facilitate the conversion of CUDA C/C++/Cython code into HIP C/C++/Cython code within CuPy. Hipify offers a straightforward search-and-replace mechanism that maps CUDA constructs to their HIP equivalents, allowing CuPy to support AMD's ROCm ecosystem seamlessly. This addition particularly targets the conversion of CUDA files to HIP equivalents for CUDA/Stub builds, as detailed in the modified setup.py.
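To make the search-and-replace idea concrete, here is a minimal hedged sketch. The real hipify mapping table covers the full CUDA API surface; the few entries below are just a sample, and the helper function is illustrative rather than the tool's actual implementation.

```python
# Tiny sample of the CUDA -> HIP symbol mapping; hipify's real table
# is far larger. These entries are standard, documented equivalents.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaStream_t": "hipStream_t",
}

def hipify_text(text: str) -> str:
    """Naive search-and-replace conversion of CUDA constructs to HIP."""
    for cuda_sym, hip_sym in CUDA_TO_HIP.items():
        text = text.replace(cuda_sym, hip_sym)
    return text
```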
Main Changes
Integration of hipify:
The primary modification is in setup.py (see relevant section):
Determines whether to trigger the hipification process based on the presence of ROCm.
Utilizes hipify to generate HIP-compatible files by recursively converting the CUDA codebase.
Implements a custom mapping configuration, ensuring a precise mapping from CUDA constructs to HIP constructs.
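The setup.py flow summarized above can be sketched roughly as follows. The helper names and the ROCm probes here are assumptions for illustration, not the pull request's exact code.

```python
# Hedged sketch of the build-time decision and source collection
# described above; helper names and probes are illustrative.
import os
from pathlib import Path

def should_hipify() -> bool:
    # The PR triggers hipification based on the presence of ROCm;
    # probing ROCM_HOME or the default install prefix is one common way.
    return bool(os.environ.get("ROCM_HOME")) or Path("/opt/rocm").exists()

def collect_cuda_sources(root: str) -> list:
    # Recursively gather CUDA C/C++/Cython sources for conversion to HIP.
    exts = {".cu", ".cuh", ".pyx", ".pxd"}
    return sorted(str(p) for p in Path(root).rglob("*") if p.suffix in exts)
```

On a CUDA or stub build, `should_hipify()` is false and the sources are left untouched, matching the "existing CUDA code remains unchanged" point below.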
Custom mapping and hipify mechanism:
Introduces a specific mapping, converting CUPY_USE_GEN_HIP_CODE to CUPY_DONT_USE_GEN_HIP_CODE.
Facilitates a mechanism for selecting the appropriate libraries only if on ROCm platforms; defaults to legacy CUDA code otherwise.
cuRAND Example:
An example integration is provided for cuRAND, demonstrating the specific mapping and the library-switching mechanism between the HIP and legacy CUDA versions (see the mechanism here).
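A hedged sketch of the library-switching idea for the cuRAND case. The helper below is purely illustrative; only the flag semantics (ROCm platform plus HIP code generation enabled selects the hipified bindings) come from this pull request.

```python
def select_rand_backend(on_rocm: bool, gen_hip_code: bool) -> str:
    """Pick the RNG backend: generated HIP bindings on ROCm when HIP
    code generation is enabled, otherwise the legacy code path."""
    if on_rocm and gen_hip_code:
        return "hiprand"   # hipified bindings (hipRAND maps to rocRAND)
    return "curand"        # legacy CUDA path, left unchanged
```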
Impact
This update ensures that CuPy can smoothly support both CUDA and ROCm, widening its compatibility range. It reduces manual intervention in maintaining both ecosystems and streamlines the conversion process, particularly in CUDA-to-HIP mappings. Importantly, existing CUDA code remains unchanged, ensuring compatibility for NVIDIA GPUs.