Introducing hipification of CuPy for ROCm compatibility with CUDA #8316
Conversation
This pull request is now in conflicts. Could you fix it @AdrianAbeyta? 🙏
Force-pushed from 2e98be4 to aa8eb67
compile_time_env['CUPY_USE_GEN_HIP_CODE'] = 1
compile_time_env['CUPY_DONT_USE_GEN_HIP_CODE'] = 0
Do we need both of them? Doesn't compile_time_env['CUPY_USE_GEN_HIP_CODE'] = 0 mean compile_time_env['CUPY_DONT_USE_GEN_HIP_CODE'] = 1?
Yes, we currently need both flags for ROCm: CUPY_USE_GEN_HIP_CODE and CUPY_DONT_USE_GEN_HIP_CODE. They serve different purposes at different stages of the process.

Compile time: During the compilation of HIP files, CUPY_USE_GEN_HIP_CODE indicates that HIP code generation is enabled. Hipify then completes a custom mapping at compile time from CUPY_USE_GEN_HIP_CODE to CUPY_DONT_USE_GEN_HIP_CODE (see implementation here).

Runtime: At runtime, we use the new custom mapping with CUPY_DONT_USE_GEN_HIP_CODE. This avoids circular import errors caused by Cython constraints at runtime.

Thus, CUPY_USE_GEN_HIP_CODE=0 at compile time does not automatically imply CUPY_DONT_USE_GEN_HIP_CODE=1 at runtime. They are set independently to handle the two phases (compilation vs. runtime) without causing conflicts or errors.
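As a rough illustration of the compile-time remapping described above, here is a hedged sketch. The helper name and structure are assumptions, not this pull request's actual code; only the two flag names come from the discussion.

```python
# Hypothetical sketch of the compile-time flag rewrite discussed above.
# The mapping table and function are illustrative assumptions.

FLAG_MAP = {
    "CUPY_USE_GEN_HIP_CODE": "CUPY_DONT_USE_GEN_HIP_CODE",
}

def remap_compile_time_flags(source: str) -> str:
    """While hipifying a source file, rewrite the compile-time flag into
    the runtime flag so the runtime path never references the
    compile-time symbol (sidestepping the circular-import issue)."""
    for old, new in FLAG_MAP.items():
        source = source.replace(old, new)
    return source
```

Note that the runtime flag name is not itself touched by the rewrite, which is what lets the two phases stay independent.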
.github/workflows/pretest.yml (Outdated)

@@ -77,6 +77,7 @@ jobs:
  - name: Build
    run: |
      pip install git+https://github.com/ROCmSoftwarePlatform/hipify_torch.git
Do we need hipify_torch even on CUDA builds?
No, we do not need hipify_torch on CUDA builds. I updated pretest.yml to reflect this. Thanks!
.github/workflows/pretest.yml (Outdated)

@@ -87,6 +88,7 @@ jobs:
        python -Werror::DeprecationWarning -m compileall -f -q cupy cupyx examples tests docs
        pushd docs
        pip install -r requirements.txt
        pip install -r rocm-requirements.txt
Also, do we need hipify_torch on doc build? I suppose doc building uses stub files and is independent of ROCm stuff.
We also do not need hipify_torch for the doc build since it uses stub files as you mentioned. Updated the pretest.yml to reflect this.
It seems this is because the ROCm Docker image has grown larger than before (rocm-5.0). I checked the sizes of containers that pulled ROCm images (see the SIZE column).
Also, according to this document, we have only 14 GB of disk space on GitHub-hosted runners.
I'm not sure how we should run
Also, I will update #7964 itself to make it ready to merge, which may require our members to confirm our policy on having self-hosted runners on a public repository.
Thank you for the contribution and all the help improving ROCm support in CuPy! Regarding this pull request, I'm a bit concerned about the potential instability it may introduce and the negative impact it could have on both the user and developer experience:

CuPy's ROCm support is already mature, and in my current understanding, introducing hipify-torch does not appear to provide significant value compared to the drawbacks mentioned above.
Thanks for pointing this out. As ROCm versions have evolved, the base ROCm images have grown in size. To account for this, I added a cleanup step that removes unnecessary items taking up runner disk space. See the modification here.
Thank you for your insights, @kmaehashi. While hipify-torch adds an extra installation step, it aligns us with industry practice and reuses PyTorch's proven approach for ROCm users. Our current ROCm support is mature, but it does not fully utilize newer ROCm capabilities. Adopting hipify-torch keeps CUDA and ROCm aligned, ensures functional parity, and minimizes changes on the CUDA side. We hope it will streamline development and keep ROCm support from falling behind CUDA. Regarding the user/dev experience feedback, after some thought I believe we could improve the experience by adding the following:

I appreciate the feedback and am open to other suggestions on how to make the user/dev experience more positive with this change.
Thanks, @takagi. Please let me know if there's anything I can do to help. I understand that some teams using self-hosted runners on public repositories have a setup where a maintainer must approve a pull request before CI workflows can run. This is done with the "Approve and Run" button for added security; see approving-workflow-runs-from-public-forks for more information.
Thanks, @AdrianAbeyta. Our team discussed the "Approve and Run" button and would prefer not to use it because, even for workflows on a GitHub-hosted runner (e.g., linting and document building), it requires manual approval on every commit. We tried it before and found it too cumbersome. Instead, we are considering another solution: we plan to have two repositories, A (public, used for development) and A-fork (private, used for testing, with the self-hosted runner registered). A test comment on a pull request in A will trigger pushing the commit from A to A-fork, then start the CI workflow on A-fork.
Overview
This pull request introduces the hipify utility to facilitate the conversion of CUDA C/C++/Cython code into HIP C/C++/Cython code within CuPy. Hipify offers a straightforward search-and-replace mechanism that maps CUDA constructs to their HIP equivalents, allowing CuPy to support AMD's ROCm ecosystem seamlessly. This addition particularly targets the conversion of CUDA files to HIP equivalents for CUDA/Stub builds, as detailed in the modified setup.py.
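To make the search-and-replace idea concrete, here is a minimal hedged sketch. The real hipify mapping table covers the full CUDA API surface; the few entries below are just a sample, and the helper function is illustrative rather than the tool's actual implementation.

```python
# Tiny sample of the CUDA -> HIP symbol mapping; hipify's real table
# is far larger. These entries are standard, documented equivalents.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaStream_t": "hipStream_t",
}

def hipify_text(text: str) -> str:
    """Naive search-and-replace conversion of CUDA constructs to HIP."""
    for cuda_sym, hip_sym in CUDA_TO_HIP.items():
        text = text.replace(cuda_sym, hip_sym)
    return text
```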
Main Changes
Integration of hipify:
The primary modification is in setup.py (see relevant section):
Determines whether to trigger the hipification process based on the presence of ROCm.
Utilizes hipify to generate HIP-compatible files by recursively converting the CUDA codebase.
Implements a custom mapping configuration, ensuring a precise mapping from CUDA constructs to HIP constructs.
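The setup.py flow summarized above can be sketched roughly as follows. The helper names and the ROCm probes here are assumptions for illustration, not the pull request's exact code.

```python
# Hedged sketch of the build-time decision and source collection
# described above; helper names and probes are illustrative.
import os
from pathlib import Path

def should_hipify() -> bool:
    # The PR triggers hipification based on the presence of ROCm;
    # probing ROCM_HOME or the default install prefix is one common way.
    return bool(os.environ.get("ROCM_HOME")) or Path("/opt/rocm").exists()

def collect_cuda_sources(root: str) -> list:
    # Recursively gather CUDA C/C++/Cython sources for conversion to HIP.
    exts = {".cu", ".cuh", ".pyx", ".pxd"}
    return sorted(str(p) for p in Path(root).rglob("*") if p.suffix in exts)
```

On a CUDA or stub build, `should_hipify()` is false and the sources are left untouched, matching the "existing CUDA code remains unchanged" point below.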
Custom mapping and hipify mechanism:
Introduces a specific mapping, converting CUPY_USE_GEN_HIP_CODE to CUPY_DONT_USE_GEN_HIP_CODE.
Facilitates a mechanism for selecting the appropriate libraries only if on ROCm platforms; defaults to legacy CUDA code otherwise.
cuRAND Example:
An example integration is provided for cuRAND, demonstrating the specific mapping and the library-switching mechanism between the HIP and legacy CUDA versions (see the mechanism here).
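A hedged sketch of the library-switching idea for the cuRAND case. The helper below is purely illustrative; only the flag semantics (ROCm platform plus HIP code generation enabled selects the hipified bindings) come from this pull request.

```python
def select_rand_backend(on_rocm: bool, gen_hip_code: bool) -> str:
    """Pick the RNG backend: generated HIP bindings on ROCm when HIP
    code generation is enabled, otherwise the legacy code path."""
    if on_rocm and gen_hip_code:
        return "hiprand"   # hipified bindings (hipRAND maps to rocRAND)
    return "curand"        # legacy CUDA path, left unchanged
```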
Impact
This update ensures that CuPy can smoothly support both CUDA and ROCm, widening its compatibility range. It reduces manual intervention in maintaining both ecosystems and streamlines the conversion process, particularly in CUDA-to-HIP mappings. Importantly, existing CUDA code remains unchanged, ensuring compatibility for NVIDIA GPUs.