
Port GPUMD to HIP? #404

Open
Dankomaister opened this issue Apr 7, 2023 · 9 comments

@Dankomaister

Many of the latest HPC systems, such as Frontier, LUMI, Adastra, and Dardel, are equipped with AMD GPUs, so it would be great if GPUMD could run on both NVIDIA and AMD GPUs. Are there any plans or ongoing efforts to port GPUMD to AMD's Heterogeneous-compute Interface for Portability (HIP)? It appears that only minimal changes are required to port a CUDA application to HIP, without any loss in performance.

@brucefan1983
Owner

This is a good question. Currently I don't have access to any HIP system. A collaborator has proposed to port GPUMD to AMD GPUs on LUMI, and if that proposal is granted, I may get access to a HIP system and try it out.

@Dankomaister
Author

Okay, sounds good!
We have access to systems with AMD GPUs (Dardel), so if I have time I might do some tests to see how much work is needed for a HIP port.

@Dankomaister
Author

Hi,

I made a first attempt to port the code from CUDA to HIP.

After cloning the GPUMD git repository (v3.8) and loading the appropriate ROCm module, porting starts by converting all CUDA calls to HIP calls. This is done automatically with hipconvertinplace-perl.sh, as described in the HIP Porting Guide:

hipconvertinplace-perl.sh GPUMD/
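
For reference, the conversion is mostly a mechanical rename of the CUDA runtime API to its HIP equivalents; an illustrative fragment (not taken from the GPUMD sources):

// Before hipify (CUDA):
cudaMalloc(&d_data, n * sizeof(double));
cudaMemcpy(d_data, h_data, n * sizeof(double), cudaMemcpyHostToDevice);

// After hipify (HIP):
hipMalloc(&d_data, n * sizeof(double));
hipMemcpy(d_data, h_data, n * sizeof(double), hipMemcpyHostToDevice);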

While this does most of the work, some manual changes to the code are needed, which I will list below.

Since HIP currently does not support warp-level synchronization, all the __syncwarp() statements must be changed. I am not sure what the best approach is here, but replacing __syncwarp() with __syncthreads() seems to work; this may have a negative impact on performance, though.
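
Concretely, the substitution inside a kernel is just the following (illustrative fragment, not taken from the GPUMD sources):

// CUDA original: barrier for the threads of the current warp only.
__syncwarp();

// HIP workaround used here: a block-wide barrier instead. This is a
// stronger synchronization and stays correct as long as every thread in
// the block reaches it, but it may cost slightly more than a warp barrier.
__syncthreads();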

Furthermore, some changes are needed in the parts of the code that rely on prefix sums from the Thrust library. While this library has been ported to HIP, a few adjustments are required; specifically, thrust::cuda::par.on(stream) does not exist in the HIP version of the library. Again, I am not sure what the best approach is, but changing thrust::cuda::par.on(stream) to thrust::host seems to work.
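
As a minimal sketch of that change (hypothetical buffers and sizes, not the exact GPUMD call):

#include <thrust/execution_policy.h>
#include <thrust/scan.h>

// CUDA original: exclusive prefix sum launched on an explicit stream.
thrust::exclusive_scan(thrust::cuda::par.on(stream), in, in + n, out);

// HIP workaround used here: the host execution policy instead. Note that
// thrust::host runs the scan on the CPU and expects host-accessible
// iterators, so this is likely to add transfer overhead.
thrust::exclusive_scan(thrust::host, in, in + n, out);

(It may also be worth checking whether rocThrust offers a thrust::hip::par.on(stream) policy as a closer drop-in replacement; I have not verified this.)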

Finally, cusolver_wrapper.cu had to be ported manually, since for some reason the hipconvertinplace-perl.sh script could not handle this file.

There were probably a few more minor changes needed to get the code to compile which I may have forgotten to mention. However, this diff file cu_to_hip.patch contains all the changes required to port the code to HIP.

To compile, one simply has to change the makefile to use hipcc instead of nvcc:

###########################################################
# some flags
###########################################################
CC = hipcc
CFLAGS = -std=c++14 -O3
INC = -I./
LDFLAGS = 
LIBS = -lhipblas -lhipsolver

After compiling, I tested both training with nep and MD with gpumd, and both worked fine without any problems. This is very promising for moving the GPUMD code from CUDA to HIP! It is also worth noting that one does not need access to an AMD GPU to develop HIP code, since hipcc can also compile for NVIDIA GPUs.

@brucefan1983
Owner

This is great news to hear during my holiday.

I didn't know it could be so "easy" to port the whole of GPUMD from CUDA to HIP.

Replacing __syncwarp() with __syncthreads() does not affect the performance much, because the functions with this kind of synchronization are by no means a bottleneck. I could even remove all __syncwarp() calls to simplify future work.

Changing thrust::cuda::par.on(stream) to thrust::host might affect the performance noticeably, though, and tests are needed.

Perhaps it is better to change the CUDA code such that little or no manual tuning is needed after the automatic conversion. Then we can keep developing the CUDA version and generate a HIP one for each release in the future.

What do you think?

@Dankomaister
Author

I think it might be a good idea to keep a separate HIP version of GPUMD and to keep working on the CUDA code until porting it to HIP is seamless.
Then more testing can be done to ensure that the HIP version is not significantly slower than the CUDA version; after that, maybe switching to HIP entirely would be a good idea?

I wanted to work a bit more on the HIP port and run some benchmarks to compare with the CUDA version, but unfortunately I haven't had time to do this.

@brucefan1983
Owner

Thanks for your work in this direction. I also hope to have a look when I have access to the tools.

@njzjz

njzjz commented Sep 29, 2023

It is a pain to maintain two similar codebases, so I suggest that GPUMD merge the CUDA and HIP code into the same files. I have done this for DeePMD-kit; see deepmodeling/deepmd-kit#2838.
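
The usual trick is a thin portability header that maps one set of names onto either backend at compile time; a minimal sketch of the idea (hypothetical names, not the DeePMD-kit implementation):

// gpu_runtime.h -- hypothetical portability header (sketch only)
#pragma once

#ifdef USE_HIP
#include <hip/hip_runtime.h>
#define gpuMalloc             hipMalloc
#define gpuMemcpy             hipMemcpy
#define gpuMemcpyHostToDevice hipMemcpyHostToDevice
#define gpuDeviceSynchronize  hipDeviceSynchronize
#else
#include <cuda_runtime.h>
#define gpuMalloc             cudaMalloc
#define gpuMemcpy             cudaMemcpy
#define gpuMemcpyHostToDevice cudaMemcpyHostToDevice
#define gpuDeviceSynchronize  cudaDeviceSynchronize
#endif

Kernels and the <<<...>>> launch syntax can then be shared, since hipcc accepts the same syntax; only the runtime API names need the mapping.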

@brucefan1983
Owner

Yes, this should be done after the HIP version is working.

@Knight-WP

Hello, have you tried running on multiple cards? It runs normally on a single AMD card for me, but multi-card runs are interrupted.
