Additional backends #26

Closed · 3 of 6 tasks
tzanio opened this issue Jan 3, 2018 · 14 comments

Comments

@tzanio
Member

tzanio commented Jan 3, 2018

  • Improve OCCA backend
  • Add MFEM backend — how to support backends that don’t support JIT and don’t run on the host?
  • Add MAGMA backend?
  • Add OpenMP 4.5 backend?
  • Add pure CUDA backend?
  • Add HIP backend?
@jedbrown
Member

jedbrown commented May 7, 2019

With the announcement that OLCF Frontier will use AMD CPUs and GPUs, we should try to get that stack into our workflow. The available on-node programming models are HIP (an open-source CUDA-like model that can compile to both CUDA and ROCm, and can be produced almost automatically from CUDA using hipify-clang) and OpenMP-5 offload. Note that HIP does not currently support run-time compilation.

HIP nominally compiles to CUDA with negligible overhead, but the CUDA toolchain needs to be installed to do so.
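
As a rough illustration of how mechanical that translation is, here is a minimal sketch (not from any of our repositories; the kernel and sizes are made up) of a trivial CUDA-style kernel and the HIP form a hipify-style tool would produce:

```cpp
// CUDA version (illustrative):
//   __global__ void axpy(int n, double a, const double *x, double *y) {
//     int i = blockIdx.x * blockDim.x + threadIdx.x;
//     if (i < n) y[i] += a * x[i];
//   }
//   cudaMalloc(&d_x, bytes);  cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
//   axpy<<<nblocks, 256>>>(n, a, d_x, d_y);

// HIP version, essentially what hipify would emit (the kernel body is unchanged):
#include <hip/hip_runtime.h>

__global__ void axpy(int n, double a, const double *x, double *y) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] += a * x[i];
}

void run_axpy(int n, double a, const double *x, double *y, int nblocks) {
  double *d_x, *d_y;
  size_t bytes = n * sizeof(double);
  hipMalloc(&d_x, bytes);                                   // cudaMalloc  -> hipMalloc
  hipMalloc(&d_y, bytes);
  hipMemcpy(d_x, x, bytes, hipMemcpyHostToDevice);          // cudaMemcpy  -> hipMemcpy
  hipMemcpy(d_y, y, bytes, hipMemcpyHostToDevice);
  hipLaunchKernelGGL(axpy, dim3(nblocks), dim3(256), 0, 0,  // <<<...>>>   -> hipLaunchKernelGGL
                     n, a, d_x, d_y);
  hipMemcpy(y, d_y, bytes, hipMemcpyDeviceToHost);
  hipFree(d_x); hipFree(d_y);
}
```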

@tcew

tcew commented May 7, 2019

OCCA:HIP supports run-time compilation.

@jeremylt
Member

jeremylt commented May 7, 2019

Our OCCA backend is in serious need of a performance overhaul, so it would be great if we could also include OCCA:HIP.

@jedbrown
Member

jedbrown commented May 7, 2019

Yes, I don't think anything special needs to be done for /gpu/occa/hip versus /gpu/occa/cuda, though the OCCA backend needs attention. My comment on run-time compilation was with regard to @YohannDudouit's native CUDA implementation.
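
For context, a libCEED backend is selected at run time by the resource string passed to CeedInit, so OCCA:HIP support should mostly reduce to accepting the right spelling. A minimal sketch, with the /gpu/occa/* strings above used for illustration rather than taken from a released backend list:

```cpp
#include <ceed.h>

int main(int argc, char **argv) {
  Ceed ceed;
  // The backend is chosen by the resource string; "/gpu/occa/hip" vs
  // "/gpu/occa/cuda" here follow the naming in this comment and may differ
  // from the spelling a released backend actually registers.
  const char *resource = (argc > 1) ? argv[1] : "/gpu/occa/cuda";
  CeedInit(resource, &ceed);
  // ... create CeedVectors, CeedOperators, etc., exactly as with any backend ...
  CeedDestroy(&ceed);
  return 0;
}
```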

I'm also curious about observed differences in performance characteristics between the Radeon Instinct and V100.

@tcew

tcew commented May 7, 2019

You should follow up with Noel Chalmers. I believe he has run libP experiments with the Radeon Instinct.

@jedbrown
Member

jedbrown commented May 7, 2019

Thanks. @noelchalmers, can you share any experiments?

@noelchalmers
Member

Hi everyone. I'll try to chip in with what I know on some of the points in this thread:

  • In addition to hipify-clang, which ports existing CUDA code by actually analyzing its semantics, there is also hipify-perl, a simple script that converts CUDA code to HIP textually and at least warns about sections it is unable to translate.

  • HIP does indeed support runtime compilation in the same way CUDA does, and OCCA uses analogous API calls for its runtime compilation of CUDA and HIP. I know the documentation of what is and is not currently in the HIP API is a bit sparse at the moment; the HIP Porting Guide is a good resource for now.

  • As for V100 vs Radeon Instinct performance, in micro-benchmarking we've been seeing bandwidth numbers in the 800-900 GB/s range for the MI-60s and GFLOP numbers similar to the PCIe V100s (see the sketch after this list for the kind of bandwidth measurement involved).

  • I don't have any readily available performance numbers for CEED-relevant benchmarking. My plan is to resurrect the bake-off problems in libp and do some performance analysis to get a better sense of what the Radeons can do compared to the V100s. libp's kernels rely heavily on things like shared-memory bandwidth and cache performance, so it will be a good exercise in finding out how portable they are to Radeon.
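
A minimal sketch of the kind of streaming-bandwidth measurement referenced above, assuming HIP's runtime and event APIs; the buffer size and repetition count are arbitrary and only illustrative:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Time repeated device-to-device copies and report effective bandwidth.
int main() {
  const size_t n = 1 << 27;                 // ~1 GiB per buffer of doubles
  const size_t bytes = n * sizeof(double);
  const int reps = 10;
  double *d_a, *d_b;
  hipMalloc(&d_a, bytes);
  hipMalloc(&d_b, bytes);
  hipMemset(d_a, 0, bytes);

  hipEvent_t start, stop;
  hipEventCreate(&start);
  hipEventCreate(&stop);
  hipEventRecord(start, 0);
  for (int i = 0; i < reps; i++)
    hipMemcpy(d_b, d_a, bytes, hipMemcpyDeviceToDevice);
  hipEventRecord(stop, 0);
  hipEventSynchronize(stop);

  float ms = 0;
  hipEventElapsedTime(&ms, start, stop);
  // Each copy reads and writes the buffer once, so it moves 2*bytes.
  double gbs = reps * 2.0 * bytes / (ms * 1e-3) / 1e9;
  printf("effective bandwidth: %.1f GB/s\n", gbs);

  hipFree(d_a); hipFree(d_b);
  return 0;
}
```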

@jedbrown
Member

jedbrown commented May 7, 2019

Thanks, @noelchalmers.
On run-time compilation, I don't see anything about porting NVRTC to HIP.

Are there any public clouds with Radeon Instinct (for continuous integration, etc.)?

@noelchalmers
Member

noelchalmers commented May 7, 2019

I just realized that you were referring to NVRTC when you mentioned runtime compilation.

No, HIP currently doesn't support any nvrtc* API calls. I'm not aware of any plans to add these features, but I will ask around. What HIP does support is loading compiled binaries using hipModuleLoad, which is analogous to cuModuleLoad, and finding/launching kernels from that binary.
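
A minimal sketch of that load-and-launch path, assuming the HIP module API mirrors the CUDA driver API as described above; the code-object filename and kernel name are hypothetical:

```cpp
#include <hip/hip_runtime.h>

// Load a pre-compiled code object and launch a kernel from it,
// mirroring the cuModuleLoad / cuLaunchKernel path in CUDA.
void launch_from_binary(int n, double *d_x) {
  hipModule_t module;
  hipFunction_t kernel;
  hipModuleLoad(&module, "kernels.hsaco");          // file name is hypothetical
  hipModuleGetFunction(&kernel, module, "scale");   // kernel name is hypothetical

  void *args[] = {&n, &d_x};
  hipModuleLaunchKernel(kernel,
                        (n + 255) / 256, 1, 1,      // grid
                        256, 1, 1,                  // block
                        0, nullptr,                 // shared memory, stream
                        args, nullptr);
  hipModuleUnload(module);
}
```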

I don't know of any public clouds I can point to that offer MI-25s or MI-60s yet. Maybe for some CI tests you could try compiling on some Vegas in a GPU Eater session? Not ideal, certainly.

@jedbrown
Member

jedbrown commented May 7, 2019

Thanks. It looks like GPU Eater doesn't support docker-machine or Kubernetes, so CI integration would be custom and/or not autoscaling, but it's something.

@jedbrown
Member

Yet another C++ layer, this one providing single-source code for CPU, OpenCL, and HIP/CUDA: https://github.com/illuhad/hipSYCL
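
For a sense of the single-source model, a minimal sketch in SYCL 1.2.1-style C++ (illustrative only, not taken from hipSYCL's documentation); the same lambda is compiled for whichever device the queue selects:

```cpp
#include <CL/sycl.hpp>
#include <vector>

int main() {
  const size_t n = 1024;
  std::vector<double> x(n, 1.0);

  cl::sycl::queue q;  // default selector: CPU, or a GPU via the CUDA/HIP backend
  {
    cl::sycl::buffer<double, 1> xb(x.data(), cl::sycl::range<1>(n));
    q.submit([&](cl::sycl::handler &h) {
      auto xa = xb.get_access<cl::sycl::access::mode::read_write>(h);
      h.parallel_for<class scale>(cl::sycl::range<1>(n),
                                  [=](cl::sycl::id<1> i) { xa[i] *= 2.0; });
    });
  }  // buffer goes out of scope: results are copied back to x
  return 0;
}
```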

@jedbrown
Member

While I still don't see it on the docs website, hiprtc was apparently merged a few months ago. ROCm/HIP#1097
I thought we discussed this specifically at CEED3AM, and @noelchalmers and Damon were not aware that it existed. Is it something we should be trying now, or is the lack of documentation an indication that it's still in easter-egg mode?
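
If it is usable, the calls should mirror their nvrtc counterparts; a minimal sketch, assuming the hiprtc API follows nvrtc as the linked PR suggests (the program name is illustrative and no compile options are passed):

```cpp
#include <hip/hiprtc.h>
#include <vector>

// Compile a kernel string at run time and return the code object,
// mirroring nvrtcCreateProgram / nvrtcCompileProgram / nvrtcGetPTX.
std::vector<char> compile_kernel(const char *source) {
  hiprtcProgram prog;
  hiprtcCreateProgram(&prog, source, "kernel.hip", 0, nullptr, nullptr);

  hiprtcCompileProgram(prog, 0, nullptr);  // no extra compile options here

  size_t size = 0;
  hiprtcGetCodeSize(prog, &size);
  std::vector<char> code(size);
  hiprtcGetCode(prog, code.data());

  hiprtcDestroyProgram(&prog);
  return code;  // load with hipModuleLoadData and launch as usual
}
```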

@jdahm jdahm removed their assignment Jun 9, 2020
@dmed256 dmed256 removed their assignment May 25, 2022
@jedbrown
Member

jedbrown commented Sep 6, 2022

I'll close this open-ended issue. There is an improved OCCA backend coming in #1043. I think at this point we can make new issues for specific backend requests.

@jedbrown jedbrown closed this as completed Sep 6, 2022