adding preliminary AMDGPU support #99

Draft
leios wants to merge 7 commits into master

Conversation

@leios (Contributor) commented Sep 12, 2022

I am still running the tests, so this PR is a draft and might change in the next few days, but it adds preliminary support for AMDGPU devices.

There were a few small things to note:

  1. I modified every part of the code that used CuArray to use an array type value AT instead. There were a few places (such as in zygote.jl) with internal functions that were only used on the GPU; in these cases, AT became a function argument. In most cases, AT could be provided without any change to the function calls themselves. A rough sketch of the idea is included after this list.
  2. In a few cases (such as interactions/implicit_solver.jl), I added a new conditional branch for ROCArray alongside the existing CuArray one.
  3. The System(...) constructor was modified to try to figure out which device you want when gpu is set to true; it defaults to CUDA if both GPU backends are present and nothing else is specified.
  4. The tests were modified to have run_cuda_tests and run_rocm_tests flags, which populate the array gpu_array_types with CuArray, ROCArray, or both. I tried to modify all the tests to loop over the array types in this array (a rough sketch is included at the end of this comment).
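
A minimal sketch of what points 1 to 3 might look like; the function names and exact logic here are illustrative, not necessarily what is in the PR:

using CUDA, AMDGPU

# Pass the array type through as a value AT rather than hard-coding CuArray
move_to_device(coords, AT) = AT == Array ? coords : AT(coords)

# Hypothetical device selection along the lines of point 3: prefer CUDA when
# both backends are functional and nothing else is specified
function pick_array_type(gpu::Bool)
    if !gpu
        return Array
    elseif CUDA.functional()
        return CuArray
    elseif AMDGPU.functional()
        return ROCArray
    else
        error("gpu=true but no functional GPU backend was found")
    end
end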

Again, I am still messing around with things now, but I thought I would put the draft PR up in case there are design decisions to talk about.
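
For point 4, a rough sketch of how the test array types might be collected; the flag names follow the description above, but the surrounding test code is hypothetical:

using Test, CUDA, AMDGPU

run_cuda_tests = CUDA.functional()
run_rocm_tests = AMDGPU.functional()

gpu_array_types = []
run_cuda_tests && push!(gpu_array_types, CuArray)
run_rocm_tests && push!(gpu_array_types, ROCArray)

# Each GPU test then loops over whichever array types are available
for AT in gpu_array_types
    coords = AT(rand(Float32, 3, 100))
    @test !(coords isa Array)
end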

@jgreener64 (Collaborator)

Thanks for doing this. I think this is the right idea and from a quick look the implementation seems good. Let me know when it is ready for review.

@leios (Contributor, Author) commented Sep 15, 2022

In principle, the PR runs now, but I am getting some scalar indexing in the tests somewhere that I need to fix. I've been noticing scalar indexing in other Molly runs for a while now, but haven't been able to pin down where it is happening or whether the issue is AMD-specific, so I'll keep playing around here and try to find the problem.

@leios (Contributor, Author) commented Sep 15, 2022

accumulateadd(x) = accumulate(+, x)

# Accumulate values in an array based on the ordered boundaries in bounds
# Used to speed up views with repeated indices on the GPU when you have the bounds
@views @inbounds function accumulate_bounds(arr, bounds)
    zf = zero(arr[1:1])
    accum_pad = vcat(zf, accumulateadd(arr))
    accum_bounds = accum_pad[bounds]
    accum_bounds_offset = vcat(zf, accum_bounds[1:(end - 1)])
    return accum_bounds .- accum_bounds_offset
end

It seems to be here with the accumulate function. I'll see if there's a simple solution...

@jgreener64 (Collaborator)

I have noticed scalar indexing occasionally when using Molly too, though it is disallowed in the current tests so I think that case is AMD-specific.

I am not against allowing scalar indexing in specified places if performance isn't too bad. The accumulate_bounds function is a hack to allow faster AD with Zygote. It will be removed entirely later with a switch to GPU kernels. If overall performance on AMD is okay I wouldn't worry too much.
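
For reference, scalar indexing can be permitted in a specific place rather than globally using the @allowscalar macro from GPUArrays; a minimal sketch (the function here is hypothetical, not one from Molly):

using GPUArrays  # CUDA.jl and AMDGPU.jl array types both build on this

# Scalar indexing stays disallowed globally (as in the tests), but this one
# known-slow lookup is explicitly permitted
first_element(arr) = GPUArrays.@allowscalar arr[1]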

@leios (Contributor, Author) commented Sep 22, 2022

Since the code runs but the tests are not quite passing, I think it's best to keep this PR open and refactor it into a KernelAbstractions PR on top of the kernels branch when it is done. In the meantime, I'll just figure out the AMD-specific bugs and push the fixes here.
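
For reference, a KernelAbstractions kernel is written once and launched on whichever backend owns the arrays, which is what would let the CUDA and AMDGPU paths share code. A minimal sketch of the pattern, using the current KernelAbstractions API rather than anything from the kernels branch:

using KernelAbstractions

# One kernel definition that runs on the CPU, CUDA and AMDGPU backends alike
@kernel function axpy_kernel!(y, a, x)
    i = @index(Global)
    @inbounds y[i] += a * x[i]
end

function axpy!(y, a, x)
    backend = get_backend(y)  # CPU(), CUDABackend(), ROCBackend(), ...
    kernel! = axpy_kernel!(backend)
    kernel!(y, a, x; ndrange = length(y))
    KernelAbstractions.synchronize(backend)
    return y
end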

@jgreener64 (Collaborator)

Sounds good.
