State of GPU support #431

Open
simsurace opened this issue Jan 28, 2022 · 3 comments

@simsurace
Member

I wanted to ask for an overview of the current state of GPU support in this package. There appear to be several issues about whether the package works nicely on the GPU (i.e. with GPU arrays as inputs) and several proposed solutions, but getting KernelFunctions.jl to work on the GPU seems to be held up by other things breaking, such as AD.

I was wondering whether a clear path forward is already emerging. Since I've done some GPU work before, I'd be happy to help get this package working on the GPU.

By GPU support I mean that at least the following should be possible:

using CUDA, KernelFunctions
CUDA.allowscalar(false)
x = CUDA.rand(16)
k = SEKernel()
kernelmatrix(k, x)
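# expected: a 16×16 kernel matrix as a CuArray, computed without scalar indexing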
@devmotion
Member

There is already some ongoing work, along with open issues and corresponding PRs regarding GPU support. Some issues/tasks are also listed in JuliaGaussianProcesses/ApproximateGPs.jl#15. Relevant in KernelFunctions are, e.g., #299, #380, and the linked PRs. Unfortunately, I've been busy with other things, but I'll try to pick up and focus on #397 in the coming weeks.

@simsurace
Member Author

simsurace commented Jan 31, 2022

Thanks! From (superficially) reading the issues and PRs (also e.g. #386), there seem to be some roadblocks preventing things from progressing. Or is it just a lack of time/resources at this point? For example, is #386 the preferred direction for making kernel evaluation/operations possible on the GPU, and do the AD issues with that approach look resolvable (I'm not very familiar with the AD landscape and haven't yet gone through the details of that conversation, but it seems quite hard)? Or should we explore alternative ways of making that happen (KernelAbstractions, custom CUDA kernels, etc.)? I'm interested in helping move this forward, but I'm not sure where time would be best invested.

On a related note, I already made some easy fixes to prevent scalar indexing that was triggered when running the optimizer in this minimum working example, mainly while evaluating AbstractGPs._compute_intermediates, such as this cholesky-on-diagonal-matrices issue or this one involving symmetric matrices. There are a few more things along those lines to hunt down.
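
For illustration, a minimal sketch of the kind of scalar-indexing-free replacement involved, assuming the problematic object is a Diagonal wrapping a CuArray (the actual fixes in the linked PRs may look different):

using CUDA, LinearAlgebra
CUDA.allowscalar(false)
D = Diagonal(CUDA.rand(16) .+ 1f0)
# a generic Cholesky fallback may index the diagonal elementwise and trigger
# scalar indexing; broadcasting sqrt over the stored diagonal stays on the GPU
L = Diagonal(sqrt.(D.diag))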

@theogf
Member

theogf commented Jan 31, 2022

Speaking only for #386, it is both a matter of time and of choices. There is unfortunately no silver bullet for dealing with AD, GPU, and co. while keeping optimal performance.
I still think #386 is the solution, though (with some decisions to be made about what the generic fallback should be).
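
As a rough illustration of the design space (a hypothetical sketch, not what #386 implements), a purely broadcast-based fallback for the squared-exponential kernel on vectors of scalar inputs could look like the following; it runs on CuArrays because it only uses broadcasting, at the cost of materializing the full pairwise-difference matrix:

using CUDA
CUDA.allowscalar(false)
# hypothetical broadcast-only fallback, not the package's actual implementation
se_kernelmatrix(x::AbstractVector, y::AbstractVector = x) = exp.(-((x .- y') .^ 2) ./ 2)
x = CUDA.rand(16)
K = se_kernelmatrix(x)  # 16×16 CuMatrix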
