Cuda interface mallocs and pybind11 #118

Open
SBresler opened this issue Dec 25, 2022 · 0 comments

SBresler commented Dec 25, 2022

I have gotten the CUDA interface to work in Python now, and it's way faster than what I was doing before. However, I am seeing a lot of memory allocations when I call the DLL.

I was messing around with the NVIDIA Performance Primitives (NPP), and many of those functions require a preallocated scratch buffer. So I can just pass an already allocated array to the NPP DLL, and all of the memory is set up in advance.
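To make the pattern concrete, here is a minimal sketch of "query the size once, allocate once, reuse on every call". It uses host memory from `ctypes` purely for illustration; a real version would hold a device pointer instead. `ScratchBuffer` and `required_scratch_bytes` are hypothetical names, and the size formula is made up to stand in for a library-provided buffer-size query.

```python
import ctypes


class ScratchBuffer:
    """Reusable byte buffer that reallocates only when a larger size is requested."""

    def __init__(self):
        self._buf = None
        self.capacity = 0
        self.allocations = 0  # counts real allocations, for illustration

    def reserve(self, n_bytes):
        # Allocate only if the current buffer is too small.
        if n_bytes > self.capacity:
            self._buf = ctypes.create_string_buffer(n_bytes)
            self.capacity = n_bytes
            self.allocations += 1
        # Return a raw pointer that could be handed to a C function.
        return ctypes.cast(self._buf, ctypes.c_void_p)


def required_scratch_bytes(n_elements):
    # Hypothetical size query (assumption): 4 bytes of scratch per element.
    return 4 * n_elements


scratch = ScratchBuffer()
for _ in range(1000):
    ptr = scratch.reserve(required_scratch_bytes(4096))
    # A real call would pass `ptr` to the library here.

print(scratch.allocations)  # 1
```

The loop runs 1000 "calls" but allocates only once, which is the behavior I see with NPP in the traces.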

So what I am asking is the following:

Am I correct in assuming that the CUDA interface does not take pointers for all of the things that it needs to do the calculation, so it has to allocate them itself on each call?

Is there theoretically a way to change the interface so that you can precompute the amount of "scratch buffer" needed for Gpufit, and then eliminate some of the memory allocations?
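As a rough sketch of what such a precomputation could look like, the function below adds up the working memory for a batch of least-squares fits from the problem dimensions. The list of buffers is an assumption for illustration (typical Levenberg-Marquardt intermediates) and is not taken from Gpufit's source; `estimate_workspace_bytes` is a hypothetical helper, not an existing API.

```python
FLOAT_BYTES = 4  # assuming single-precision floats
INT_BYTES = 4    # assuming 32-bit ints


def estimate_workspace_bytes(n_fits, n_points, n_parameters):
    """Estimate scratch bytes for n_fits independent fits (illustrative only)."""
    per_fit_floats = (
        n_points                         # model values
        + n_points * n_parameters        # derivatives of the model
        + n_parameters                   # gradient
        + n_parameters * n_parameters    # Hessian approximation
        + n_parameters                   # parameter step
        + 2                              # current and previous chi-square
        + 1                              # damping factor (lambda)
    )
    per_fit_ints = 2                     # fit state and iteration count
    per_fit_bytes = per_fit_floats * FLOAT_BYTES + per_fit_ints * INT_BYTES
    return n_fits * per_fit_bytes


print(estimate_workspace_bytes(n_fits=1000, n_points=10, n_parameters=2))  # 172000
```

If the interface exposed something like this, the caller could allocate the buffer once and pass it in on every call, the same way NPP does it.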

Looking at the traces, about 50% of a call to Gpufit's CUDA interface is spent on memory allocations, while with the NPP stuff I never see any memory allocations... because I give it a preallocated scratch buffer.

I'm willing to look into this myself and contribute; it's just helpful to hear thoughts on this.

EDIT: oh, pybind11. Are there any performance advantages to using pybind11 instead of ctypes? I just used ctypes, but since pybind11 requires you to modify the C++ source, I would assume the integration is a little bit tighter.
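One way to judge whether the binding layer matters at all is to measure the fixed cost of a ctypes call and compare it with the duration of a fit. The sketch below times a trivial C function exported by the interpreter (`Py_IsInitialized`) as a stand-in for "a foreign call that does almost no work". It goes through `ctypes.pythonapi`, which does not release the GIL, so a `CDLL`-loaded library may show slightly different numbers; treat the result as a ballpark only.

```python
import ctypes
import timeit

# Trivial C function from the running interpreter; it takes no arguments
# and returns an int, so nearly all measured time is ctypes call overhead.
trivial_call = ctypes.pythonapi.Py_IsInitialized
trivial_call.restype = ctypes.c_int
trivial_call.argtypes = []


def per_call_overhead_seconds(n_calls=100_000):
    """Average wall-clock time of one trivial ctypes call."""
    total = timeit.timeit(trivial_call, number=n_calls)
    return total / n_calls


overhead = per_call_overhead_seconds()
print(f"ctypes overhead per call: {overhead * 1e6:.2f} microseconds")
```

If this number is tiny compared with the time of one call into the fit, then the allocations in the trace are the thing to go after first, regardless of the binding.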
