
Question: Are cuModules shared between kernels from same program #61

Open
mondus opened this issue Apr 17, 2020 · 7 comments

Comments

@mondus
Contributor

mondus commented Apr 17, 2020

I.e., if I create multiple jitify kernels from the same program which share a device symbol, does get_global_ptr return the same address for each?

It would be good to know before I do some refactoring.

@benbarsdell
Member

No, currently they are not shared: each kernel instantiation has its own cuModule, so the addresses will be different (I confirmed this with a test).

This is arguably a design flaw in the Jitify API, and I'd been wondering if/when it would become a problem. I'd be interested to know how important it is for your application.

A (hypothetical) new Jitify API that better matched the underlying CUDA APIs would allow (/require) you to provide multiple name expressions for a single program (e.g., template instantiations of multiple kernels, globals etc.), then compile it once to a single module and extract all of the kernels and global addresses. This is doable, but would take a bit of refactoring and would be a slightly less intuitive API for common use-cases. Let us know if you think something like this would be of value.
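Roughly, such an API might look like the following sketch. To be clear, every name here (compile, CompiledProgram, get_global_ptr taking a name expression, etc.) is hypothetical and not part of the current Jitify API:

```cpp
// Hypothetical future API (illustrative names only):
// provide every name expression up front, compile the program once,
// then extract all kernels and globals from the single shared module.
jitify::Program program = kernel_cache.program(source);
jitify::CompiledProgram compiled =
    program.compile({"my_kernel<float,128>",   // a template instantiation
                     "other_kernel<int>",      // another kernel
                     "&my_constant_array"});   // a __constant__/__device__ symbol
CUdeviceptr ptr = compiled.get_global_ptr("&my_constant_array");
compiled.kernel("my_kernel<float,128>")
        .configure(grid, block)
        .launch(arg1, arg2);
```

Because everything comes from one module, both kernels would see the same storage for my_constant_array.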

@mondus
Contributor Author

mondus commented Apr 20, 2020

Thanks for the reply @benbarsdell. This is certainly an issue for us, particularly when it comes to constant memory. We have a number of large __constant__ and statically sized device symbols which we can compile within a single unit, but which need to be accessed by separate kernels from that unit. Your suggestion would be very helpful for our use case, but also for any use case with multiple kernels in the same compilation unit. Would it not be possible to simply change the internals so that the cuModule was created by the program and shared with each kernel object?

We can work around the device symbols, but I can't see a clear way to work around our use of constant memory, although I am unclear whether the constant memory limits are per module, per context, or per device.

@maddyscientist
Collaborator

For the constants, could this be a good use of jitify's new-found linking ability: declare the __constant__ in offline source code (e.g., in a .cu file), then JIT-compile the kernel and link it against that object file?
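At the driver-API level the idea would be something like the sketch below (error checking omitted; the file names, the ptx string, and the symbol name are illustrative assumptions, and the exact jitify entry point for this is not shown):

```cpp
// constants.cu, compiled offline with relocatable device code, e.g.:
//   nvcc -rdc=true -c constants.cu -o constants.o
//   __constant__ float my_constants[1024];
//
// The JIT-compiled kernel's PTX is then linked against that object:
CUlinkState link;
cuLinkCreate(0, nullptr, nullptr, &link);
cuLinkAddFile(link, CU_JIT_INPUT_OBJECT, "constants.o",
              0, nullptr, nullptr);
cuLinkAddData(link, CU_JIT_INPUT_PTX, (void*)ptx, strlen(ptx) + 1,
              "kernel.ptx", 0, nullptr, nullptr);
void* cubin; size_t cubin_size;
cuLinkComplete(link, &cubin, &cubin_size);
CUmodule module;
cuModuleLoadData(&module, cubin);
cuLinkDestroy(link);
```

Note that each such link still produces its own module, with its own copy of the symbol.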

@mondus
Contributor Author

mondus commented Apr 21, 2020

@maddyscientist Yes, this might work so long as you can link multiple kernels against the same module (the one containing the constant definition). Presumably this is fine since they are in the same context?

@benbarsdell
Member

I think linking will have the same issue because there will still be multiple modules, unless I'm misunderstanding.

Would it not be possible to simply change the internals so that the cuModule was created by the program and shared with each kernel object?

The problem is that we currently have:

program.kernel(name).instantiate(template args).launch(...)

but what we would need is (roughly speaking):

program.instantiate(list of name expressions).kernel(name expression).launch(...).

In particular, the call to instantiate() is when the program gets compiled. Changing that means changing the fundamental flow of the API. This is doable, but not a small change.
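For reference, here is a rough driver-API sketch of the single-module flow the second form would enable (error checking omitted; the unmangled kernel names assume extern "C" linkage, and the symbol/host names are illustrative):

```cpp
// One compiled module can expose many kernels and globals. A symbol's
// address is per-module, so kernels extracted from the same CUmodule
// all see the same __constant__ storage.
CUmodule module;
cuModuleLoadData(&module, ptx);  // the result of compiling one program

CUfunction kernel_a, kernel_b;
cuModuleGetFunction(&kernel_a, module, "kernel_a");
cuModuleGetFunction(&kernel_b, module, "kernel_b");

CUdeviceptr const_ptr; size_t const_bytes;
cuModuleGetGlobal(&const_ptr, &const_bytes, module, "my_constants");
cuMemcpyHtoD(const_ptr, host_data, const_bytes);  // visible to both kernels
```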

@mondus
Contributor Author

mondus commented Apr 21, 2020

@benbarsdell Yes, I imagine you are right: after linking there would be multiple modules with duplicate definitions of the constant, so setting the constant's value would have to be done for each instantiation. I see now how this would be a significant change (but one which I would very much support!). Could you support both options? E.g.

program.instantiate_program(list of name expressions).instantiated_kernel(name expression).launch(...)

Presumably this would then also support things like:

program.get_global_ptr(...)

Which would solve all of my problems...

What I am currently still unclear on is how constant memory is allocated on the device. The following SO question points to the ISA docs, which state: "There is an additional 640 KB of constant memory, organized as ten independent 64 KB regions. The driver may allocate and initialize constant buffers in these regions and pass pointers to the buffers as kernel function parameters." Does this mean I could have a maximum of 10 jitify kernels/modules each using 64 KB of constant space, or could I have any number, with some driver magic taking care of mapping them to regions at kernel launch?

@mondus
Contributor Author

mondus commented Apr 30, 2020

@benbarsdell We have a workaround for this for now, but it would be a nice feature to enable instantiation of multiple kernels from the same module.
