CUDA: Error handling variables not added to the @llvm.used list
#9526
gmarkall added a commit to gmarkall/numba that referenced this issue on Apr 8, 2024
Previous commits added support for compiling Python functions to CUDA LTO-IR via the compilation interfaces. This commit adds stub code for supporting compilation of `@cuda.jit`-decorated functions to LTO-IR. The only functional change, unused in Numba at present, is that if the linker has LTO enabled, the CUDA codegen uses NVVM to generate LTO-IR instead of PTX, and passes that to the linker.

The `lto` attribute is added to linker classes in `numba.cuda.cudadrv.driver` - this is always `False` for the built-in linkers, but a linker from pynvjitlink (or any other external linker, in theory) could set it to `True` to signal that LTO is enabled.

Some tests must be skipped if LTO is enabled, because it becomes difficult to use the functionality they test when LTO is enabled:

- Some inspect the PTX, which is difficult to do when LTO-IR is generated instead.
- Others check for exceptions, but the exception flags get optimized away by LTO because Numba fails to add them to the used list (See numba#9526).
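The `lto` flag described in this commit message can be pictured with a small sketch. Every name here (`BuiltinLinker`, `ExternalLTOLinker`, `select_output_kind`) is hypothetical and invented for illustration. None of it is Numba's or pynvjitlink's actual API; it only shows the idea of a linker attribute deciding what the code generator emits.

```python
class BuiltinLinker:
    """Hypothetical stand-in for a built-in linker: LTO is never enabled."""
    lto = False


class ExternalLTOLinker:
    """Hypothetical stand-in for an external linker that enables LTO."""
    lto = True


def select_output_kind(linker):
    """Pick the artifact to hand to the linker based on its `lto` flag.

    Returns "ltoir" when the linker signals LTO, otherwise "ptx".
    A linker without an `lto` attribute is treated as non-LTO.
    """
    return "ltoir" if getattr(linker, "lto", False) else "ptx"


if __name__ == "__main__":
    for linker in (BuiltinLinker(), ExternalLTOLinker()):
        print(type(linker).__name__, "->", select_output_kind(linker))
```

Keeping the decision in a single attribute lookup means an external linker only has to expose one boolean to opt in.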
gmarkall added a commit to gmarkall/numba that referenced this issue on Apr 24, 2024
- Wording edits to docs on CUDA compilation.
- Check for `if cc is not None` rather than just `if cc`, etc., in the codegen, for greater robustness.
- Add a test that checks the error reported when specifying an illegal output kind.
- Cross-reference numba#9526 in the comment in `TestUserUxc`.
#9267 fixed dropping of kernels by pynvjitlink by adding kernels to the `@llvm.used` list.

We also add global variables for representing an error handling state: numba/numba/cuda/target.py, lines 219 to 224 in 03f2722.

The variables seem to get optimized away when LTO is used with pynvjitlink, and I suspect they should also be added to the `@llvm.used` list to prevent them being optimized away - from the perspective of device code, they are only ever written to, so they look un-needed - it's only the host that looks up their values after kernel execution.