
CUDA: Error handling variables not added to the @llvm.used list #9526

Open
gmarkall opened this issue Apr 8, 2024 · 0 comments
Labels: bug - miscompile, CUDA

Comments

gmarkall (Member) commented Apr 8, 2024

#9267 fixed pynvjitlink dropping kernels by adding the kernels to the @llvm.used list.

We also add global variables that represent error handling state:

numba/numba/cuda/target.py

Lines 219 to 224 in 03f2722

def define_error_gv(postfix):
    name = wrapfn.name + postfix
    gv = cgutils.add_global_variable(wrapper_module, ir.IntType(32),
                                     name)
    gv.initializer = ir.Constant(gv.type.pointee, None)
    return gv

The variables seem to get optimized away when LTO is used with pynvjitlink, and I suspect they should also be added to the @llvm.used list to prevent this. From the perspective of device code they are only ever written to, so they look unneeded; it is only the host that reads their values after kernel execution.
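
For reference, a minimal sketch of the general @llvm.used mechanism using llvmlite, assuming the globals to keep are already present in the module; the helper name and argument are hypothetical, and this is not necessarily how #9267 implements it:

from llvmlite import ir

def add_to_llvm_used(module, globals_to_keep):
    # @llvm.used is an appending array of i8* in the "llvm.metadata" section;
    # the optimizer must treat every entry as live, even if nothing else in
    # the module references it.
    i8p = ir.IntType(8).as_pointer()
    used_ty = ir.ArrayType(i8p, len(globals_to_keep))
    used = ir.GlobalVariable(module, used_ty, name="llvm.used")
    used.linkage = "appending"
    used.section = "llvm.metadata"
    used.initializer = ir.Constant(
        used_ty, [gv.bitcast(i8p) for gv in globals_to_keep])
    return used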

gmarkall added the CUDA and bug - miscompile labels Apr 8, 2024
gmarkall added a commit to gmarkall/numba that referenced this issue Apr 8, 2024
Previous commits added support for compiling Python functions to CUDA
LTO-IR via the compilation interfaces. This commit adds stub code for
supporting compilation of `@cuda.jit`-decorated functions to LTO-IR. The
only functional change, unused in Numba at present, is that if the
linker has LTO enabled, the CUDA codegen uses NVVM to generate LTO-IR
instead of PTX, and passes that to the linker.

The `lto` attribute is added to linker classes in
`numba.cuda.cudadrv.driver`; this is always `False` for the built-in
linkers, but a linker from pynvjitlink (or, in theory, any other
external linker) could set it to `True` to signal that LTO is enabled.

Some tests must be skipped when LTO is enabled, because the
functionality they exercise becomes difficult to test in that case:

- Some inspect the PTX, which is difficult to do when LTO-IR is
  generated instead.
- Others check for exceptions, but the exception flags get optimized
  away by LTO because Numba fails to add them to the used list (See
  numba#9526).
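
As an illustration of the `lto` attribute described in the commit message above, a minimal sketch; the class and helper names here are hypothetical, not Numba's or pynvjitlink's actual API:

class BaseLinker:
    # Built-in linkers never perform LTO, mirroring the always-False
    # default described above.
    lto = False

class ExternalLTOLinker(BaseLinker):
    # A hypothetical external linker (e.g. one backed by pynvjitlink)
    # signals that it consumes LTO-IR by setting the attribute to True.
    lto = True

def nvvm_output_kind(linker):
    # The codegen would ask NVVM for LTO-IR instead of PTX when the
    # active linker has LTO enabled; this helper is illustrative only.
    return "ltoir" if getattr(linker, "lto", False) else "ptx"

assert nvvm_output_kind(BaseLinker()) == "ptx"
assert nvvm_output_kind(ExternalLTOLinker()) == "ltoir"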
gmarkall added a commit to gmarkall/numba that referenced this issue Apr 24, 2024
- Wording edits to docs on CUDA compilation.
- Check for `if cc is not None` rather than just `if cc`, etc., in the
  codegen, for greater robustness (see the sketch after this list).
- Add a test that checks the error reported when specifying an illegal
  output kind.
- Cross-reference numba#9526 in the comment in `TestUserUxc`.
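
To illustrate the `if cc is not None` point in the list above, a small sketch with hypothetical values: a truthiness test conflates "not provided" with any other falsy value, while the identity test only treats None as "not provided":

def describe(cc):
    # Truthiness check: any falsy value looks "absent".
    truthy = bool(cc)
    # Identity check: only None means "not provided".
    provided = cc is not None
    return truthy, provided

print(describe((8, 0)))  # (True, True)
print(describe(0))       # (False, True)  falsy, but explicitly provided
print(describe(None))    # (False, False)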