
CUDA: Add support for compilation to LTO-IR #9274

Merged: 18 commits, merged into numba:main on Apr 24, 2024
Conversation

@gmarkall (Member) commented Nov 8, 2023

This adds support for compiling to LTO-IR, providing an alternative to PTX for linking Python-compiled code with code written in other languages. It offers greater potential for optimization, because the whole body of source across the different languages can be optimized at link time.

A summary of the changes:

  • The first three commits are small refactors / tidy-ups of paths that can be simplified now that we no longer have to support NVVM 3.4 (556bb82, b6c3b85, 6d8a22c)
  • The next commit clarifies the compilation behaviour with / without return types being specified, which was a bit of a hole in the documentation: 357d67d
  • Then we add support for LTO-IR to the CUDA codegen, without exposing it externally: 8d6fd90
  • Then we add a public interface for LTO-IR code generation: ed99b9e
  • Documentation is also added / updated: 0938da4

@gmarkall added labels '2 - In Progress', 'CUDA', and 'Effort - medium' (Nov 8, 2023)
@gmarkall commented Nov 8, 2023

gpuci run tests

gmarkall added a commit to gmarkall/numba that referenced this pull request Nov 8, 2023
@gmarkall commented Nov 8, 2023

gpuci run tests

@gmarkall commented Nov 8, 2023

gpuci run tests

@gmarkall commented Nov 9, 2023

gpuci run tests

This is a bit clearer about what it does, and will be a more
representative name when compilation to LTO-IR is also supported.

We also rename the buffer held by `CompilationUnit.compile()`, because
the buffer will no longer be limited to holding PTX only.

We never have multiple PTX outputs anymore (that was only necessary with NVVM 3.4), so there is no need to build lists of them or join them.

This addition explicitly states the behaviour when a return type is or is not supplied as part of the signature; previously the user would have had to guess, or discover it by accident / experiment.

This follows a very similar process to PTX compilation - LTO-IR generation is enabled with NVVM's `-gen-lto` flag.
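As a rough sketch of that wiring (the helper and the `-opt` option are illustrative; only `-gen-lto` is named by the commit message above):

```python
def nvvm_options(gen_lto=False, opt_level=3):
    # Assemble the option list passed to NVVM; -gen-lto switches the
    # output from PTX to LTO-IR, as described above
    options = [f"-opt={opt_level}"]
    if gen_lto:
        options.append("-gen-lto")
    return options
```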

We create a more generic function, `numba.cuda.compile()`, that provides
similar functionality to `compile_ptx()`, but allowing the choice of PTX
or LTO-IR output. This function defaults to the C ABI rather than the
Numba one, as this is expected to be more convenient for most use cases.
We also add a variant to target the current device.

The original `compile_ptx()` and variant for the current device are left
in to support existing use cases that use them and expect generated
code to use the Numba ABI.
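A minimal sketch of the shape of such an interface (pure illustration: the validation, error message, and return value are invented stand-ins for real compilation):

```python
def compile(pyfunc, sig, output="ptx", abi="c"):
    # Mirrors the described numba.cuda.compile() surface: choice of PTX
    # or LTO-IR output, defaulting to the C ABI. Real lowering and NVVM
    # invocation are replaced here by a descriptive string.
    if output not in ("ptx", "ltoir"):
        raise NotImplementedError(f"Unsupported output kind: {output}")
    return f"<{output} for {pyfunc.__name__}, {abi} ABI>"
```

Requesting any other output kind fails loudly rather than silently producing the wrong artifact.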
@gmarkall commented Dec 5, 2023

gpuci run tests

@gmarkall commented Apr 3, 2024

gpuci run tests

@gmarkall commented Apr 8, 2024

gpuci run tests

Previous commits added support for compiling Python functions to CUDA
LTO-IR via the compilation interfaces. This commit adds stub code for
supporting compilation of `@cuda.jit`-decorated functions to LTO-IR. The
only functional change, unused in Numba at present, is that if the
linker has LTO enabled, the CUDA codegen uses NVVM to generate LTO-IR
instead of PTX, and passes that to the linker.

The `lto` attribute is added to the linker classes in
`numba.cuda.cudadrv.driver` - this is always `False` for the built-in
linkers, but a linker from pynvjitlink (or, in theory, any other
external linker) could set it to `True` to signal that LTO is enabled.
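The attribute-based protocol can be sketched like this (class and function names other than the `lto` attribute are hypothetical):

```python
class Linker:
    # Built-in linkers: LTO is never enabled
    lto = False

class ExternalLTOLinker(Linker):
    # Stand-in for an external linker such as pynvjitlink, which
    # advertises LTO support by overriding the attribute
    lto = True

def codegen_output(linker):
    # The codegen consults linker.lto to decide what NVVM should emit
    return "ltoir" if linker.lto else "ptx"
```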

Some tests must be skipped if LTO is enabled, because it becomes
difficult to use the functionality they test when LTO is enabled:

- Some inspect the PTX, which is difficult to do when LTO-IR is
  generated instead.
- Others check for exceptions, but the exception flags get optimized
  away by LTO because Numba fails to add them to the used list (See
  numba#9526).
@gmarkall commented Apr 8, 2024

gpuci run tests

This is needed to allow the "skip under LTO" test functionality to run
successfully (and not skip on the simulator, since it does not simulate
LTO).
@gmarkall commented Apr 8, 2024

gpuci run tests

@gmarkall gmarkall added this to the 0.60.0-rc1 milestone Apr 9, 2024

- `compile_for_current_device()` needs an `output` kwarg so it can
  generate LTO-IR or PTX.
- `compile_ptx()` now calls `compile()` with an explicit `output` kwarg
  so that it compiles to PTX even if the default for `compile()` changes
  in future.
- `compile_ptx_for_current_device()` now calls `compile_ptx()` with the
  CC for the current device.
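Those delegation rules can be sketched as follows (all bodies are placeholders; only the calling relationships follow the list above):

```python
def current_cc():
    # Placeholder for querying the current device's compute capability
    return (7, 5)

def compile(pyfunc, sig, output="ptx", cc=None):
    # Most general entry point (body is a stand-in for real compilation)
    return (pyfunc.__name__, output, cc)

def compile_for_current_device(pyfunc, sig, output="ptx"):
    # Gains an output kwarg so it can produce LTO-IR as well as PTX
    return compile(pyfunc, sig, output=output, cc=current_cc())

def compile_ptx(pyfunc, sig, cc=None):
    # Passes output= explicitly so this keeps producing PTX even if
    # compile()'s default ever changes
    return compile(pyfunc, sig, output="ptx", cc=cc)

def compile_ptx_for_current_device(pyfunc, sig):
    # Delegates to compile_ptx() with the current device's CC
    return compile_ptx(pyfunc, sig, cc=current_cc())
```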

This is just implementing `get_asm_str()` now, which is part of the
codegen object's interface. To align better with the rest of Numba,
the `_get_ptx()` body is moved into `get_asm_str()` and `get_asm_str()`
is used in its place.
@gmarkall commented
gpuci run tests

@stuartarchibald (Contributor) left a review:
Thanks for the patch @gmarkall, it's great to see this feature implemented. This has already been through an out-of-band pair review between us, in which the feature, expectations, and implementation were discussed. The review below just catches a few small things in the resulting change set; otherwise the content is good. Thanks again for working on this!

Review comments were left on the following files:
- numba/cuda/codegen.py
- numba/cuda/compiler.py
- docs/source/cuda/cuda_compilation.rst
- numba/cuda/tests/cudapy/test_userexc.py
- numba/cuda/tests/cudapy/test_compiler.py
@stuartarchibald added the '4 - Waiting on author' label and removed '3 - Ready for Review' (Apr 24, 2024)

- Wording edits to docs on CUDA compilation.
- Check for `if cc is not None` rather than just `if cc`, etc., in the
  codegen, for greater robustness.
- Add a test that checks the error reported when specifying an illegal
  output kind.
- Cross-reference numba#9526 in the comment in `TestUserExc`.
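The `is not None` point is the standard Python truthiness pitfall, sketched here with invented names:

```python
DEFAULT_CC = (5, 2)  # invented default, for illustration only

def resolve_cc(cc=None):
    # 'if cc:' would also treat falsy-but-deliberate values (e.g. an
    # empty tuple) as absent; comparing against None does not
    if cc is not None:
        return cc
    return DEFAULT_CC
```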
@gmarkall commented
gpuci run tests

@gmarkall commented
@stuartarchibald Many thanks for the review - I believe all comments are now addressed, and I'm just waiting on CI.

@gmarkall added the '4 - Waiting on CI' label and removed '4 - Waiting on author' (Apr 24, 2024)
@stuartarchibald (Contributor) left a review:
Thanks for the patch and fixes!

@stuartarchibald added the '5 - Ready to merge' label and removed '4 - Waiting on CI' (Apr 24, 2024)
@stuartarchibald stuartarchibald mentioned this pull request Apr 24, 2024
@sklam sklam merged commit 6bf8b6a into numba:main Apr 24, 2024
22 checks passed
@gmarkall gmarkall deleted the cuda-ltoir branch May 2, 2024 11:17