CUDA: Add support for compilation to LTO-IR #9274

gmarkall · 2023-11-08T12:51:19Z

This adds support for compiling to LTO-IR, an alternative to PTX for linking Numba-compiled code with code written in other languages. Because optimization can happen at link time across the whole body of source, regardless of language, LTO-IR offers greater potential for optimization.

A summary of the changes:

- The first three commits are small refactors / tidy-ups of some paths that can be simplified now that NVVM 3.4 no longer needs to be supported (556bb82, b6c3b85, 6d8a22c).
- The next commit clarifies the compilation behaviour with / without return types being specified, which was a bit of a hole in the documentation: 357d67d
- Then we add support for LTO-IR to CUDA codegen, without exposing it externally: 8d6fd90
- Then we add a public interface for LTO-IR code generation: ed99b9e
- Documentation is also added / updated: 0938da4

gmarkall · 2023-11-08T12:51:39Z

gpuci run tests

gmarkall · 2023-11-08T12:53:16Z

gpuci run tests

gmarkall · 2023-11-08T16:49:25Z

gpuci run tests

gmarkall · 2023-11-09T10:08:08Z

gpuci run tests

This is a bit clearer about what it does, and will be a more representative name when compilation to LTO-IR is also supported. We also rename the buffer held by `CompilationUnit.compile()`, because the buffer will no longer be limited to holding PTX only.

We never have multiple PTX outputs anymore (this was only necessary with NVVM 3.4), so there's no need to make lists of them or join them.

This addition explicitly states the behaviour when a return type is or is not supplied as part of the signature; previously the user would have had to guess this, or discover it through accident / experiment.

This follows a very similar process to PTX compilation - LTO generation is enabled with NVVM's `-gen-lto` flag.
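To illustrate how little differs between the two paths, here is a minimal sketch that selects NVVM options by output kind. The helper `nvvm_options_for()` and its default compute capability are hypothetical and are not Numba's internal code; only the `-gen-lto` flag comes from the commit message above.

```python
def nvvm_options_for(output, cc=(7, 5), opt=True):
    """Hypothetical helper: build NVVM-style options for an output kind."""
    if output not in ("ptx", "ltoir"):
        raise ValueError(f"Unsupported output kind: {output!r}")

    major, minor = cc
    options = [f"-arch=compute_{major}{minor}", f"-opt={3 if opt else 0}"]

    # The only difference for LTO-IR in this sketch: ask NVVM for LTO output.
    if output == "ltoir":
        options.append("-gen-lto")

    return options
```

In this sketch, the PTX and LTO-IR paths share everything except the final flag, which mirrors the "very similar process" described above.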

We create a more generic function, `numba.cuda.compile()`, that provides similar functionality to `compile_ptx()`, but allowing the choice of PTX or LTO-IR output. This function defaults to the C ABI rather than the Numba one, as this is expected to be more convenient for most use cases. We also add a variant to target the current device. The original `compile_ptx()` and variant for the current device are left in to support existing use cases that use them and expect generated code to use the Numba ABI.
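A usage sketch of the new entry point, assuming Numba 0.60 or later and a CUDA toolkit that provides NVVM (the commits in this PR skip the LTO-IR test on toolkits older than 11.5). The wrapper `compile_add()` and its default compute capability are illustrative choices; Numba is imported lazily so the definition itself has no hard dependency on it.

```python
def compile_add(output="ltoir", cc=(8, 0)):
    """Compile a small device function to PTX or LTO-IR.

    Illustrative wrapper only; requires Numba and a CUDA toolkit at call time.
    """
    from numba import cuda, float32  # lazy import: optional dependency

    def add(x, y):
        return x + y

    # device=True compiles a device function rather than a kernel.
    # abi="c" is already the default for cuda.compile(); it is spelled out
    # here to contrast with compile_ptx(), which uses the Numba ABI.
    code, resty = cuda.compile(
        add,
        float32(float32, float32),
        device=True,
        abi="c",
        cc=cc,
        output=output,
    )
    return code, resty
```

Calling `compile_add("ptx")` or `compile_add("ltoir")` returns the compiled code together with the inferred return type; the result can then be handed to a linker alongside code from other languages.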

gmarkall · 2023-12-05T16:13:51Z

gpuci run tests

gmarkall · 2024-04-03T11:54:09Z

gpuci run tests

gmarkall · 2024-04-08T16:18:44Z

gpuci run tests

Previous commits added support for compiling Python functions to CUDA LTO-IR via the compilation interfaces. This commit adds stub code for supporting compilation of `@cuda.jit`-decorated functions to LTO-IR. The only functional change, unused in Numba at present, is that if the linker has LTO enabled, the CUDA codegen uses NVVM to generate LTO-IR instead of PTX, and passes that to the linker.

The `lto` attribute is added to linker classes in `numba.cuda.cudadrv.driver`. It is always `False` for the built-in linkers, but a linker from pynvjitlink (or any other external linker, in theory) could set it to `True` to signal that LTO is enabled.

Some tests must be skipped if LTO is enabled, because the functionality they test becomes difficult to use:

- Some inspect the PTX, which is difficult to do when LTO-IR is generated instead.
- Others check for exceptions, but the exception flags get optimized away by LTO because Numba fails to add them to the used list (see numba#9526).
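The dispatch on the linker's `lto` attribute can be sketched as follows. The classes and the `add_compiled_code()` function are hypothetical stand-ins written for illustration; they are not the classes in `numba.cuda.cudadrv.driver`.

```python
class BuiltinLinker:
    """Stand-in for a built-in linker: LTO is never enabled."""

    lto = False

    def __init__(self):
        self.inputs = []

    def add_ptx(self, ptx):
        self.inputs.append(("ptx", ptx))


class ExternalLTOLinker(BuiltinLinker):
    """Stand-in for an external linker that signals LTO support."""

    lto = True

    def add_ltoir(self, ltoir):
        self.inputs.append(("ltoir", ltoir))


def add_compiled_code(linker, get_ptx, get_ltoir):
    """Generate only the kind of code the linker wants, then hand it over."""
    if linker.lto:
        linker.add_ltoir(get_ltoir())
    else:
        linker.add_ptx(get_ptx())
    return linker.inputs[-1][0]
```

Passing the generators as callables means that only one of PTX or LTO-IR is ever produced for a given linker.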

gmarkall · 2024-04-08T17:11:50Z

gpuci run tests

This is needed to allow the "skip under LTO" test functionality to run successfully (and not skip on the simulator, since it does not simulate LTO).
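A minimal sketch of a "skip under LTO" decorator built on `unittest.skipIf`. The names `SimulatedLinker` and `skip_under_lto` are hypothetical and chosen for illustration; the point is that a simulator stub reporting `lto = False` lets decorated tests run rather than skip.

```python
import unittest


class SimulatedLinker:
    """Hypothetical simulator stub: the simulator does not simulate LTO."""

    lto = False


def skip_under_lto(reason, linker_cls=SimulatedLinker):
    """Skip the decorated test when the given linker class has LTO enabled."""
    return unittest.skipIf(linker_cls.lto, reason)


class ExampleTests(unittest.TestCase):
    @skip_under_lto("Inspecting PTX is difficult when LTO-IR is generated")
    def test_inspect_ptx(self):
        ptx = ".visible .entry kernel()"  # placeholder text, not real output
        self.assertIn(".entry", ptx)
```

Without the stub attribute, evaluating `linker_cls.lto` would raise `AttributeError` when the test module is imported on the simulator.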

gmarkall · 2024-04-08T18:53:35Z

gpuci run tests

- `compile_for_current_device()` needs an `output` kwarg so it can generate LTO-IR or PTX.
- `compile_ptx()` now calls `compile()` with an explicit `output` kwarg so that it compiles to PTX even if the default for `compile()` changes in future.
- `compile_ptx_for_current_device()` now calls `compile_ptx()` with the CC for the current device.
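The delegation between these four functions can be modelled with plain Python stand-ins. Nothing below calls Numba: `compile_` (named with a trailing underscore to avoid shadowing the built-in), `current_device_cc()`, and the returned dictionaries are all assumptions made for illustration.

```python
def current_device_cc():
    """Hypothetical stand-in for querying the current device's CC."""
    return (7, 5)


def compile_(pyfunc, sig, cc=None, abi="c", output="ptx"):
    """Stand-in for the generic entry point; returns a description only."""
    if output not in ("ptx", "ltoir"):
        raise NotImplementedError(f"Unsupported output type: {output}")
    return {"func": pyfunc.__name__, "cc": cc, "abi": abi, "output": output}


def compile_for_current_device(pyfunc, sig, abi="c", output="ptx"):
    # Forwards `output`, so LTO-IR is available for the current device too.
    return compile_(pyfunc, sig, cc=current_device_cc(), abi=abi, output=output)


def compile_ptx(pyfunc, sig, cc=None, abi="numba"):
    # Explicit output="ptx": unaffected if the default of compile_() changes.
    return compile_(pyfunc, sig, cc=cc, abi=abi, output="ptx")


def compile_ptx_for_current_device(pyfunc, sig, abi="numba"):
    # Delegates to compile_ptx() with the CC of the current device.
    return compile_ptx(pyfunc, sig, cc=current_device_cc(), abi=abi)
```

In this model the PTX-specific functions keep the Numba ABI as their default, matching the intent stated earlier that existing `compile_ptx()` users should see no change.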

This is just implementing `get_asm_str()` now, which is part of the codegen object's interface. To align better with the rest of Numba, the `_get_ptx()` body is moved into `get_asm_str()` and `get_asm_str()` is used in its place.

gmarkall · 2024-04-17T11:46:06Z

gpuci run tests

stuartarchibald

Thanks for the patch @gmarkall, it's great to see this feature implemented. This has been through an OOB pair review between us already in which the feature, expectations and implementation were discussed. The review provided below is just catching a few small things in the resultant change set; the content is otherwise good. Thanks again for working on this!

numba/cuda/codegen.py

numba/cuda/compiler.py

docs/source/cuda/cuda_compilation.rst

numba/cuda/codegen.py

numba/cuda/tests/cudapy/test_userexc.py

numba/cuda/tests/cudapy/test_compiler.py

- Wording edits to docs on CUDA compilation.
- Check for `if cc is not None` rather than just `if cc`, etc., in the codegen, for greater robustness.
- Add a test that checks the error reported when specifying an illegal output kind.
- Cross-reference numba#9526 in the comment in `TestUserExc`.
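The robustness gain from `is not None` can be shown with a small sketch. The functions and `DEFAULT_CC` below are hypothetical and not taken from the codegen; they only contrast the two styles of check.

```python
DEFAULT_CC = (5, 3)  # hypothetical fallback compute capability


def resolve_cc_truthy(cc=None):
    """Truthiness check: any falsy value silently falls back to the default."""
    return cc if cc else DEFAULT_CC


def resolve_cc_explicit(cc=None):
    """Identity check: only None means 'not given'; bad values are rejected."""
    if cc is not None:
        if len(cc) != 2:
            raise ValueError(f"cc must be a (major, minor) pair, got {cc!r}")
        return tuple(cc)
    return DEFAULT_CC
```

With the truthiness check, a malformed but falsy argument such as an empty tuple is quietly replaced by the default; with the explicit check it reaches validation and is reported to the caller.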

gmarkall · 2024-04-24T13:08:47Z

gpuci run tests

gmarkall · 2024-04-24T13:09:11Z

@stuartarchibald Many thanks for the review - I believe all comments are now addressed, and I'm just waiting on CI.

stuartarchibald

Thanks for the patch and fixes!

gmarkall added 2 - In Progress, CUDA, Effort - medium labels Nov 8, 2023

gmarkall added a commit to gmarkall/numba that referenced this pull request Nov 8, 2023

Add release note for numba#9274

c4011b5

gmarkall force-pushed the cuda-ltoir branch from 9bbe644 to f932dae Compare November 9, 2023 08:58

gmarkall added 3 - Ready for Review and removed 2 - In Progress labels Nov 13, 2023

gmarkall added 10 commits December 5, 2023 16:13

CUDA codegen: refactor common CC ensuring functionality

57410d1

CUDA codegen refactor: We only generate one PTX

ec74a39

compile_ptx() docs: Explain return type behaviour

e146972

CUDA codegen: add support for compilation to LTO-IR

5adfbc0

CUDA: Document compilation to LTO-IR

40d8c98

Add release note for numba#9274

6734671

CUDA: Add dummy compile APIs to simulator

30109ea

Skip LTO-IR test when toolkit < 11.5

0d9a729

gmarkall force-pushed the cuda-ltoir branch from f932dae to 0d9a729 Compare December 5, 2023 16:13

leofang mentioned this pull request Mar 15, 2024

Support for CUFFT callbacks JuliaGPU/CUDA.jl#75

Open

gmarkall added 2 commits April 3, 2024 12:16

Merge remote-tracking branch 'numba/main' into cuda-ltoir

86000ce

Fix formatting in PR numba#9274 notes

1086df4

gmarkall force-pushed the cuda-ltoir branch from 75f0d81 to 6286b22 Compare April 8, 2024 17:11

Add simulator stubs for Linker LTO stub functionality

58d87fc

gmarkall added this to the 0.60.0-rc1 milestone Apr 9, 2024

isVoid mentioned this pull request Apr 16, 2024

LTO Support NVIDIA/numbast#33

Open

gmarkall added 2 commits April 17, 2024 12:43

CUDA: refactor codegen to remove _get_ptx()

eb04196

stuartarchibald reviewed Apr 24, 2024

stuartarchibald added 4 - Waiting on author and removed 3 - Ready for Review labels Apr 24, 2024

gmarkall added 2 commits April 24, 2024 13:51

Assert correct return type in

1dd95c4

gmarkall added 4 - Waiting on CI and removed 4 - Waiting on author labels Apr 24, 2024

stuartarchibald approved these changes Apr 24, 2024

stuartarchibald added 5 - Ready to merge and removed 4 - Waiting on CI labels Apr 24, 2024

stuartarchibald mentioned this pull request Apr 24, 2024

Numba 0.60.0rc1 Checklist #9544

Open

40 tasks

sklam merged commit 6bf8b6a into numba:main Apr 24, 2024
22 checks passed

gmarkall deleted the cuda-ltoir branch May 2, 2024 11:17
