Provenance of NVTX headers in NCCL #1270

Artem-B · 2024-04-29T22:24:36Z

NCCL appears to carry some sort of fork of nvtx3 headers that match neither the headers in https://github.com/NVIDIA/nvtx nor the nvtx3 headers distributed with CUDA-12.x. Unfortunately, NCCL commit history does not mention the provenance of these headers, so it's not clear if they are a snapshot of some WIP branch in NVIDIA's internal repo, or if it's something developed by/for NCCL.

My issue is that when I build an application with CUDA-12, I generally need to compile everything with one set of those headers. Debugging ODR violations caused by different parts of the build including different variants of the headers is not fun.

Would it be possible to switch NCCL to use pristine upstream NVTX headers, and either keep NCCL-specific extensions only, or upstream the changes if they are not NCCL-specific?

@jrhemstad, @sjeaugey

The text was updated successfully, but these errors were encountered:

kiskra-nvidia · 2024-04-29T22:46:44Z

We are using a snapshot of the headers from an internal repo. Funny you should ask though -- I've been pushing on the parties involved to get them in shape for the last few weeks, as they've been causing compilation warnings, especially with clang. Let me ping them again...

So I'm hoping that the next NCCL release will have an updated version, but it won't be the same as what's in the current CUDA release.

Artem-B · 2024-04-29T23:26:04Z

I can help with clang-related issues. If the warnings are related to C++, clang is usually more strict than nvcc, and usually has a good reason to complain.

So I'm hoping that the next NCCL release will have an updated version, but it won't be the same as what's in the current CUDA release.

It does not have to be the same, as long as they can be used instead of the headers in CUDA-12. It would work for us short-term, as long as we have a variant that works for everyone.

Long term I would expect to see problems if/when NCCL's copy and NVTX repo and/or the version shipped with CUDA start to diverge. Shipping a fork as part of another library is a potential timebomb.

kiskra-nvidia · 2024-04-30T14:58:17Z

Fixing the clang-related issues is not a problem; it's just that we didn't want to diverge from the upstream, which is what prompted us to attempt to resync with them.

kiskra-nvidia · 2024-05-01T17:14:01Z

FYI, in the next NCCL version we will be using the NVTX headers based on https://github.com/NVIDIA/NVTX/tree/dev.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provenance of NVTX headers in NCCL #1270

Provenance of NVTX headers in NCCL #1270

Artem-B commented Apr 29, 2024

kiskra-nvidia commented Apr 29, 2024

Artem-B commented Apr 29, 2024

kiskra-nvidia commented Apr 30, 2024 •

edited

kiskra-nvidia commented May 1, 2024 •

edited

Provenance of NVTX headers in NCCL #1270

Provenance of NVTX headers in NCCL #1270

Comments

Artem-B commented Apr 29, 2024

kiskra-nvidia commented Apr 29, 2024

Artem-B commented Apr 29, 2024

kiskra-nvidia commented Apr 30, 2024 • edited

kiskra-nvidia commented May 1, 2024 • edited

kiskra-nvidia commented Apr 30, 2024 •

edited

kiskra-nvidia commented May 1, 2024 •

edited