Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provenance of NVTX headers in NCCL #1270

Open
Artem-B opened this issue Apr 29, 2024 · 4 comments
Open

Provenance of NVTX headers in NCCL #1270

Artem-B opened this issue Apr 29, 2024 · 4 comments

Comments

@Artem-B
Copy link

Artem-B commented Apr 29, 2024

NCCL appears to carry some sort of fork of nvtx3 headers that match neither the headers in https://github.com/NVIDIA/nvtx nor the nvtx3 headers distributed with CUDA-12.x. Unfortunately, NCCL commit history does not mention the provenance of these headers, so it's not clear if they are a snapshot of some WIP branch in NVIDIA's internal repo, or if it's something developed by/for NCCL.

My issue is that when I build an application with CUDA-12, I generally need to compile everything with one set of those headers. Debugging ODR violations caused by different parts of the build including different variants of the headers is not fun.

Would it be possible to switch NCCL to use pristine upstream NVTX headers, and either keep NCCL-specific extensions only, or upstream the changes if they are not NCCL-specific?

@jrhemstad, @sjeaugey

@kiskra-nvidia
Copy link
Member

We are using a snapshot of the headers from an internal repo. Funny you should ask though -- I've been pushing on the parties involved to get them in shape for the last few weeks, as they've been causing compilation warnings, especially with clang. Let me ping them again...

So I'm hoping that the next NCCL release will have an updated version, but it won't be the same as what's in the current CUDA release.

@Artem-B
Copy link
Author

Artem-B commented Apr 29, 2024

I can help with clang-related issues. If the warnings are related to C++, clang is usually more strict than nvcc, and usually has a good reason to complain.

So I'm hoping that the next NCCL release will have an updated version, but it won't be the same as what's in the current CUDA release.

It does not have to be the same, as long as they can be used instead of the headers in CUDA-12. It would work for us short-term, as long as we have a variant that works for everyone.

Long term I would expect to see problems if/when NCCL's copy and NVTX repo and/or the version shipped with CUDA start to diverge. Shipping a fork as part of another library is a potential timebomb.

@kiskra-nvidia
Copy link
Member

kiskra-nvidia commented Apr 30, 2024

Fixing the clang-related issues is not a problem; it's just that we didn't want to diverge from the upstream, which is what prompted us to attempt to resync with them.

@kiskra-nvidia
Copy link
Member

kiskra-nvidia commented May 1, 2024

FYI, in the next NCCL version we will be using the NVTX headers based on https://github.com/NVIDIA/NVTX/tree/dev.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants