
Question: "NCCL needs all GPUs of a host to be part of a collective in order to reliably use NVLinks"? #1248

Open
mkarrmann opened this issue Apr 8, 2024 · 0 comments


mkarrmann commented Apr 8, 2024

This comment says "NCCL needs all GPUs of a host to be part of a collective in order to reliably use NVLinks".

First, I'm not sure whether "collective" is being used informally here, whether it actually means "communicator", or whether it means something else. Either way, if there is any truth to the statement, I'd like to understand it better. After a fair amount of searching, I haven't found anything else that suggests this.

Could I please get some clarity on whether there is any truth to this? Even if the statement isn't entirely accurate, is it more efficient to put all of a node's GPUs in a single communicator than to split them across multiple communicators?
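
To make the two layouts concrete, here is a minimal sketch of the GPU groupings I am comparing. It does not call NCCL and says nothing about which layout is faster. The function names `single_communicator` and `split_communicators` are hypothetical, and the 8-GPU node in the usage note is an assumption for illustration only.

```python
from typing import List


def single_communicator(gpus_per_node: int) -> List[List[int]]:
    """Layout A: one group that contains every local GPU index."""
    return [list(range(gpus_per_node))]


def split_communicators(gpus_per_node: int, group_size: int) -> List[List[int]]:
    """Layout B: local GPUs partitioned into equally sized, disjoint groups."""
    if group_size <= 0 or gpus_per_node % group_size != 0:
        raise ValueError("group_size must be positive and divide gpus_per_node")
    return [
        list(range(start, start + group_size))
        for start in range(0, gpus_per_node, group_size)
    ]
```

For an assumed 8-GPU node, `single_communicator(8)` yields one group of eight devices, while `split_communicators(8, 4)` yields two groups of four. Each inner list is the set of devices that would back one communicator.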

Thank you!
