This comment says "NCCL needs all GPUs of a host to be part of a collective in order to reliably use NVLinks".
First, I'm not sure whether "collective" is being used informally here, whether it means "communicator", or whether it means something else. Either way, if there's any truth to the claim, I'd like to understand it better. After a fair bit of searching, I haven't found anything else that suggests this.
Could I please get some clarity on whether there is any truth to this? Even if the statement isn't entirely true, would it be more efficient to make all of the GPUs of a node part of a single communicator than to split them across multiple communicators?
Thank you!