NVIDIA / nccl Public

Fork 735
Star 2.9k

Issues: NVIDIA/nccl

497 Open 618 Closed

all-reduce slower on v2.20.5 compared to v2.18.5 on AWS g5.48xlarge (8 x A10G)

#1298 opened May 25, 2024 by abdulfatir

Internal error when submitting a job to a Ray cluster

#1297 opened May 24, 2024 by troelsfr

What's the relationship between nccl protocols and inter-node communication?

#1296 opened May 24, 2024 by Alex-Wong

NCCL_NET_GDR_READ's performance impact on a PCIe platform

#1295 opened May 23, 2024 by cold2stone

Does ncclBroadcast call return at same time on different ranks?

#1294 opened May 20, 2024 by Eiji911

Failed to find ncclNetPlugin_v8 symbol

#1292 opened May 18, 2024 by wwj-2017-1117

nccl-tests with two GH200 over Quantum2 iB stuck

#1291 opened May 17, 2024 by itzsimpl

Inquiry about NCCL's Tree Algorithm Performance in Single and Dual Machine Scenarios

#1290 opened May 17, 2024 by fizzlover

NCCL stuck when using nccl-test.

#1289 opened May 17, 2024 by deepzzz123

One of the NODE will hang when NCCL_NET_GDR_READ=1

#1288 opened May 16, 2024 by shanleo1986

How can this be ported to Windows?

#1287 opened May 15, 2024 by eabase

How can I identify level1 nvswitch and level2 nvswitch in NCCL

#1286 opened May 14, 2024 by Ryan201802

AMD EPYC 7K62 NCCL-test 4090 bandwidth too

#1285 opened May 13, 2024 by ghoul02015

RuntimeError: NCCL Error 1: unhandled cuda error (run with NCCL_DEBUG=INFO for details) when torch._C._broadcast_coalesced

#1283 opened May 11, 2024 by zhoulei-biubiu

Why nccl ring all reduce stream duration doesn't scales with theoretical (N-1)/N?

#1282 opened May 11, 2024 by CraneQinghe

Why is allgather's busbw a little worse than allreduce/reducescatter for the same nccl environment variables

#1281 opened May 10, 2024 by pkuleo

Seeking for some explanations on the meaning of terminology in nvtx.h

#1280 opened May 8, 2024 by ZhiyiHu1999

HGX 2-node test with different NIC topologies different network card names hangs, no results

#1277 opened May 8, 2024 by superLiben

Nccl build error

#1276 opened May 8, 2024 by sandeep06011991

[BUG] NCCL2.20.5 meets "Message truncated : received 1024 bytes instead of 256" error while 2.18.5 not

#1273 opened Apr 30, 2024 by shh2000

Add toggle to disable logging of version string

#1271 opened Apr 30, 2024 by c-oak

Provenance of NVTX headers in NCCL

#1270 opened Apr 29, 2024 by Artem-B

NICs on same subnet

#1269 opened Apr 28, 2024 by samsamoa

Program stuck when destroying NCCL

#1266 opened Apr 25, 2024 by sleepwalker2017

Only ~783GByte/s out of theoretical 900GB/s HGX H100 SXM Nvlink4

#1264 opened Apr 24, 2024 by OrenLeung
