I learned that using non-blocking CUDA streams can achieve kernel concurrency, and I want to know whether it can also achieve kernel parallelism.
Analyzing with Nsight Systems (nsys), I found that collective communication operations launched on different non-blocking CUDA streams still do not run in parallel at the kernel level.
So, can two AllReduce operations run in parallel when issued on different non-blocking CUDA streams?
I've read similar issues like #195, #217, and #315, but I still don't know the answer.
Here is part of my code.
```c
cudaStream_t stream1, stream2;
// cudaStreamCreateWithFlags takes a cudaStream_t*, so the streams must be passed by address
CUDACHECK(cudaStreamCreateWithFlags(&stream1, cudaStreamNonBlocking));
CUDACHECK(cudaStreamCreateWithFlags(&stream2, cudaStreamNonBlocking));
ncclAllReduce((const void*)buffer1, (void*)buffer1, size, ncclUint64, ncclSum, comm, stream1);
ncclAllReduce((const void*)buffer2, (void*)buffer2, size, ncclUint64, ncclSum, comm, stream2);
```
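One detail worth noting: both AllReduce calls above share the same communicator `comm`, and NCCL orders collectives issued on one communicator, which by itself prevents the two kernels from overlapping regardless of the streams used. A minimal sketch of the alternative, assuming two separate communicators `comm1` and `comm2` have been created over the same set of ranks (e.g. via two `ncclCommInitRank` calls with distinct `ncclUniqueId`s; `NCCLCHECK` is a hypothetical error-checking macro analogous to `CUDACHECK`):

```c
// Each stream gets its own communicator, so NCCL's per-communicator
// ordering no longer forces the two collectives to serialize.
NCCLCHECK(ncclAllReduce((const void*)buffer1, (void*)buffer1, size,
                        ncclUint64, ncclSum, comm1, stream1));
NCCLCHECK(ncclAllReduce((const void*)buffer2, (void*)buffer2, size,
                        ncclUint64, ncclSum, comm2, stream2));
```

Even with separate communicators, whether the kernels actually execute in parallel depends on available SM and channel resources on the device, so this is a sketch of the intended launch pattern rather than a guarantee of overlap.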