When matrixA.width is not equal to matrixB.width, the code gives wrong results.
Actually, I don't know what happened.
I am just a beginner and want to know more about cuda::memcpy_async.
Can anyone answer my question? It really confuses me!
As you can see, it should be
reinterpret_cast<float4 *>(&B[b + wB * threadIdx.y + t4x]);
in line 721. However, it is
reinterpret_cast<float4 *>(&B[a + wA * threadIdx.y + t4x]);
https://github.com/NVIDIA/cuda-samples/blob/e8568c417356f7e66bb9b7130d6be7e55324a519/Samples/3_CUDA_Features/globalToShmemAsyncCopy/globalToShmemAsyncCopy.cu#L225C1-L225C1