Maybe there is an error in Samples/3_CUDA_Features/globalToShmemAsyncCopy/globalToShmemAsyncCopy.cu? #240

xxyux · 2023-12-01T14:04:29Z

It looks like line 721 should be reinterpret_cast<float4 *>(&B[b + wB * threadIdx.y + t4x]);
However, the code currently reads reinterpret_cast<float4 *>(&B[a + wA * threadIdx.y + t4x]);
https://github.com/NVIDIA/cuda-samples/blob/e8568c417356f7e66bb9b7130d6be7e55324a519/Samples/3_CUDA_Features/globalToShmemAsyncCopy/globalToShmemAsyncCopy.cu#L225C1-L225C1

When matrixA.width is not equal to matrixB.width, this indexing will be wrong.
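
To make the concern concrete, here is a small host-side C++ sketch of the index arithmetic. It is not code from the sample: tileIndex is a hypothetical helper, and the tile start, widths, and thread coordinates are made-up values chosen only to show how the row stride changes the element that gets addressed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the CUDA sample) that mirrors the
// expression `start + width * threadIdx.y + t4x` from the kernel.
//   tileStart : plays the role of `a` or `b`
//   rowStride : plays the role of `wA` or `wB`
//   ty, t4x   : stand-ins for threadIdx.y and the float4-aligned column
constexpr std::size_t tileIndex(std::size_t tileStart, std::size_t rowStride,
                                std::size_t ty, std::size_t t4x) {
    return tileStart + rowStride * ty + t4x;
}

// Returns true when indexing B with A's stride lands on the same element
// as indexing B with its own stride (tile start assumed equal here).
constexpr bool stridesAgree(std::size_t wA, std::size_t wB,
                            std::size_t ty, std::size_t t4x) {
    return tileIndex(0, wA, ty, t4x) == tileIndex(0, wB, ty, t4x);
}
```

With these assumed values, tileIndex(0, 64, 2, 4) is 132 and tileIndex(0, 32, 2, 4) is 68, so stridesAgree(32, 64, 2, 4) is false: once the widths differ, the two expressions address different elements of B for any row other than row 0. The sketch only compares the stride term; it does not model how a and b themselves advance in the kernel.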

Actually, I don't know what happened.
I am just a beginner and want to learn more about cuda::memcpy_async.
Can anyone answer my question? It has really confused me!
