I encountered a strange bug while programming Tensor Cores with the WMMA API on an A800.
I tried to print the size of an element in a fragment. Normally sizeof(fp16) is 2, and the following code also prints 2 for each element.
wmma::load_matrix_sync(a_frag, a , lda);
wmma::load_matrix_sync(b_frag, b , ldb);
wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
printf("%d\n", sizeof(a_frag.x[0]));
printf("%d\n", sizeof(a_frag.x[1]));
However, when I changed the code to the following form, the print statement outputs 2 and 0. Even if I swap the order of a_frag.x[0] and a_frag.x[1], the output is still 2 and 0. Does anyone know why?
wmma::load_matrix_sync(a_frag, a , lda);
wmma::load_matrix_sync(b_frag, b , ldb);
wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
printf("%d %d\n", sizeof(a_frag.x[0]), sizeof(a_frag.x[1]));
My code runs in the following environment; the OS is Ubuntu 20.04.
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A800 80GB PCIe Off | 00000000:03:00.0 Off | 0 |
| N/A 43C P0 68W / 300W | 2768MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
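I guess the reason for the output of 2 and 0 is that sizeof returns a size_t, which takes 8 bytes here, so the first argument is stored as 0x0000000000000002. Each %d consumes only 4 bytes, so the low 4 bytes (0x00000002) take the place of the first %d and the high 4 bytes (0x00000000) take the place of the second %d. That would also explain why swapping a_frag.x[0] and a_frag.x[1] changes nothing: the second argument is never read.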
Additionally, you may have received a warning like:
warning #181-D: argument is incompatible with corresponding format string conversion (expected type "int" but argument has type "unsigned long long")