Bug in tensor core programming #251

Open
blueWatermelonFri opened this issue Jan 31, 2024 · 1 comment

@blueWatermelonFri

I encountered a strange bug while programming Tensor Cores with the WMMA API on an A800.
I tried to print the size of the elements in a fragment. Normally sizeof(fp16) is 2, and the following code outputs 2 for both elements:

    wmma::load_matrix_sync(a_frag, a , lda);
    wmma::load_matrix_sync(b_frag, b , ldb);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);

    printf("%d\n", sizeof(a_frag.x[0]));
    printf("%d\n", sizeof(a_frag.x[1]));

However, when I changed the code to the following form, the print statement output 2 and 0. Even when I swapped a_frag.x[0] and a_frag.x[1], the output was still 2 and 0. Does anyone know why?

    wmma::load_matrix_sync(a_frag, a , lda);
    wmma::load_matrix_sync(b_frag, b , ldb);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);

    printf("%d %d\n", sizeof(a_frag.x[0]), sizeof(a_frag.x[1]));

My code runs in the following environment (OS: Ubuntu 20.04):

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A800 80GB PCIe          Off | 00000000:03:00.0 Off |                    0 |
| N/A   43C    P0              68W / 300W |   2768MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

My compile command is:

    nvcc tensor_core.cu -std=c++11 -lcublas -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -res-usage -lcudart -lineinfo -Xcompiler -fopenmp
@yuantailing

yuantailing commented May 30, 2024

Hello @blueWatermelonFri,

The format specifier %d expects an argument of type int, but sizeof yields a size_t. The corrected code is:

    printf("%d %d\n", (int)sizeof(a_frag.x[0]), (int)sizeof(a_frag.x[1]));
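
As a minimal host-side C++ sketch of making the argument types match the conversion, here are two options: cast to `int` for `%d`, or cast to `unsigned long long` for `%llu` to keep the full width. The helper names `format_sizes_as_int` and `format_sizes_as_ull` are hypothetical, and formatting into a buffer with `snprintf` is only an assumption made so the result can be inspected. In a kernel the same casts would be applied to device-side `printf`, which is assumed (not verified here) to accept `%llu` as well.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: same format as the corrected printf, but written into a
// std::string. %d expects int, so each size_t argument is cast explicitly.
std::string format_sizes_as_int(std::size_t a, std::size_t b) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%d %d",
                  static_cast<int>(a), static_cast<int>(b));
    return std::string(buf);
}

// Hypothetical alternative: keep all 64 bits by casting to unsigned long long,
// which is exactly the type %llu expects.
std::string format_sizes_as_ull(std::size_t a, std::size_t b) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%llu %llu",
                  static_cast<unsigned long long>(a),
                  static_cast<unsigned long long>(b));
    return std::string(buf);
}
```

For example, `format_sizes_as_int(2, 2)` and `format_sizes_as_ull(2, 2)` both yield `"2 2"`. The `int` cast is the smallest change to the original code; the `unsigned long long` variant avoids truncation if a value ever exceeds `INT_MAX`.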

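I guess the reason for the output of 2 and 0 is that size_t takes 8 bytes, i.e., 0x0000000000000002, while each %d consumes only 4 bytes. With a little-endian layout, the low half (0x00000002) takes the place of the first %d, and the high half (0x00000000) takes the place of the second %d. The same guess explains why your first version, with one %d per printf call, prints 2 each time: only the low half is read.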
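
The byte-splitting guess above can be sketched on the host in C++. The sketch copies the 8 bytes of one `size_t` holding 2 into two 4-byte integers, roughly mimicking two `%d` conversions reading a single 64-bit argument. It assumes a 64-bit `size_t` and a little-endian layout (true of x86-64 and ARM hosts and of NVIDIA GPUs); the names `IntPair` and `split_size_t` are hypothetical, and the real device-side `printf` argument handling may differ in detail.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical result type: the two 4-byte halves of one 8-byte size_t.
struct IntPair {
    std::int32_t first;   // bytes 0..3, what the first %d would read
    std::int32_t second;  // bytes 4..7, what the second %d would read
};

// Hypothetical helper: reinterprets one size_t as two consecutive 32-bit ints.
IntPair split_size_t(std::size_t value) {
    static_assert(sizeof(std::size_t) == 2 * sizeof(std::int32_t),
                  "sketch assumes a 64-bit size_t");
    IntPair out;
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    // memcpy avoids strict-aliasing problems when reinterpreting bytes.
    std::memcpy(&out.first, bytes, sizeof(out.first));
    std::memcpy(&out.second, bytes + sizeof(out.first), sizeof(out.second));
    return out;
}
```

On a little-endian machine, `split_size_t(2)` gives `first == 2` and `second == 0`, matching the `2 0` output. On a big-endian machine the halves would come out swapped.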

Additionally, you may have received a warning like:

    warning #181-D: argument is incompatible with corresponding format string conversion (expected type "int" but argument has type "unsigned long long")
