GPU configuration: The problem is maybe -arch sm_13 instead of -arch sm_11 in the Makefile, please doublecheck #1199

Open
trapprb8 opened this issue Feb 6, 2024 · 10 comments

Comments

@trapprb8

trapprb8 commented Feb 6, 2024

When I start a simulation in gpu mode I get the following error message:

Error in setConst_hprime_xx: invalid device symbol
The problem is maybe -arch sm_13 instead of -arch sm_11 in the Makefile, please doublecheck

I am trying to configure with a Quadro P4000, which should be Pascal architecture, so I guess cuda8 should be used in the configuration (according to the overview in the Makefile, see below)?

I used the following code:

$ ./configure FC=gfortran CC=gcc --with-mpi MPIFC=mpif90 USE_BUNDLED_SCOTCH=1 --with-cuda=cuda8 CUDA_LIB=/usr/local/cuda/lib64
$ make

Overview in makefile:

# CUDA architecture / code version
# Fermi   (not supported): -gencode=arch=compute_10,code=sm_10
# Tesla   (Tesla C2050, GeForce GTX 480): -gencode=arch=compute_20,code=sm_20
# Tesla   (cuda4, K10, Geforce GTX 650, GT 650m): -gencode=arch=compute_30,code=sm_30
# Kepler  (cuda5, K20) : -gencode=arch=compute_35,code=sm_35
# Kepler  (cuda6.5, K80): -gencode=arch=compute_37,code=sm_37
# Maxwell (cuda6.5+/cuda7, Quadro K2200): -gencode=arch=compute_50,code=sm_50
# Pascal  (cuda8,P100, GeForce GTX 1080, Titan): -gencode=arch=compute_60,code=sm_60
# Volta   (cuda9, V100): -gencode=arch=compute_70,code=sm_70
# Turing  (cuda10, T4, GeForce RTX 2080): -gencode=arch=compute_75,code=sm_75
# Ampere  (cuda11, A100, GeForce RTX 3080): -gencode=arch=compute_80,code=sm_80
# Hopper  (cuda12, H100): -gencode=arch=compute_90,code=sm_90
@danielpeter
Contributor

the Quadro P4000 has CUDA compute capability 6.1. that means you will likely have to modify the Makefile a bit after configuration and instead of

-gencode=arch=compute_60,code=sm_60

use:

-gencode=arch=compute_61,code=sm_61
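The flag follows mechanically from the compute capability, so it can be generated rather than typed by hand. A minimal sketch, assuming a hypothetical helper named `gencode_flag` (it is not part of the code base discussed here) that takes a capability string such as `6.1`:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): turn a compute capability
# such as "6.1" into the matching nvcc -gencode flag.
gencode_flag() {
  local cc="${1//./}"   # drop the dot: "6.1" -> "61"
  printf -- '-gencode=arch=compute_%s,code=sm_%s\n' "$cc" "$cc"
}

gencode_flag 6.1   # Quadro P4000
gencode_flag 6.0   # the Pascal entry listed for cuda8 in the Makefile overview
```

The first call prints the flag suggested above and the second prints the one it replaces.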

@trapprb8
Author

trapprb8 commented Feb 7, 2024

Thank you for your answer! :)
Unfortunately, that didn't work yet, the error stays the same.
What I did now was:

in Makefile.in:
GENCODE_60 = -gencode=arch=compute_61,code=\"sm_61,compute_61\"
in Makefile:
GENCODE_60 = -gencode=arch=compute_61,code=\"sm_61,compute_61\"
GENCODE = $(GENCODE_60) $(FC_DEFINE)GPU_DEVICE_Pascal #this line stays same, just wanted to show for completion
and run
$ ./configure FC=gfortran CC=gcc --with-mpi MPIFC=mpif90 USE_BUNDLED_SCOTCH=1 --with-cuda=cuda8 CUDA_LIB=/usr/local/cuda/lib64
$ make

@danielpeter
Contributor

great, thanks for the quick feedback!

note that the Makefile gets created by running the ./configure script. so, you would only need to modify either the Makefile.in before running the configuration, or the Makefile after running the configuration.
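The edit itself can be scripted. The sketch below works on a throwaway copy so that it runs anywhere; the starting `GENCODE_60` line is an assumption, inferred from the modified line quoted in this thread, and the rebuild commands in the closing comments assume the Makefile provides a `clean` target.

```shell
#!/usr/bin/env bash
# Sketch on a temporary copy; in a real tree the file to edit is the
# Makefile (or Makefile.in) in the root directory.
tmp="$(mktemp -d)"

# Assumed original line, with the escaped quotes kept literally.
printf '%s\n' 'GENCODE_60 = -gencode=arch=compute_60,code=\"sm_60,compute_60\"' > "$tmp/Makefile"

# Rewrite compute_60/sm_60 to compute_61/sm_61 on the GENCODE_60 line only.
# The variable name itself is left untouched; a .bak backup is kept.
sed -E -i.bak '/^GENCODE_60 *=/ s/(compute|sm)_60/\1_61/g' "$tmp/Makefile"

# Show the resulting line.
grep '^GENCODE_60' "$tmp/Makefile"

# In the real tree, after editing the generated Makefile, do not re-run
# ./configure (it would regenerate the Makefile from Makefile.in). Rebuild
# from scratch so no object file compiled with the old flag is reused:
#   make clean && make
```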

@trapprb8
Author

trapprb8 commented Feb 7, 2024

Hi Daniel, thanks again! :)
I also did this, however it does not work. Still the same error.
We are only talking about the Makefile.in and Makefile in the main directory, right?
I uploaded the two files:

Makefile.txt
Makefile.in.txt

@danielpeter
Contributor

yes, the GPU architecture is specified only in the main Makefiles in the root directory, Makefile.in and the generated one Makefile.

can you be more specific about what did not work: the compilation with the modifications as you suggested, or the modification of only one of the Makefiles? that is, do you still get the error

Error in setConst_hprime_xx: invalid device symbol

even with the modification

GENCODE_60 = -gencode=arch=compute_61,code=\"sm_61,compute_61\"

in these Makefiles? if so, then what are your CUDA toolkit and CUDA driver versions?
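Both version numbers can be collected in one go. A minimal sketch, assuming a hypothetical helper named `nvcc_release` that extracts the release number from `nvcc --version` text; the sample line is illustrative, and the two real commands run only if the tools are installed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvcc --version` output on stdin and print
# only the toolkit release number (for example 11.8).
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Illustrative input line in the format nvcc prints.
printf '%s\n' 'Cuda compilation tools, release 11.8, V11.8.89' | nvcc_release

# On a machine with the toolkit and driver installed:
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version | nvcc_release
fi
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
fi
```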

@trapprb8
Author

trapprb8 commented Feb 7, 2024

Exactly, the error is the same as before:

Error in setConst_hprime_xx: invalid device symbol
The problem is maybe -arch sm_13 instead of -arch sm_11 in the Makefile, please doublecheck

The Cuda version is 11.8, nvcc --version gives me:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

@danielpeter
Contributor

could you also add the output of the command nvidia-smi to see the driver version on your system?

@trapprb8
Author

trapprb8 commented Feb 8, 2024

The output is:


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P4000        On   | 00000000:05:00.0  On |                  N/A |
| 46%   30C    P0    28W / 105W |    240MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2980      G   /usr/lib/xorg/Xorg                192MiB |
|    0   N/A  N/A      3511      G   cinnamon                           30MiB |
|    0   N/A  N/A      4673      G   /usr/lib/firefox/firefox           13MiB |
+-----------------------------------------------------------------------------+

@danielpeter
Contributor

tricky... according to the toolkit documentation, that driver version looks okay for CUDA 11.8 and it should support the compute capability 6.1. unfortunately, I can't reproduce it as I don't have access to such a GPU card. the code works however on most older and newer cards, so I would expect this to be a driver version and CUDA toolkit issue.

to double check the compute capability of your card, could you compile and run the little helper tool in the utils/GPU_tools/ folder on your system:

cd ~/<specfem-directory>/utils/GPU_tools/
nvcc --gpu-architecture=sm_60 -o check_cuda_device check_cuda_device.cu
./check_cuda_device

the tool will provide an info output with the compute capability listed.
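If only the capability is of interest, it can be filtered out of the tool's output. A minimal sketch, assuming a hypothetical helper named `device_capability` and assuming the tool prints a line of the form `Compute capability of the device = X.Y`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the helper tool's output on stdin and print
# only the compute capability (for example 6.1).
device_capability() {
  sed -n 's/^Compute capability of the device = \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Illustrative input line (the line format is an assumption).
printf '%s\n' 'Compute capability of the device = 6.1' | device_capability

# With the compiled tool in the current directory:
#   ./check_cuda_device | device_capability
```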

in the past, somebody on the CIG-seismo forum was able to run the code on a Quadro P6000, I think with a CUDA 9.1 version. you could try to downgrade the CUDA driver & runtime version to see if this solves the issue.
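Before downgrading, it may be worth checking which GPU architectures actually ended up in the built executable, since a stale build can still carry the old target. A hedged sketch: `cuobjdump --list-elf` from the CUDA toolkit lists the embedded device binaries. The helper name `embedded_archs`, the sample listing lines and the executable path are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a `cuobjdump --list-elf` listing on stdin and
# print the distinct sm_XX targets it mentions.
embedded_archs() {
  grep -oE 'sm_[0-9]+' | sort -u
}

# Illustrative listing lines, not output captured from this thread.
printf '%s\n' \
  'ELF file    1: kernels.1.sm_60.cubin' \
  'ELF file    2: kernels.2.sm_61.cubin' | embedded_archs

# On a real build (the path is a placeholder):
#   cuobjdump --list-elf path/to/executable | embedded_archs
```

If `sm_61` is missing from the list after the Makefile change, the objects were probably not rebuilt with the new flag.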

@trapprb8
Author

trapprb8 commented Feb 9, 2024

Hi dear,

here is the output of the helper tool:

``
found number of CUDA devices = 1

GPU device id: 0

Device Name = Quadro P4000

memory:
totalGlobalMem (in MB, dividing by powers of 1024): 8116.562500
totalGlobalMem (in GB, dividing by powers of 1024): 7.926331

totalGlobalMem (in MB, dividing by powers of 1000): 8510.833008
totalGlobalMem (in GB, dividing by powers of 1000): 8.510833

sharedMemPerBlock (in bytes): 49152

blocks:
Maximum number of registers per block: 65536
Maximum number of threads per block: 1024
Maximum size of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535

features:
Compute capability of the device = 6.1
multiProcessorCount: 14
canMapHostMemory: TRUE
deviceOverlap: TRUE

0: GPU memory usage (dividing by powers of 1024): used = 319.625000 MB, free = 7796.937500 MB, total = 8116.562500 MB

0: GPU memory usage (dividing by powers of 1000): used = 335.151104 MB, free = 8175.681536 MB, total = 8510.832640 MB

number of total devices: 1
``

Ok.. Maybe I will try to downgrade the Cuda Toolkit then!
