Error compiling cp2k "cuInit failed with error" #3369

Open
gcpmendez opened this issue Apr 25, 2024 · 1 comment

Comments

@gcpmendez

I am getting the following error after compilation:

$ ./cp2k.psmp --help
ERROR: cuInit failed with error:  34 /share/easybuild/software/sources/c/CP2K/test_25/cp2k-2024.1/src/offload/offload_library.c 57

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x146454c5db1f in ???
#1  0x146454c5da9f in ???
#2  0x146454c30e04 in ???
#3  0x373d41b in offload_init
	at /share/easybuild/software/sources/c/CP2K/test_25/cp2k-2024.1/src/offload/offload_library.c:58
#4  0xf88643 in __f77_interface_MOD_init_cp2k
	at /share/easybuild/software/sources/c/CP2K/test_25/cp2k-2024.1/src/f77_interface.F:234
#5  0x8ae241 in cp2k
	at /share/easybuild/software/sources/c/CP2K/test_25/cp2k-2024.1/src/start/cp2k.F:284
#6  0x4befbc in main
	at /share/easybuild/software/sources/c/CP2K/test_25/cp2k-2024.1/src/start/cp2k.F:44
Aborted (core dumped)

I followed these steps:

# Modules
$ ml load gompi/2022b foss/2022b libxsmm/1.17-GCC-12.2.0 libvori/220621-GCCcore-12.2.0 PLUMED/2.9.0-foss-2022b CUDA/12.0.0 UCX-CUDA/1.13.1-GCCcore-12.2.0-CUDA-12.0.0 HDF5/1.14.0-gompi-2022b Libint/2.7.2-GCC-12.2.0-lmax-6-cp2k ELPA/2023.05.001-foss-2022b-CUDA-12.0.0 CMake/3.24.3-GCCcore-12.2.0 libxc/6.1.0-GCC-12.2.0

# Extract
$ tar xvf cp2k-2024.1.tar.bz2
$ cd cp2k-2024.1

# Compiling
$ cd /share/easybuild/software/sources/c/CP2K/cp2k-2024.1/tools/toolchain
$ export http_proxy=http://10.0.29.4:3128; export https_proxy=http://10.0.29.4:3128
$ ./install_cp2k_toolchain.sh --with-libxsmm=system --with-openblas=install  --with-fftw=system  --with-elpa=system --enable-cuda --gpu-ver=A100
$ cd ../..
$ cp /share/easybuild/software/sources/c/CP2K/cp2k-2024.1/tools/toolchain/install/arch/* arch/
$ source /share/easybuild/software/sources/c/CP2K/cp2k-2024.1/tools/toolchain/install/setup

# Editing arch/local_cuda.psmp file
+ NVCC    = /share/easybuild/software/x86_64/software/CUDA/12.4.0/bin/nvcc
+ DFLAGS += -D__ACC -D__DBCSR_ACC -D__PW_CUDA
+ LIBS   += -lcudart -lcublas -lcufft -lnvrtc

$ make -j 64 ARCH=local_cuda VERSION=psmp
# No errors

Driver and CUDA version:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Mon_Oct_24_19:12:58_PDT_2022
Cuda compilation tools, release 12.0, V12.0.76
Build cuda_12.0.r12.0/compiler.31968024_0
$ nvidia-smi
Thu Apr 25 12:10:52 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:17:00.0 Off |                    0 |
| N/A   27C    P0    33W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  On   | 00000000:31:00.0 Off |                    0 |
| N/A   26C    P0    33W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-PCI...  On   | 00000000:B1:00.0 Off |                    0 |
| N/A   26C    P0    34W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-PCI...  On   | 00000000:CA:00.0 Off |                    0 |
| N/A   26C    P0    30W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Any recommendations on how to get out of this situation?
Thanks in advance.

@oschuett
Member

It seems cuInit cannot find any available GPUs. This might be because we recently started initializing CUDA before MPI (#3121). Maybe the right order depends on the vendor?
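
As a diagnostic aid, the number printed after "cuInit failed with error:" can be looked up by name. The sketch below assumes that number is the raw CUresult value returned by the CUDA driver API (the thread does not confirm this). The helper cuda_error_name is hypothetical, not part of CP2K or CUDA, and lists only a handful of codes from cuda.h.

```shell
# Hypothetical helper (not part of CP2K or CUDA): translate a CUDA driver API
# CUresult number into its symbolic name. Only a few codes from cuda.h are
# listed; anything else is reported as missing from this table.
cuda_error_name() {
  case "$1" in
    0)   echo "CUDA_SUCCESS" ;;
    1)   echo "CUDA_ERROR_INVALID_VALUE" ;;
    3)   echo "CUDA_ERROR_NOT_INITIALIZED" ;;
    34)  echo "CUDA_ERROR_STUB_LIBRARY" ;;
    100) echo "CUDA_ERROR_NO_DEVICE" ;;
    101) echo "CUDA_ERROR_INVALID_DEVICE" ;;
    999) echo "CUDA_ERROR_UNKNOWN" ;;
    *)   echo "not in this table: $1" ;;
  esac
}

cuda_error_name 34    # the code from the report above
cuda_error_name 100   # the code for "no CUDA-capable device"
```

Under that assumption, 34 would mean the process loaded a stub libcuda.so (such as the one shipped in the toolkit's stubs directory) instead of the driver's own library, which is a different failure from no GPU being visible (100). If libcuda is linked dynamically, running ldd ./cp2k.psmp | grep libcuda should show which copy is resolved at run time.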
