
nep executable error - "no kernel image is available for execution on the device" #576

Open
antoni-2 opened this issue Mar 26, 2024 · 9 comments


@antoni-2 commented Mar 26, 2024

Hello,

I would like to report an issue I found using GPUMD version 3.9.1.

I was trying to create a first test neuroevolution potential using the "nep" executable on the cluster I use. After preparing the input files (nep.in, test.xyz, and train.xyz) and running the "nep" command, GPUMD prints:

number of GPUs = 1
Device id: 0
Device name:             Tesla K80
Compute capability:      3.7
Amount of global memory: 11.1731 GB
Number of SMs:           13

Then the nep.in file is read successfully. Later:


Started reading train.xyz.
Number of configurations = 20.
Number of devices = 1
Number of batches = 1
Hello, I changed the batch_size from 1000 to 20.
Batch 0:
Number of configurations = 20.
Constructing train_set in device 0.
Total number of atoms = 1000.
Number of atoms in the largest configuration = 50.
Number of configurations having virial = 0.

CUDA Error:
    File:       main_nep/dataset.cu
    Line:       266
    Error code: 209
    Error text: no kernel image is available for execution on the device
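For reference, runtime error code 209 corresponds to cudaErrorNoKernelImageForDevice in recent CUDA versions: the binary carries no SASS or compatible PTX for the device it is running on. A minimal sketch to confirm the mapping (assuming a CUDA 10.1+ runtime with unified error codes):

// errname.cu -- look up the name of CUDA runtime error code 209
// Compile with: nvcc errname.cu -o errname
#include <cstdio>

int main()
{
  // Expected output on recent CUDA versions: cudaErrorNoKernelImageForDevice
  printf("%s\n", cudaGetErrorName(static_cast<cudaError_t>(209)));
  return 0;
}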

With the help of the cluster admins, we determined that the error is raised by the macro CUDA_CHECK_KERNEL, defined in utilities/error.cuh as:

#define CUDA_CHECK_KERNEL                                                                          \
  {                                                                                                \
    CHECK(cudaGetLastError());                                                                     \
    CHECK(cudaDeviceSynchronize());                                                                \
  }
#else
#define CUDA_CHECK_KERNEL                                                                          \
  {                                                                                                \
    CHECK(cudaGetLastError());                                                                     \
  }
#endif

The call we believe is causing the error is cudaDeviceSynchronize(). However, this call seems to work when we run it outside GPUMD.
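For context: kernel launches are asynchronous, so a launch for which the binary holds no matching GPU image is usually reported not at the launch site itself but by the next cudaGetLastError() or cudaDeviceSynchronize() call. In other words, cudaDeviceSynchronize() merely surfaces an earlier launch failure. A minimal sketch of this reporting pattern (the CHECK macro below is a simplified stand-in for GPUMD's, not its exact code):

// deferred.cu -- sketch: where a failed kernel launch actually gets reported
#include <cstdio>
#include <cstdlib>

#define CHECK(call)                                                        \
  do {                                                                     \
    const cudaError_t err = (call);                                        \
    if (err != cudaSuccess) {                                              \
      fprintf(stderr, "CUDA error %d (%s) at %s:%d\n", (int)err,           \
              cudaGetErrorString(err), __FILE__, __LINE__);                \
      exit(EXIT_FAILURE);                                                  \
    }                                                                      \
  } while (0)

__global__ void noop() {}

int main()
{
  // If no kernel image matches the device, this launch fails silently here...
  noop<<<1, 1>>>();
  // ...and the failure surfaces at the following checks instead.
  CHECK(cudaGetLastError());      // reports launch-time errors
  CHECK(cudaDeviceSynchronize()); // reports errors during kernel execution
  return 0;
}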

Cluster configuration: driver version 470.129.06, CUDA version 11.4, GPU: Tesla K80. Compiling with nvcc from NVHPC 23.3 and CUDA 11.8 gave the same result.

I do not know how to solve this issue. I would be very grateful for your help!

Kind regards,
Antoni

@brucefan1983 (Owner) commented Mar 26, 2024

You can try changing CFLAGS = -std=c++14 -O3 -arch=sm_60 to CFLAGS = -std=c++14 -O3 -arch=sm_37 in src/makefile and building again (make clean, then make).
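For reference, the relevant lines of src/makefile would then look roughly like this (a sketch; the surrounding lines differ between GPUMD versions):

# src/makefile (sketch): target the K80's compute capability 3.7
CC     = nvcc
CFLAGS = -std=c++14 -O3 -arch=sm_37
# rebuild from scratch afterwards:
#   make clean && make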

@antoni-2 (Author) commented Mar 28, 2024

Thank you for your answer. Unfortunately, the error persists. Attached below is the makefile that was used for compilation.
Different combinations of CFLAGS were tried:

CFLAGS = -std=c++11 -O3 -arch=sm_37
CFLAGS = -std=c++14 -O3 -arch=sm_37

… as well as compiling with and without PLUMED and NetCDF, all giving the same result.

Moreover, the error message remains the same when using the input files (nep.in, train.xyz, and test.xyz) from the repository (GPUMD/examples/11_NEP_potential_PbTe/).

makefile_gpumd_issue_28mar2024.txt

@brucefan1983 (Owner) commented:

Then I guess CUDA code does not work on your platform at all. You can try to compile and run the following minimal CUDA program:

#include <stdio.h>
__global__ void hello_from_gpu()
{
    printf("Hello World from the GPU!\n");
}

int main(void)
{
    hello_from_gpu<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}

Save the above code in a file hello.cu, compile it with nvcc -arch=sm_37 hello.cu -o hello, and run the executable with ./hello. If successful, you will see the message Hello World from the GPU!.
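A related sanity check, along the same lines, is to ask the runtime which compute capability the GPU actually reports, to confirm that sm_37 is the right target (a minimal sketch using cudaGetDeviceProperties):

// device_query.cu -- print the compute capability of each visible GPU
// Compile with: nvcc device_query.cu -o device_query
#include <cstdio>

int main()
{
  int count = 0;
  cudaGetDeviceCount(&count);
  for (int i = 0; i < count; ++i) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, i);
    // A Tesla K80 should report sm_37 here.
    printf("device %d: %s, sm_%d%d\n", i, prop.name, prop.major, prop.minor);
  }
  return 0;
}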

@antoni-2 (Author) commented Apr 2, 2024

Here is the output of the commands after creating the hello.cu file:
nvcc -arch=sm_37 hello.cu -o hello

nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).

./hello

Hello World from the GPU!

It seems that cudaDeviceSynchronize() works correctly outside GPUMD on the cluster I use. Unfortunately, I do not know why the call works outside GPUMD but fails within it. Do you have any idea?

@brucefan1983 (Owner) commented:

Being able to compile and run the simplest CUDA code means you have a working CUDA platform.

Did you then run gpumd (or nep) directly from the command line? The error

Error text: no kernel image is available for execution on the device

means that your executable was not compiled to target your GPU architecture. However, you showed that you used -arch=sm_37 to compile, which matches the K80 GPU you mentioned. So I am really puzzled.
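One way to narrow this down, assuming the CUDA toolkit's cuobjdump utility is available on the cluster, is to list the GPU images actually embedded in the executable:

# List the SASS (ELF/cubin) and PTX images embedded in the nep binary.
# A build compiled with -arch=sm_37 should show sm_37 / compute_37 entries.
cuobjdump --list-elf ./nep
cuobjdump --list-ptx ./nep

If sm_37 does not appear, the binary being run is likely not the one that was recompiled (for example, a stale executable found earlier in PATH), which would also explain the symptom.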

@antoni-2 (Author) commented Apr 3, 2024

The error log I reported at the beginning of this issue was shown after running the "nep" command directly from the command line, in the directory with the input files (nep.in, train.xyz, and test.xyz). I have not used the "gpumd" command yet.

@brucefan1983 (Owner) commented:

If possible, could you test on a different platform?

@tamaswells (Collaborator) commented:

I encountered a similar problem before. Changing -arch=sm_XX to a smaller number solved it for me.

@antoni-2 (Author) commented:

Thanks for the tip. Unfortunately, in my case, compiling with a lower number in -arch=sm_XX gave the same result. The tested options were:
-arch=sm_35 (the lowest value accepted by either of the two nvcc versions available on the cluster I use)
-arch=native (which is probably equivalent to sm_37 here; this was also tried with different nvcc versions)
