CUDA 11 error (invalid resource handle) after destroying FFT plan & using a new one #308

Open
vincefn opened this issue Dec 4, 2020 · 3 comments

vincefn commented Dec 4, 2020

Problem

I have found an issue when using CUDA 11.1: creating an FFT plan, using it, performing another operation (a simple sum reduction), then deleting the plan, creating a new one and repeating the same steps ends with a cuFuncSetBlockShape failed: invalid resource handle error.

The following minimal example reproduces the issue (it needs to be run in a fresh session for reproducibility):

import numpy as np
import pycuda.gpuarray as cua
import pycuda.autoinit
import skcuda.fft as cu_fft

fft_shape = (128, 128)

plan = cu_fft.Plan(fft_shape, np.complex64, np.complex64, batch=1)
a = cua.to_gpu(np.random.uniform(0, 1, fft_shape).astype(np.complex64))
cu_fft.fft(a, a, plan)  # in-place complex-to-complex FFT
tmp = cua.sum(a)        # sum reduction - works

del plan                # destroy the first plan

plan = cu_fft.Plan(fft_shape, np.complex64, np.complex64, batch=1)
cu_fft.fft(a, a, plan)  # still works
tmp = cua.sum(a)        # fails: cuFuncSetBlockShape failed: invalid resource handle

Running the above code in a fresh Python session always ends up with the following error:

---> 17 tmp = cua.sum(a)

~/dev/py38-env/lib/python3.8/site-packages/pycuda/gpuarray.py in sum(a, dtype, stream, allocator)
   1639     from pycuda.reduction import get_sum_kernel
   1640     krnl = get_sum_kernel(dtype, a.dtype)
-> 1641     return krnl(a, stream=stream, allocator=allocator)
   1642
   1643
~/dev/py38-env/lib/python3.8/site-packages/pycuda/reduction.py in __call__(self, *args, **kwargs)
    283
    284             # print block_count, seq_count, self.block_size, sz
--> 285             f((block_count, 1), (self.block_size, 1, 1), stream,
    286                     *([result.gpudata]+invocation_args+[seq_count, sz]),
    287                     **kwargs)

~/dev/py38-env/lib/python3.8/site-packages/pycuda/driver.py in function_prepared_async_call(func, grid, block, stream, *args, **kwargs)
    547     def function_prepared_async_call(func, grid, block, stream, *args, **kwargs):
    548         if isinstance(block, tuple):
--> 549             func._set_block_shape(*block)
    550         else:
    551             from warnings import warn

LogicError: cuFuncSetBlockShape failed: invalid resource handle

The error occurs during the pycuda sum reduction, but it seems to be triggered by deleting the plan and re-creating another one, so it may be due to cuFFT.
I noted that the CUDA 11.1 release notes indicate: "After successfully creating a plan, cuFFT now enforces a lock on the cufftHandle. Subsequent calls to any planning function with the same cufftHandle will fail", but I have no idea whether that can be related.
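
For what it's worth, a slightly expanded version of the repro script above (same calls, only extra print statements; to be run in a fresh session) could help narrow down whether deleting the first plan alone, or only creating the second plan, invalidates the reduction kernel launch:

import numpy as np
import pycuda.gpuarray as cua
import pycuda.autoinit
import skcuda.fft as cu_fft

fft_shape = (128, 128)
a = cua.to_gpu(np.random.uniform(0, 1, fft_shape).astype(np.complex64))

plan = cu_fft.Plan(fft_shape, np.complex64, np.complex64, batch=1)
cu_fft.fft(a, a, plan)
print("sum after first plan:", cua.sum(a).get())    # works

del plan
print("sum after del plan:", cua.sum(a).get())      # does deleting the plan alone break it?

plan = cu_fft.Plan(fft_shape, np.complex64, np.complex64, batch=1)
print("sum after second plan:", cua.sum(a).get())   # or only once a new plan exists?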

Environment


  • OS platform: Linux (tested on power64/Debian 10, and also on fresh x86_64 cloud machines from vast.ai based on https://hub.docker.com/r/nvidia/cuda/ images, e.g. nvidia/cuda:11.1-devel or nvidia/cuda:11.0-devel)
  • Python version: 3.8 (the issue probably does not depend on it)
  • CUDA version: 11.0 (with driver 455.45.01) , 11.1 (with driver 450.80.02, 455.23.05 or 455.38)
  • PyCUDA version: pycuda.VERSION = (2020, 1)
  • scikit-cuda version: latest git 806ee27 (0.53 pip-installed also has the issue)

vincefn commented Jan 3, 2021

I also tested this under Windows 10 with CUDA 11.2, and the issue is reproduced with the above code snippet.

The CUDA 11.2 release notes list among the known issues: "cuFFT planning and plan estimation functions may not restore correct context affecting CUDA driver API applications".
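
If the problem really is that cuFFT planning leaves a different context current, one idea that might be worth testing (an untested sketch, not a confirmed fix) is to make the context created by pycuda.autoinit current again after creating or destroying a plan, before launching further pycuda kernels:

import pycuda.autoinit
import pycuda.driver as cuda

# after cu_fft.Plan(...) or del plan, re-activate pycuda's context
pycuda.autoinit.context.push()
try:
    tmp = cua.sum(a)    # the reduction kernel should now run in the expected context
finally:
    cuda.Context.pop()  # restore the previous context stack state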


vincefn commented Jan 5, 2022

Under Linux with the CUDA toolkit 11.5 installed in a conda environment (cufftGetVersion() reports 106000; driver 460.91.03), the issue is still present, even though the CUDA release notes no longer mention it (?)...

@dimitsev

Related? #330
