
[Running on windows 10] cuda runtime error (30) : unknown error at ..\aten\src\THC\THCGeneral.cpp:87 #17108

Closed
juanwulu opened this issue Feb 14, 2019 · 75 comments
Labels
module: windows (Windows support for PyTorch) · needs reproduction (Someone else needs to try reproducing the issue given the instructions; no action needed from the user) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@juanwulu

❓ Questions and Help


While trying to run my test.py file in the Anaconda prompt, I got the messages below:

CUDA™ is AVAILABLE
Please assign a gpu core (int, <1): 0
THCudaCheck FAIL file=..\aten\src\THC\THCGeneral.cpp line=87 error=30 : unknown error
Traceback (most recent call last):
File "VSLcore.py", line 202, in <module>
DQNAgent()
File "VSLcore.py", line 87, in DQNAgent
torch.set_default_tensor_type('torch.cuda.FloatTensor')
File "D:\Softwares\Anaconda3\lib\site-packages\torch\__init__.py", line 158, in set_default_tensor_type
_C.set_default_tensor_type(t)
File "D:\Softwares\Anaconda3\lib\site-packages\torch\cuda\__init__.py", line 162, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (30) : unknown error at ..\aten\src\THC\THCGeneral.cpp:87

What should I do?

@pytorchbot added the module: windows (Windows support for PyTorch) label on Feb 14, 2019
@juanwulu
Author

And also my CUDA version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:04_Central_Daylight_Time_2018
Cuda compilation tools, release 10.0, V10.0.130

@peterjc123
Collaborator

I think that this statement torch.set_default_tensor_type('torch.cuda.FloatTensor') should be replaced by torch.set_default_tensor_type(torch.cuda.FloatTensor).
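
For reference, a minimal sketch of both call forms (assuming CUDA initializes correctly; the tensor creation at the end is only illustrative):

import torch

# Suggested form: pass the tensor type object directly.
torch.set_default_tensor_type(torch.cuda.FloatTensor)

# Form from the traceback: pass the type as a string.
# torch.set_default_tensor_type('torch.cuda.FloatTensor')

# Either way, switching the default to a CUDA type triggers lazy CUDA
# initialization, which is where error 30 is raised when it fails.
x = torch.zeros(3)
print(x.device)  # expected: cuda:0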

@gmseabra

I am having the same issue here. My system:

  • Windows 10
  • NVIDIA GeForce GTX 1060
  • Python 3.7.1 (Anaconda)
  • PyTorch 1.0.1
  • CUDA 10

And here is a sample code that reproduces the error:

>ipython
Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch

In [2]: torch.cuda.is_available()
Out[2]: True

In [3]: torch.cuda.device_count()
Out[3]: 1

In [4]: torch.cuda.current_device()
THCudaCheck FAIL file=..\aten\src\THC\THCGeneral.cpp line=87 error=30 : unknown error
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-3380d2c12118> in <module>
----> 1 torch.cuda.current_device()

C:\Anaconda3\lib\site-packages\torch\cuda\__init__.py in current_device()
    339 def current_device():
    340     r"""Returns the index of a currently selected device."""
--> 341     _lazy_init()
    342     return torch._C._cuda_getDevice()
    343

C:\Anaconda3\lib\site-packages\torch\cuda\__init__.py in _lazy_init()
    160             "Cannot re-initialize CUDA in forked subprocess. " + msg)
    161     _check_driver()
--> 162     torch._C._cuda_init()
    163     _cudart = _load_cudart()
    164     _cudart.cudaGetErrorName.restype = ctypes.c_char_p

RuntimeError: cuda runtime error (30) : unknown error at ..\aten\src\THC\THCGeneral.cpp:87

In [5]:

Could this be a bug?

@peterjc123
Collaborator

I don't think cuda error 30 is an error on our side. Please try these things first.

  1. Re-install latest GPU driver
  2. Reboot
  3. Ensure you have admin access

@gmseabra

OK, I did some extra tests, and it seems to be some weird behavior that only happens when running in an interactive shell. Here's what I have done (step by step):

  1. Prepare a simple file with the example:
> type torch_test.ipy
import torch
print("torch.cuda.is_available()   =", torch.cuda.is_available())
print("torch.cuda.device_count()   =", torch.cuda.device_count())
print("torch.cuda.device('cuda')   =", torch.cuda.device('cuda'))
print("torch.cuda.current_device() =", torch.cuda.current_device())

I can run this file with either Python or iPython, and it all works fine:

> python torch_test.ipy
torch.cuda.is_available()   = True
torch.cuda.device_count()   = 1
torch.cuda.device('cuda')   = <torch.cuda.device object at 0x0000021B331A0160>
torch.cuda.current_device() = 0

> ipython torch_test.ipy
torch.cuda.is_available()   = True
torch.cuda.device_count()   = 1
torch.cuda.device('cuda')   = <torch.cuda.device object at 0x000002B39C1FD390>
torch.cuda.current_device() = 0

Now, if I try to use exactly the same commands in an interactive shell, I get the error:

With python:

>python
Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print("torch.cuda.is_available()   =", torch.cuda.is_available())
torch.cuda.is_available()   = True
>>> print("torch.cuda.device_count()   =", torch.cuda.device_count())
torch.cuda.device_count()   = 1
>>> print("torch.cuda.device('cuda')   =", torch.cuda.device('cuda'))
torch.cuda.device('cuda')   = <torch.cuda.device object at 0x0000028CBD034198>
>>> print("torch.cuda.current_device() =", torch.cuda.current_device())
THCudaCheck FAIL file=..\aten\src\THC\THCGeneral.cpp line=87 error=30 : unknown error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda3\lib\site-packages\torch\cuda\__init__.py", line 341, in current_device
    _lazy_init()
  File "C:\Anaconda3\lib\site-packages\torch\cuda\__init__.py", line 162, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (30) : unknown error at ..\aten\src\THC\THCGeneral.cpp:87
>>> ^Z

or with ipython:

>ipython
Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch

In [2]: print("torch.cuda.is_available()   =", torch.cuda.is_available())
torch.cuda.is_available()   = True

In [3]: print("torch.cuda.device_count()   =", torch.cuda.device_count())
torch.cuda.device_count()   = 1

In [4]: print("torch.cuda.device('cuda')   =", torch.cuda.device('cuda'))
torch.cuda.device('cuda')   = <torch.cuda.device object at 0x0000018A068007F0>

In [5]: print("torch.cuda.current_device() =", torch.cuda.current_device())
THCudaCheck FAIL file=..\aten\src\THC\THCGeneral.cpp line=87 error=30 : unknown error
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-f8c552eb6277> in <module>
----> 1 print("torch.cuda.current_device() =", torch.cuda.current_device())

C:\Anaconda3\lib\site-packages\torch\cuda\__init__.py in current_device()
    339 def current_device():
    340     r"""Returns the index of a currently selected device."""
--> 341     _lazy_init()
    342     return torch._C._cuda_getDevice()
    343

C:\Anaconda3\lib\site-packages\torch\cuda\__init__.py in _lazy_init()
    160             "Cannot re-initialize CUDA in forked subprocess. " + msg)
    161     _check_driver()
--> 162     torch._C._cuda_init()
    163     _cudart = _load_cudart()
    164     _cudart.cudaGetErrorName.restype = ctypes.c_char_p

RuntimeError: cuda runtime error (30) : unknown error at ..\aten\src\THC\THCGeneral.cpp:87

In [6]:

Any hints?

@juanwulu
Author

@gmseabra
Thanks for your post.
I tested it as you described and, surprisingly, got the complete opposite result from yours: it runs fine in the interactive shell but fails when run from a file.

@gmseabra

@ChocolateDave ,

@gmseabra
Thanks for your post.
I tested it as you described and, surprisingly, got the complete opposite result from yours: it runs fine in the interactive shell but fails when run from a file.

How does it work in a Jupyter notebook?

@kuretru

kuretru commented Feb 16, 2019

@gmseabra @ChocolateDave
I have the same problem as you. After a reboot, the problem was gone.

@gmseabra

@gmseabra @ChocolateDave
I have the same problem as you. After a reboot, the problem was gone.

Can you tell us what your configuration is? Thanks!

@gmseabra

gmseabra commented Feb 16, 2019

@ChocolateDave , @kuretru:
What are the versions of python, CUDA and PyTorch that you are using?

I am using:

  • Windows 10 v1809
  • Anaconda 3
  • Python 3.7.1
  • CUDA 10.0 (V10.0.130)
  • PyTorch 1.0.1 (py3.7_cuda100_cudnn7_1)
  • cudatoolkit 10.0.130

I have already tried rebooting, removing and reinstalling CUDA, torch, Anaconda, etc., and the error persists. There must be something else going on here...

@juanwulu
Author

@gmseabra
Thanks for all your advice. Pardon me for replying so late; I've been traveling and without my laptop, so I couldn't test my program in Jupyter.
If I remember it correctly, I'm currently using the same system configuration as yours.

@gmseabra

@peterjc123

I don't think cuda error 30 is an error on our side. Please try these things first.

Re-install latest GPU driver
Reboot
Ensure you have admin access

I have tried all that, and the error is still there. Did you try to reproduce the error?

@kuretru

kuretru commented Feb 17, 2019

@gmseabra
All environments are brand new; I reinstalled the OS on February 14th.
And I am using:

  • Nvidia GTX 860M
  • Windows 10 1809 x64
  • Python 3.7.2 x64
  • CUDA V10.0.130
  • PyTorch 1.0.1 (torch-1.0.1-cp37-cp37m-win_amd64.whl)
Python
>>> import torch
>>> torch.cuda.current_device()
RuntimeError: cuda runtime error (30) : unknown error at ..\aten\src\THC\THCGeneral.cpp:87

After a reboot:

Python
>>> import torch
>>> torch.cuda.current_device()
0

@gmseabra

Thanks. I tried it all: reinstalled Windows entirely, then installed Visual Studio and the CUDA Toolkit, installed Miniconda, and installed PyTorch in a new environment, and still the same. The commands work from a file, but not interactively.

Note: I'm using Python 3.7.1. If I update the packages in miniconda, I fall into the error described here: #17233

@peterjc123
Collaborator

peterjc123 commented Feb 19, 2019

I'm sorry, but the issue is not reproducible on my side. Could you please try these things to help me locate the problem?

  1. Install the GPU driver that ships with the CUDA installation
  2. Install the wheels package instead of the conda package

Usually, the results should stay consistent regardless of whether interactive mode is on or not, so it's actually very weird. Maybe you should check whether they are using the exact same DLLs with something like Process Explorer.
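
To compare the two sessions, here is a rough sketch (using psutil, which is not part of this thread; pip install psutil) that prints the CUDA/NVIDIA-related DLLs mapped into the current Python process, as an alternative to Process Explorer:

import psutil
import torch

# Touch CUDA the same way the failing session does before inspecting DLLs.
torch.cuda.is_available()

# List loaded modules whose path looks CUDA/NVIDIA related.
for m in psutil.Process().memory_maps():
    path = m.path.lower()
    if any(key in path for key in ("cuda", "cudnn", "cudart", "nvcuda", "nvfatbinaryloader")):
        print(m.path)

Running this once from a script and once from an interactive session and diffing the output should show whether different DLLs are being picked up.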

@gmseabra

Install the GPU driver that ships with the CUDA installation

I'll try that

Usually, the results should stay consistent regardless of whether interactive mode is on or not, so it's actually very weird. Maybe you should check whether they are using the exact same DLLs with something like Process Explorer.

OK, what should I look for here?

Thanks for looking into the issue!

@gmseabra

Hi, I tried reverting to the CUDA drivers that come with the CUDA Development Kit, but I can't install them because I keep getting an error: "Windows cannot verify the driver signature... (Code 52)", so I have to stick with the most recent driver.

My system is an Acer laptop with:

  • Windows 10 Home Single Language v 1809
  • GeForce GTX 1060, Driver version 25.21.14.1891 (In the GeForce Experience it shows as 418.91)
  • Miniconda with Python 3.7.1

My exact procedure was:

  1. Install Miniconda. Do not update anything.
  2. Clone base into a new env: (base) > conda create --name torch --clone base
  3. Activate the new env: (base) > conda activate torch
  4. Install pytorch: (torch) > conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
  5. Deactivate / reactivate the env, just to be sure
  6. Try to run the simple example torch_test.py by: (torch) > python torch_test.py
  7. Try to run the same sequence of commands using the python interactive interpreter, see results below.

Here are the results I get. At the end I also include details about my environment and the output of the deviceQuery app from the CUDA tests:

Output of running the small program:

(torch) >python torch_test.py
torch.cuda.is_available()   = True
torch.cuda.device_count()   = 1
torch.cuda.device('cuda')   = <torch.cuda.device object at 0x000001FCD3A61F28>
torch.cuda.current_device() = 0

Output of interactive python interpreter:

(torch) > python
Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.device('cuda')
<torch.cuda.device object at 0x000001E18C72D208>
>>> torch.cuda.current_device()
THCudaCheck FAIL file=..\aten\src\THC\THCGeneral.cpp line=87 error=30 : unknown error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Miniconda3\envs\torch\lib\site-packages\torch\cuda\__init__.py", line 341, in current_device
    _lazy_init()
  File "C:\Miniconda3\envs\torch\lib\site-packages\torch\cuda\__init__.py", line 162, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (30) : unknown error at ..\aten\src\THC\THCGeneral.cpp:87
>>>

Finally, here is the information about my conda environment:

(torch) >type torch_env.txt
# packages in environment at C:\Miniconda3\envs\torch:
#
# Name                    Version                   Build  Channel
asn1crypto                0.24.0                   py37_0
blas                      1.0                         mkl
ca-certificates           2018.03.07                    0
certifi                   2018.11.29               py37_0
cffi                      1.11.5           py37h74b6da3_1
chardet                   3.0.4                    py37_1
console_shortcut          0.1.1                         3
cryptography              2.4.2            py37h7a1dbc1_0
cudatoolkit               10.0.130                      0
freetype                  2.9.1                ha9979f8_1
icc_rt                    2019.0.0             h0cc432a_1
idna                      2.8                      py37_0
intel-openmp              2019.1                      144
jpeg                      9b                   hb83a4c4_2
libpng                    1.6.36               h2a8f88b_0
libtiff                   4.0.10               hb898794_2
menuinst                  1.4.14           py37hfa6e2cd_0
mkl                       2019.1                      144
mkl_fft                   1.0.10           py37h14836fe_0
mkl_random                1.0.2            py37h343c172_0
ninja                     1.8.2            py37he980bc4_1
numpy                     1.15.4           py37h19fb1c0_0
numpy-base                1.15.4           py37hc3f5095_0
olefile                   0.46                     py37_0
openssl                   1.1.1a               he774522_0
pillow                    5.4.1            py37hdc69c19_0
pip                       18.1                     py37_0
pycosat                   0.6.3            py37hfa6e2cd_0
pycparser                 2.19                     py37_0
pyopenssl                 18.0.0                   py37_0
pysocks                   1.6.8                    py37_0
python                    3.7.1                h8c8aaf0_6
pytorch                   1.0.1           py3.7_cuda100_cudnn7_1    pytorch
pywin32                   223              py37hfa6e2cd_1
requests                  2.21.0                   py37_0
ruamel_yaml               0.15.46          py37hfa6e2cd_0
setuptools                40.6.3                   py37_0
six                       1.12.0                   py37_0
sqlite                    3.26.0               he774522_0
tk                        8.6.8                hfa6e2cd_0
torchvision               0.2.1                      py_2    pytorch
urllib3                   1.24.1                   py37_0
vc                        14.1                 h0510ff6_4
vs2015_runtime            14.15.26706          h3a45250_0
wheel                     0.32.3                   py37_0
win_inet_pton             1.0.1                    py37_1
wincertstore              0.2                      py37_0
xz                        5.2.4                h2fa13f4_4
yaml                      0.1.7                hc54c509_2
zlib                      1.2.11               h62dcd97_3
zstd                      1.3.7                h508b16e_0

And the output of deviceQuery, from the CUDA test suite:

(torch) >type deviceQuery.out
deviceQuery.exe Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1060"
  CUDA Driver Version / Runtime Version          10.1 / 10.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 6144 MBytes (6442450944 bytes)
  (10) Multiprocessors, (128) CUDA Cores/MP:     1280 CUDA Cores
  GPU Max Clock rate:                            1733 MHz (1.73 GHz)
  Memory Clock rate:                             4004 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 5 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS

I've already tried reinstalling the system, uninstalling and reinstalling Anaconda and Miniconda, and nothing changes.

Should I open a bug report?

Thanks!

@gmseabra

Hi all,

I just wanted to mention that I have just tried the nightly build of PyTorch, and the problem disappears. Using the nightly build available today (02/20/2019), I get the following:

(torch_nightly) >python
Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.current_device()
0
>>> quit()

So it seems that, at some point between stable and today's build, the issue has been resolved.

@peterjc123
Collaborator

@gmseabra I'm glad that it's solved. But I'm not sure which change is related to this.

@juanwulu
Author

juanwulu commented Feb 21, 2019

Thank you all for your support 😄, especially @gmseabra.

I couldn't fix the problem, so I decided to downgrade my Python version to 3.6.8, and it somehow worked.
The bugs may still exist on newer versions of Python, but for those who are currently stuck on this problem, downgrading your Python version might be a good solution.

@gmseabra

Thank you all for your support 😄, especially @gmseabra.
I couldn't fix the problem, so I decided to downgrade my Python version to 3.6.8, and it somehow worked.
The bugs may still exist on newer versions of Python, but for those who are currently stuck on this problem, downgrading your Python version might be a good solution.

Have you tried using the nightly build? That worked fine for me (as of 02/20/2019).

@gmseabra

gmseabra commented Feb 21, 2019

@gmseabra I'm glad that it's solved. But I'm not sure which change is related to this.

@peterjc123 Thanks. Is there any idea of when the nightly build will become part of the stable distribution?

@peterjc123
Collaborator

peterjc123 commented Feb 22, 2019

@gmseabra It won't be too soon. Our release cycle is ~90 days. BTW, would you please check whether removing nvcuda.dll and nvfatbinaryloader.dll from [Anaconda Root]\Lib\site-packages\torch\lib helps?
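
In case it helps, a tiny sketch (assuming the usual conda layout) to check whether those two DLLs are present before removing them:

import os
import torch

# torch's bundled DLLs live next to the package in a "lib" folder.
lib_dir = os.path.join(os.path.dirname(torch.__file__), "lib")
for name in ("nvcuda.dll", "nvfatbinaryloader.dll"):
    path = os.path.join(lib_dir, name)
    print(path, "->", "present" if os.path.isfile(path) else "not found")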

@gmseabra

@gmseabra It won't be too soon. Our release cycle is ~90 days.

Thanks.

BTW, would you please check whether removing nvcuda.dll and nvfatbinaryloader.dll from [Anaconda Root]\Lib\site-packages\torch\lib helps?

Tried removing them from [Miniconda3]\envs\torch\Lib\site-packages\torch\lib.

I also tried copying those DLLs from my torch-nightly env to the torch env, but there was no difference either way.

@andrei-rusu

andrei-rusu commented Feb 26, 2019

I am getting the same error as this with PyTorch 1.0.1 and CUDA 10. Indeed, updating to one of the nightly builds solved the issue, yet I stumbled upon a "classical nightly issue": some random Assertion Failure which prompted me to message PyTorch developers about it. This is getting really frustrating since I've been losing considerable time in reconfiguring my environment. I think I will have to downgrade some components now...

EDIT: Downgrading to PyTorch 1.0.0 solved the issue for me as well. Clearly, there's a problem with 1.0.1.

@jsmith8888

I am getting the same error.

My setup:

  • Nvidia GTX 1050Ti
  • Windows 10 Pro
  • Conda 4.6.7
  • Python 3.7.1
  • CUDA V10.0.130
  • PyTorch 1.0.1

My Jupyter Notebook Test:

torch.cuda.is_available()
True

torch.backends.cudnn.enabled
True

torch.cuda.current_device()

RuntimeError Traceback (most recent call last)
in
----> 1 torch.cuda.current_device()

C:\ProgramData\Anaconda3\lib\site-packages\torch\cuda\__init__.py in current_device()
    339 def current_device():
    340     r"""Returns the index of a currently selected device."""
--> 341     _lazy_init()
    342     return torch._C._cuda_getDevice()
    343

C:\ProgramData\Anaconda3\lib\site-packages\torch\cuda\__init__.py in _lazy_init()
    160             "Cannot re-initialize CUDA in forked subprocess. " + msg)
    161     _check_driver()
--> 162     torch._C._cuda_init()
    163     _cudart = _load_cudart()
    164     _cudart.cudaGetErrorName.restype = ctypes.c_char_p

RuntimeError: cuda runtime error (30) : unknown error at ..\aten\src\THC\THCGeneral.cpp:87

torch.cuda.device(0)
<torch.cuda.device at 0x21f81413fd0>

torch.cuda.device_count()
1

torch.cuda.get_device_name(0)

RuntimeError Traceback (most recent call last)
in
----> 1 torch.cuda.get_device_name(0)

C:\ProgramData\Anaconda3\lib\site-packages\torch\cuda\__init__.py in get_device_name(device)
    274         if :attr:`device` is None (default).
    275     """
--> 276     return get_device_properties(device).name
    277
    278

C:\ProgramData\Anaconda3\lib\site-packages\torch\cuda\__init__.py in get_device_properties(device)
    296 def get_device_properties(device):
    297     if not _initialized:
--> 298         init()  # will define _get_device_properties and _CudaDeviceProperties
    299     device = _get_device_index(device, optional=True)
    300     if device < 0 or device >= device_count():

C:\ProgramData\Anaconda3\lib\site-packages\torch\cuda\__init__.py in init()
    142     Does nothing if the CUDA state is already initialized.
    143     """
--> 144     _lazy_init()
    145
    146

C:\ProgramData\Anaconda3\lib\site-packages\torch\cuda\__init__.py in _lazy_init()
    160             "Cannot re-initialize CUDA in forked subprocess. " + msg)
    161     _check_driver()
--> 162     torch._C._cuda_init()
    163     _cudart = _load_cudart()
    164     _cudart.cudaGetErrorName.restype = ctypes.c_char_p

RuntimeError: cuda runtime error (30) : unknown error at ..\aten\src\THC\THCGeneral.cpp:87

The THCGeneral.cpp code can be found at:
https://github.com/pytorch/pytorch/blob/master/aten/src/THC/THCGeneral.cpp

The code block in THCGeneral where the error is thrown is:

for (int i = 0; i < numDevices; ++i) {
    THCCudaResourcesPerDevice* res = THCState_getDeviceResourcePtr(state, i);
    THCudaCheck(cudaSetDevice(i));

    /* The scratch space that we want to have available per each device is
       based on the number of SMs available per device. We guarantee a
       minimum of 128kb of space per device, but to future-proof against
       future architectures that may have huge #s of SMs, we guarantee that
       we have at least 16 bytes for each SM. */
    int numSM = at::cuda::getDeviceProperties(i)->multiProcessorCount;
    size_t sizePerStream =
        MIN_GLOBAL_SCRATCH_SPACE_PER_DEVICE >= numSM * MIN_GLOBAL_SCRATCH_SPACE_PER_SM_STREAM ?
        MIN_GLOBAL_SCRATCH_SPACE_PER_DEVICE :
        numSM * MIN_GLOBAL_SCRATCH_SPACE_PER_SM_STREAM;
    res->scratchSpacePerStream = sizePerStream;
}

Line 87 of this code is:
int numSM = at::cuda::getDeviceProperties(i)->multiProcessorCount;

Any ideas why I and so many others are experiencing this exact same error?

@peterjc123
Collaborator

Looks like the callback was accidentally triggered here. https://github.com/pytorch/pytorch/blame/master/torch/cuda/__init__.py#L188. Usually it won't happen. Anyway, I'll try to add a protection clause here for Windows.

@Yiyiyimu

@peterjc123 Regarding the situation in https://forums.fast.ai/t/cuda-runtime-error-30-resnet-not-loading/38556/2, I think there is some underlying conflict between Jupyter and PyTorch 1.0.1, since downgrading to PyTorch 1.0.0 could solve the problem.

I noticed several issues raised about the same problem, and this might be the best answer so far.

@AndreiCostinescu

AndreiCostinescu commented Apr 16, 2019

My system:
Windows 10
Cuda 10.1
Python 3.7.2
PyTorch 1.0.1
NVIDIA GeForce GTX 1050 Ti

The following always works:

import torch
torch.cuda.current_device()

The following always fails for me:

import torch
torch.cuda.is_available()
torch.cuda.current_device()  # fails here

My solution was to add a call to torch.cuda.current_device() to my scripts before any other CUDA calls.
Hope this gives a hint as to where to look for the issue :)
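
A minimal sketch of that workaround as it might sit at the top of a script (the early current_device() call is the only addition; the rest is just illustrative):

import torch

# Workaround: force CUDA's lazy initialization up front, before any other
# CUDA query, so that it does not fail later in the script.
torch.cuda.current_device()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(2, 2, device=device)
print(x.device)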

@reinhub-1

I ran into the same problem (GTX 1050, Anaconda environment, Win10, latest PyTorch installed via both pip and conda).
I have uninstalled and reinstalled PyTorch in different environments several times, without success so far.

Before this issue came up, PyTorch worked as usual. I didn't change any settings or install any packages; it just appeared.

@deepseawhale

Quote from @andrei-rusu

I am getting the same error as this with PyTorch 1.0.1 and CUDA 10. Indeed, updating to one of the nightly builds solved the issue, yet I stumbled upon a "classical nightly issue": some random Assertion Failure which prompted me to message PyTorch developers about it. This is getting really frustrating since I've been losing considerable time in reconfiguring my environment. I think I will have to downgrade some components now...

EDIT: Downgrading to PyTorch 1.0.0 solved the issue for me as well. Clearly, there's a problem with 1.0.1.

Downgrading PyTorch to 1.0.0 solved mine; I also needed to make sure the script ran in an administrator command prompt. Thanks!

@reinhub-1

Downgrading to version 1.0.0 also worked for me. I also changed some Nvidia graphics card settings (maximum performance = yes), which may have contributed to getting it working.

@Jonas1312
Contributor

Same issue here:
Windows 10
NVIDIA GeForce GTX 940mx
Python 3.6.8
PyTorch 1.0.1
CUDA 10.1
cudnn 7.5

Downgrading to pytorch 1.0.0 solved the issue

@peterjc123
Collaborator

peterjc123 commented Apr 22, 2019

Well, would you guys please check whether this error persists in the nightlies? Downgrading is a workaround here, but it does little to help locate the actual cause of this issue. Let me summarize all the known factors that may cause this issue:

  1. From 1.0.0 to 1.0.1, we switched to using the CUDA libraries provided by the conda-forge channel in the conda package. Previously, we copied these libraries from the build machine into the binaries. We can ignore this factor if we use the pip package.
  2. The DLL loading process of Python in conda has changed. It started to use AddDllDirectory, which does not guarantee the loading order. We can ignore this factor if we downgrade Python to 3.6.7 or 3.7.1.
  3. The fix/issue mentioned by @ezyang: Unify cudaGetDeviceCount implementations (#18445). We can ignore this factor if we use the nightlies or build from source.

I'd be grateful if you could help me locate the issue (something like the environment dump sketched below would already help). It's currently hard to fix because I cannot reproduce it on my side.
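
A quick environment dump along these lines (a sketch; extend as needed) already covers the Python version and the PyTorch/CUDA/cuDNN versions involved in the factors above:

import sys
import torch

print("python          :", sys.version)
print("torch           :", torch.__version__)
print("built with CUDA :", torch.version.cuda)
print("cudnn           :", torch.backends.cudnn.version())
print("cuda available  :", torch.cuda.is_available())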

@Jonas1312
Contributor

Nightly builds for Windows are available here: https://download.pytorch.org/whl/nightly/cu100/torch_nightly.html but only for version 1.0.0

@peterjc123
Collaborator

@Jonas1312 You mean CUDA 10? If you are talking about the version of PyTorch, it is always built from the latest source every day.

@Jonas1312
Contributor

Jonas1312 commented Apr 22, 2019

https://download.pytorch.org/whl/nightly/cu100/torch_nightly.html shows the following packages: https://pastebin.com/yYxdEqU5

I tried with the last windows build:

c:\Users\Jonas\Desktop>python36 -m pip install torch_nightly-1.0.0.dev20190421-cp36-cp36m-win_amd64.whl
Processing c:\users\jonas\desktop\torch_nightly-1.0.0.dev20190421-cp36-cp36m-win_amd64.whl
Installing collected packages: torch-nightly
Successfully installed torch-nightly-1.0.0.dev20190421

c:\Users\Jonas\Desktop>python36
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print("torch.cuda.is_available()   =", torch.cuda.is_available())
torch.cuda.is_available()   = True
>>> print("torch.cuda.device_count()   =", torch.cuda.device_count())
torch.cuda.device_count()   = 1
>>> print("torch.cuda.device('cuda')   =", torch.cuda.device('cuda'))
torch.cuda.device('cuda')   = <torch.cuda.device object at 0x00000251657A7518>
>>> print("torch.cuda.current_device() =", torch.cuda.current_device())
torch.cuda.current_device() = 0
>>> torch.__version__
'1.0.0.dev20190421'
>>>

It's working, but I don't understand why it shows version 1.0.0 if it's built from the latest source.

@peterjc123
Collaborator

@JohnRambo Oh, I see. I will update the build scripts.

@peterjc123
Collaborator

@JohnRambo Should be fixed now. Looks like I forgot to sync with upstream after I sent these changes about the version change.

@Jonas1312
Contributor

Jonas1312 commented Apr 25, 2019

@peterjc123 I've just installed torch_nightly-1.1.0.dev20190424-cp36-cp36m-win_amd64.whl and it seems that it fixed the issue:

C:\Users\Jonas>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:26_Pacific_Standard_Time_2019
Cuda compilation tools, release 10.1, V10.1.105

C:\Users\Jonas>python36
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print("torch.cuda.is_available()   =", torch.cuda.is_available())
torch.cuda.is_available()   = True
>>> print("torch.cuda.device_count()   =", torch.cuda.device_count())
torch.cuda.device_count()   = 1
>>> print("torch.cuda.device('cuda')   =", torch.cuda.device('cuda'))
torch.cuda.device('cuda')   = <torch.cuda.device object at 0x00000262DB837EB8>
>>> print("torch.cuda.current_device() =", torch.cuda.current_device())
torch.cuda.current_device() = 0
>>> torch.cuda.get_device_name(0)
'GeForce 940MX'
>>> torch.__version__
'1.1.0.dev20190424'
>>> a = torch.ones((1,1,1)).cuda()
>>> a
tensor([[[1.]]], device='cuda:0')
>>>

Works with cuda 10.0 also!

@trias702

trias702 commented May 1, 2019

I just got this error for the first time today, after running PyTorch 1.0.1 (CUDA 10.0) on Windows 10 for months and months with no problems.

In my case, the error only started happening when I updated my Nvidia graphics driver to 430.53 from 417.35. Luckily, simply reverting to driver version 417.35 made the error go away, and everything works fine again. I did not need to touch my CUDA or Python environment to fix it, just roll back the graphics driver. Very odd; it looks like Nvidia changed something in the driver code that is causing this.

My setup:

Windows 10 1607 64-bit
Python 3.6.8
PyTorch 1.0.1
CUDA 10.0

PyTorch installed via pip

@xiaodi68

xiaodi68 commented May 1, 2019

I got a similar issue, with the error "RuntimeError: cuda runtime error (30) : unknown error at ..\aten\src\THC\THCGeneral.cpp:51", on my new machine (Windows 10, Nvidia RTX 2070). (Also reported at https://discuss.pytorch.org/t/a-error-when-using-gpu/32761.)
I tried a lot of the suggested methods, such as downgrading CUDA and downgrading/upgrading Anaconda, but without success.
In my case, the error only went away after I installed the latest Nvidia gaming driver.
Hope it is helpful.

@peterjc123
Collaborator

We are building this time with NVIDIA driver 418.96, but according to your test results, I don't know whether I should downgrade or upgrade it. However, if the problem is caused by the driver, we can actually run some tests on this. Also, if you have time, you could try whether building from source solves it.

@peterjc123
Collaborator

Actually, from the CUDA documentation, I can only find that there is a lower limit on the driver version for each CUDA version: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#major-components. But it doesn't mention what happens if we compile binaries using newer versions of the GPU driver, or if the driver version mismatches the one on the user's PC.

@x1155665

x1155665 commented May 1, 2019

I got a similar issue, with the error "RuntimeError: cuda runtime error (30) : unknown error at ..\aten\src\THC\THCGeneral.cpp:51", on my new machine (Windows 10, Nvidia RTX 2070). (Also reported at https://discuss.pytorch.org/t/a-error-when-using-gpu/32761.)
I tried a lot of the suggested methods, such as downgrading CUDA and downgrading/upgrading Anaconda, but without success.
In my case, the error only went away after I installed the latest Nvidia gaming driver.
Hope it is helpful.

Updating the Nvidia driver (to 430.39) also worked for me.

@feferna

feferna commented May 2, 2019

My system:
Windows 10
Cuda 10.1
Python 3.7.2
PyTorch 1.0.1
NVIDIA GeForce GTX 1050 Ti

The following always works:

import torch
torch.cuda.current_device()

The following always fails for me:

import torch
torch.cuda.is_available()
torch.cuda.current_device()  # fails here

My solution was to add a call to torch.cuda.current_device() to my scripts before any other CUDA calls.
Hope this gives a hint as to where to look for the issue :)

Thanks! This is exactly the same thing that happens to me on Windows 10.

If I use torch.cuda.current_device() before anything cuda-related, it works like a charm.

@ezyang
Contributor

ezyang commented May 6, 2019

If I use torch.cuda.current_device() before anything cuda-related, it works like a charm.

For the record, this isn't supposed to be necessary, but it's possible this is broken.

@peterjc123
Collaborator

Guys, I seem to have found the root cause of this issue with the help of @Jonas1312 in #20635: it is caused by the fact that we changed the way we link our libraries against cudart. I have made the PR #21062. You can try whether it fixes your problem.
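
For anyone who wants to check which cudart their setup can see, a small sketch (my own check under standard-layout assumptions, not part of the PR) that lists the cudart DLLs bundled with torch and the ones in a system CUDA install:

import glob
import os
import torch

# cudart bundled next to the torch package, if any.
lib_dir = os.path.join(os.path.dirname(torch.__file__), "lib")
print("bundled in torch\\lib:", glob.glob(os.path.join(lib_dir, "cudart64_*.dll")))

# cudart from a system-wide CUDA Toolkit install, if CUDA_PATH is set.
cuda_path = os.environ.get("CUDA_PATH", "")
if cuda_path:
    print("system CUDA install :", glob.glob(os.path.join(cuda_path, "bin", "cudart64_*.dll")))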

@wwyi1828

My system:
Windows 10
Cuda 10.1
Python 3.7.2
PyTorch 1.0.1
NVIDIA GeForce GTX 1050 Ti

The following always works:

import torch
torch.cuda.current_device()

The following always fails for me:

import torch
torch.cuda.is_available()
torch.cuda.current_device()  # fails here

My solution was to add a call to torch.cuda.current_device() to my scripts before any other CUDA calls.
Hope this gives a hint as to where to look for the issue :)

Thank you! I have the same problem, and I needed to restart Python every time. Following what you said, I added torch.cuda.current_device() after import torch, and it works.

@ezyang
Contributor

ezyang commented Jun 13, 2019

The fix was merged.

@ezyang closed this as completed Jun 13, 2019
@n1tesla

n1tesla commented Aug 1, 2019

My system:
Windows 10
Cuda 10.1
Python 3.7.2
PyTorch 1.0.1
NVIDIA GeForce GTX 1050 Ti

The following always works:

import torch
torch.cuda.current_device()

The following always fails for me:

import torch
torch.cuda.is_available()
torch.cuda.current_device()  # fails here

My solution was to add a call to torch.cuda.current_device() to my scripts before any other CUDA calls.
Hope this gives a hint as to where to look for the issue :)

It worked for me as well:

import torch
torch.cuda.current_device()

@yuanzhoulvpi2017

I don't think cuda error 30 is an error on our side. Please try these things first.

  1. Re-install latest GPU driver
  2. Reboot
  3. Ensure you have admin access

Yes, after rebooting my computer, this error was gone.
