CUDA/CuDNN related errors occur in Titan-RTX environments #39

Open
dogyoonlee opened this issue Sep 10, 2020 · 0 comments

Hello.

I have changed my environment in many ways, but I still cannot get your code to run.

My GPU is a Titan RTX, and my attempts are as follows.

I also tried running the code in a CUDA 8.0 environment earlier, but the errors were almost the same as in the CUDA 9.0 environments.

  1. ---environment---
    ubuntu 18.04
    CUDA 9.0
    CuDNN 7.1
    torch 0.3.1 / 0.4.0
    ==>
    error message :
    Found GPU0 TITAN RTX which requires CUDA_VERSION >= 9000 for
    optimal performance and fast startup time, but your PyTorch was compiled
    with CUDA_VERSION 8000. Please install the correct PyTorch binary
    using instructions from http://pytorch.org

warnings.warn(incorrect_binary_warn % (d, name, 9000, CUDA_VERSION))

The process is then "Killed" when the data is loaded onto the GPU, specifically when the conv2d() call runs at line 55 of pointnet2_modules.py (self.mlp[i] in _PointnetSAModuleBase).
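The warning in attempt 1 reports a PyTorch binary compiled with CUDA_VERSION 8000 even though CUDA 9.0 is installed on the system, which suggests the installed wheel ships its own CUDA runtime. A minimal sketch for checking this is given here. The helper names (`decode_cuda_version`, `binary_is_too_old`, `report_installed_versions`) are hypothetical and not part of the repository. The decoding assumes the usual CUDA encoding of major * 1000 + minor * 10.

```python
def decode_cuda_version(code):
    """Turn CUDA's integer version code into (major, minor), e.g. 9000 -> (9, 0)."""
    return code // 1000, (code % 1000) // 10


def binary_is_too_old(compiled_code, required_code):
    """True when the binary was built against an older CUDA than the GPU asks for."""
    return decode_cuda_version(compiled_code) < decode_cuda_version(required_code)


def report_installed_versions():
    """Print what the installed PyTorch reports about itself and GPU 0."""
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
        return
    print("torch version:      ", torch.__version__)
    print("built against CUDA: ", torch.version.cuda)
    print("cuDNN version:      ", torch.backends.cudnn.version())
    if torch.cuda.is_available():
        print("GPU 0:              ", torch.cuda.get_device_name(0))
        print("compute capability: ", torch.cuda.get_device_capability(0))


if __name__ == "__main__":
    # Values taken from the warning: compiled with 8000, GPU requires >= 9000.
    print(decode_cuda_version(8000), decode_cuda_version(9000))
    print("binary too old:", binary_is_too_old(8000, 9000))
    report_installed_versions()
```

If "built against CUDA" differs from the system toolkit version, the mismatch is in the PyTorch binary and not in the local CUDA installation.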

  2. ---environment---
    ubuntu 18.04
    CUDA 9.0
    CuDNN 7.1
    torch 0.3.1 / 0.4.1
    ==>
    error message :
    RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:663

  3. ---environment---
    ubuntu 18.04
    CUDA 9.0
    CuDNN 7.1
    torch 0.3.1 / 0.4.1

In addition, I edited train_cls.py to set

torch.backends.cudnn.benchmark = False

==>
Traceback (most recent call last):
  File "train_cls.py", line 217, in <module>
    main()
  File "train_cls.py", line 125, in main
    train(train_dataloader, test_dataloader, model, criterion, optimizer, lr_scheduler, bnm_scheduler, args, num_batch)
  File "train_cls.py", line 167, in train
    pred = model(points)
  File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/SSD1/dogyoon/Relation-Shape-CNN-master/models/rscnn_ssn_cls.py", line 102, in forward
    return self.FC_layer(features.squeeze(-1))
  File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/batchnorm.py", line 66, in forward
    exponential_average_factor, self.eps)
  File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/functional.py", line 1251, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size [1, 512]
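The input size [1, 512] in this ValueError means batch norm received a single sample while the model was in training mode. That can happen when the batch size is set to 1, or when the last batch of an epoch holds one leftover sample. A minimal sketch of the arithmetic follows. The helper names and the sample counts are hypothetical, since the post states neither the batch size nor the dataset size.

```python
def batch_sizes(num_samples, batch_size, drop_last=False):
    """Sizes of the batches a loader would yield over one epoch."""
    full_batches, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full_batches
    # An incomplete final batch is kept unless drop_last is requested.
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


def has_singleton_batch(num_samples, batch_size, drop_last=False):
    """True when some batch would contain exactly one sample."""
    return 1 in batch_sizes(num_samples, batch_size, drop_last)


if __name__ == "__main__":
    # Hypothetical numbers: 65 samples with a batch size of 32.
    print(batch_sizes(65, 32))                  # the final batch has one sample
    print(batch_sizes(65, 32, drop_last=True))  # the leftover sample is dropped
    print(has_singleton_batch(65, 32))
```

In PyTorch the corresponding option is `drop_last=True` on `torch.utils.data.DataLoader`. Whether train_cls.py exposes it is not clear from the traceback, and it does not help when the batch size itself is 1.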


I hope to find a solution to this problem as soon as possible.
Thank you very much.
