What's the version of CUDA and PyTorch? #280

Open
jinggaizi opened this issue Feb 25, 2021 · 2 comments
Comments

@jinggaizi

I ran the AISHELL example (transducer) with torch 1.4 + CUDA 10.1 and with torch 1.5 + CUDA 10.1, and got the following errors:
torch1.4+cuda10.1:
/opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/ATen/native/cudnn/RNN.cpp:1266: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
(the same UserWarning is printed 5 times)
40%|██████████████████████████████████████████████████████████████████████████████▋ | 144000/360293 [18:43<32:16, 111.68it/s]Traceback (most recent call last):
File "/search/speech/jingbojun/exp/neural_sp/examples/aishell/s5/../../../neural_sp/bin/asr/train.py", line 534, in
save_path = pr.runcall(main)
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/cProfile.py", line 121, in runcall
return func(*args, **kw)
File "/search/speech/jingbojun/exp/neural_sp/examples/aishell/s5/../../../neural_sp/bin/asr/train.py", line 379, in main
loss, observation = model(batch_dev, task=task, is_eval=True)
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in call
result = self.forward(*input, **kwargs)
File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 261, in forward
loss, observation = self._forward(batch, task)
File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 274, in _forward
eout_dict = self.encode(batch['xs'], 'all')
File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 395, in encode
xs = pad_list([np2tensor(x, self.device).float() for x in xs], 0.)
File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/torch_utils.py", line 68, in pad_list
max_time = max(x.size(0) for x in xs)
ValueError: max() arg is an empty sequence
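
For context on the torch 1.4 failure: the ValueError means pad_list in neural_sp/models/torch_utils.py was handed an empty batch, so max() over the sequence lengths has nothing to reduce; that points at an empty batch_dev rather than at CUDA itself. A minimal sketch of that failure path, using a simplified stand-in for pad_list (for illustration only, not the project's actual implementation):

```python
import torch

def pad_list(xs, pad_value=0.0):
    """Simplified stand-in for neural_sp's pad_list: right-pads a list of
    variable-length (T_i, F) tensors into one (B, T_max, F) tensor."""
    if len(xs) == 0:
        # An empty batch is exactly what triggers
        # "ValueError: max() arg is an empty sequence" in the traceback above.
        raise ValueError("pad_list got an empty batch; check the dev-set batching")
    max_time = max(x.size(0) for x in xs)
    out = xs[0].new_full((len(xs), max_time, xs[0].size(1)), pad_value)
    for i, x in enumerate(xs):
        out[i, :x.size(0)] = x
    return out

# A non-empty batch pads normally; an empty list reproduces the error path.
print(pad_list([torch.randn(3, 80), torch.randn(5, 80)]).shape)  # torch.Size([2, 5, 80])
```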

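Separately, the repeated UserWarning about non-contiguous RNN weights is unrelated to the crash: it means the cuDNN LSTM weight blob got fragmented (typically after DataParallel replication) and can be re-compacted by calling flatten_parameters() in forward. A hypothetical encoder sketch showing where that call goes (this is not neural_sp's actual encoder class):

```python
import torch
import torch.nn as nn

class RNNEncoder(nn.Module):
    """Hypothetical encoder illustrating the flatten_parameters() hint
    from the UserWarning above; not neural_sp's actual encoder."""

    def __init__(self, idim=80, hdim=320):
        super().__init__()
        self.rnn = nn.LSTM(idim, hdim, num_layers=2, batch_first=True)

    def forward(self, xs):
        # Re-compact the cuDNN weights into one contiguous chunk; this is
        # what silences the "RNN module weights are not part of single
        # contiguous chunk of memory" warning under nn.DataParallel.
        self.rnn.flatten_parameters()
        out, _ = self.rnn(xs)
        return out

enc = RNNEncoder()
print(enc(torch.randn(2, 50, 80)).shape)  # torch.Size([2, 50, 320])
```
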
torch1.5+cuda10.1:
Removed 0 empty utterances
0%| | 0/360293 [00:00<?, ?it/s]Traceback (most recent call last):
File "/search/speech/jingbojun/exp/neural_sp/examples/aishell/s5/../../../neural_sp/bin/asr/train.py", line 534, in
save_path = pr.runcall(main)
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/cProfile.py", line 121, in runcall
return func(*args, **kw)
File "/search/speech/jingbojun/exp/neural_sp/examples/aishell/s5/../../../neural_sp/bin/asr/train.py", line 338, in main
teacher=teacher, teacher_lm=teacher_lm)
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/search/speech/jingbojun/anaconda3/envs/neural_sp_py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 264, in forward
loss, observation = self._forward(batch, task, teacher, teacher_lm)
File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 274, in _forward
eout_dict = self.encode(batch['xs'], 'all')
File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 395, in encode
xs = pad_list([np2tensor(x, self.device).float() for x in xs], 0.)
File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/seq2seq/speech2text.py", line 395, in
xs = pad_list([np2tensor(x, self.device).float() for x in xs], 0.)
File "/search/odin/jingbojun/exp/neural_sp/neural_sp/models/base.py", line 55, in device
return next(self.parameters()).device
StopIteration
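
The torch 1.5 failure is a different problem: PyTorch 1.5 changed how nn.DataParallel replicates modules, and replicas no longer expose their parameters through .parameters(), so the device property in base.py (next(self.parameters()).device) hits an exhausted generator and raises StopIteration inside replica 0. A common workaround is to resolve the device from a registered buffer instead; a minimal, hypothetical sketch (not the project's actual fix):

```python
import torch
import torch.nn as nn

class Speech2TextLike(nn.Module):
    """Toy module mimicking the `device` property in neural_sp's base.py;
    hypothetical, for illustration only."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)
        # A dummy buffer travels with the module when DataParallel replicates
        # it, so its .device can be queried safely inside a replica, unlike
        # next(self.parameters()).device on torch 1.5.
        self.register_buffer("_device_probe", torch.empty(0))

    @property
    def device(self):
        return self._device_probe.device

    def forward(self, x):
        return self.linear(x.to(self.device))

if torch.cuda.is_available():
    model = nn.DataParallel(Speech2TextLike().cuda())
    print(model(torch.randn(4, 8)).shape)  # torch.Size([4, 8])
```

That would also explain why the torch 1.5 run dies at step 0, before any data-related issue can even appear.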

@jinggaizi (Author)

@hirofumi0810 Hi, which versions of torch and CUDA are you using to run the example?

@jinggaizi (Author)

It works with torch 1.4 when using the LAS config.
