nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [18,0,0] Assertion t >= 0 && t < n_classes failed. #81

Open
Alexia1994 opened this issue Nov 16, 2021 · 4 comments

Comments

@Alexia1994

ERROR:ignite.engine.engine.Engine:Current run is terminating due to exception: unsupported operand type(s) for /: 'str' and 'int'.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [2,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [3,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [4,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [5,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [6,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [7,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [8,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [9,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [10,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [11,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [12,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [13,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [14,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [15,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [16,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [17,0,0] Assertion t >= 0 && t < n_classes failed.
/opt/conda/conda-bld/pytorch_1634272168290/work/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [18,0,0] Assertion t >= 0 && t < n_classes failed.
ERROR:ignite.engine.engine.Engine:Engine run is terminating due to exception: unsupported operand type(s) for /: 'str' and 'int'.
Traceback (most recent call last):
File "train.py", line 237, in <module>
train()
File "train.py", line 225, in train
trainer.run(train_loader, max_epochs=args.n_epochs)
File "/home/work/zhangao/ZoooSP/python/miniconda3/envs/xxd/lib/python3.7/site-packages/ignite/engine/engine.py", line 850, in run
return self._internal_run()
File "/home/work/zhangao/ZoooSP/python/miniconda3/envs/xxd/lib/python3.7/site-packages/ignite/engine/engine.py", line 952, in _internal_run
self._handle_exception(e)
File "/home/work/zhangao/ZoooSP/python/miniconda3/envs/xxd/lib/python3.7/site-packages/ignite/engine/engine.py", line 716, in _handle_exception
raise e
File "/home/work/zhangao/ZoooSP/python/miniconda3/envs/xxd/lib/python3.7/site-packages/ignite/engine/engine.py", line 937, in _internal_run
hours, mins, secs = self._run_once_on_dataset()
File "/home/work/zhangao/ZoooSP/python/miniconda3/envs/xxd/lib/python3.7/site-packages/ignite/engine/engine.py", line 705, in _run_once_on_dataset
self._handle_exception(e)
File "/home/work/zhangao/ZoooSP/python/miniconda3/envs/xxd/lib/python3.7/site-packages/ignite/engine/engine.py", line 716, in _handle_exception
raise e
File "/home/work/zhangao/ZoooSP/python/miniconda3/envs/xxd/lib/python3.7/site-packages/ignite/engine/engine.py", line 688, in _run_once_on_dataset
self.state.output = self._process_function(self, self.state.batch)
File "train.py", line 130, in update
loss = lm_loss / args.gradient_accumulation_steps
TypeError: unsupported operand type(s) for /: 'str' and 'int'

@Alexia1994 Alexia1994 changed the title when I run your pretrained code, I got the following ERROR: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [18,0,0] Assertion t >= 0 && t < n_classes failed. Nov 16, 2021
@Alexia1994
Author

After loading the thu-coai/CDial-GPT_LCCC-large pretrained model, I tried to fine-tune it on toy_train.txt and got the error above. How should I handle this?
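
One possible cause of the final `TypeError` (an assumption, since the traceback does not show where `lm_loss` is assigned in train.py): if the model call returns a dict-like output and the script unpacks it like a tuple, the first unpacked item is a key string rather than the loss value. The sketch below reproduces that behaviour with a plain `OrderedDict` as a hypothetical stand-in for the model output; the names `outputs` and `scaled` are illustrative only.

```python
from collections import OrderedDict

# Hypothetical stand-in for a dict-like model output (assumption, not taken from train.py).
outputs = OrderedDict(loss=2.5, logits=[0.1, 0.9])

# Unpacking a dict iterates over its keys, so this binds the string "loss".
lm_loss, *_ = outputs


def scaled(loss, steps):
    """Divide the loss by the number of gradient accumulation steps."""
    return loss / steps


try:
    scaled(lm_loss, 2)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Reading the value by key yields the number, so the division works.
fixed = scaled(outputs["loss"], 2)

print(repr(lm_loss))
print(error_message)
print(fixed)
```

If this is what happens in your environment, reading the loss by key (or by index on a real tuple) before dividing should avoid the `'str' and 'int'` error. It would not by itself explain the CUDA assertion.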

@ttppss

ttppss commented Jan 22, 2022

Hi,

Have you found a solution?

Thanks.

@silverriver
Collaborator

What command did you use?

@haiduo

haiduo commented Oct 13, 2022

Have you checked that the number of classes (num_class) is set correctly?!
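
The assertion `t >= 0 && t < n_classes` means some target label falls outside the range of classes the model outputs. A minimal way to check this on the CPU before the loss is computed might look like the sketch below. The helper name `find_invalid_targets` and the sample values are hypothetical; `ignore_index=-100` matches the PyTorch default for `nll_loss` and `cross_entropy`, but your training script may use a different value.

```python
def find_invalid_targets(targets, n_classes, ignore_index=-100):
    """Return (position, value) pairs for labels outside [0, n_classes).

    Labels equal to ignore_index are skipped, because the loss ignores them.
    """
    return [
        (i, t)
        for i, t in enumerate(targets)
        if t != ignore_index and not (0 <= t < n_classes)
    ]


# Hypothetical example: a model with 10 output classes.
labels = [0, 9, 10, -1, -100]
bad = find_invalid_targets(labels, n_classes=10)
print(bad)  # [(2, 10), (3, -1)]
```

With tensors, the flattened labels can be passed as `labels.view(-1).tolist()`, and `n_classes` is the size of the last dimension of the logits. If the check reports positions, the usual suspects are a tokenizer vocabulary larger than the model's embedding table, or a padding label that does not match the `ignore_index` given to the loss.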
