You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Traceback (most recent call last):
File "tools/train.py", line 145, in
main()
File "tools/train.py", line 141, in main
runner.train()
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
model = self.train_loop.run() # type: ignore
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run
self.run_epoch()
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
self.run_iter(idx, data_batch)
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter
outputs = self.runner.model.train_step(
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 116, in train_step
optim_wrapper.update_params(parsed_losses)
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/optim/optimizer/optimizer_wrapper.py", line 196, in update_params
self.backward(loss)
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/optim/optimizer/optimizer_wrapper.py", line 220, in backward
loss.backward(**kwargs)
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/torch/autograd/init.py", line 145, in backward
Variable._execution_engine.run_backward(
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
I encountered this issue when running all three methods(mvit,timesformer,uniformerv2)
Branch
main branch (1.x version, such as
v1.0.0
, ordev-1.x
branch)Prerequisite
Environment
CUDA_HOME: /usr/local/cuda-11.1
NVCC: Cuda compilation tools, release 11.1, V11.1.105
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.1+cu111
PyTorch compiling details: PyTorch built with:
TorchVision: 0.9.1+cu111
OpenCV: 4.9.0
MMEngine: 0.10.3
MMAction2: 1.2.0+4d6c934
MMCV: 2.1.0
Describe the bug
Traceback (most recent call last):
File "tools/train.py", line 145, in
main()
File "tools/train.py", line 141, in main
runner.train()
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
model = self.train_loop.run() # type: ignore
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run
self.run_epoch()
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
self.run_iter(idx, data_batch)
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter
outputs = self.runner.model.train_step(
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 116, in train_step
optim_wrapper.update_params(parsed_losses)
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/optim/optimizer/optimizer_wrapper.py", line 196, in update_params
self.backward(loss)
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/optim/optimizer/optimizer_wrapper.py", line 220, in backward
loss.backward(**kwargs)
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/miniconda3/envs/openmmlab/lib/python3.8/site-packages/torch/autograd/init.py", line 145, in backward
Variable._execution_engine.run_backward(
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling
cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
I encountered this issue when running all three methods(mvit,timesformer,uniformerv2)
Reproduces the problem - code sample
No response
Reproduces the problem - command or script
python tools/train.py configs/recognition/mvit/mvit-small-p244_32xb16-16x4x1-200e_kinetics400-rgb.py --seed=0 --deterministic
Reproduces the problem - error message
No response
Additional information
No response
The text was updated successfully, but these errors were encountered: