RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [68]] is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True). #19853

Open
ASAmbitious opened this issue May 8, 2024 · 1 comment
Labels: bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.2.x

Comments

@ASAmbitious

Bug description

```
Traceback (most recent call last):
  File "main_train.py", line 61, in <module>
    main(cfg)
  File "main_train.py", line 50, in main
    trainer.fit()
  File "/mnt/inais/data1/syp/wgan/fabric/decalib/trainer.py", line 373, in fit
    self.fabric.backward(all_loss)
  File "/root/miniconda3/envs/DECA_2/lib/python3.7/site-packages/lightning/fabric/fabric.py", line 359, in backward
    self._precision.backward(tensor, module, *args, **kwargs)
  File "/root/miniconda3/envs/DECA_2/lib/python3.7/site-packages/lightning/fabric/plugins/precision/precision.py", line 73, in backward
    tensor.backward(*args, **kwargs)
  File "/root/miniconda3/envs/DECA_2/lib/python3.7/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/root/miniconda3/envs/DECA_2/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [68]] is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
I ran into this error during training. How should I solve it?
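For anyone hitting the same message: it means autograd saved a tensor during the forward pass, but that tensor was modified in place before backward() ran, so its version counter no longer matches. Here is a minimal sketch of the failure pattern and the usual fix; it is purely illustrative and not the code from this issue:

```python
import torch

# Print the forward op whose saved tensor was later modified in place.
torch.autograd.set_detect_anomaly(True)

x = torch.randn(68, requires_grad=True)

y = x.sigmoid()      # sigmoid's backward reuses its output y, so autograd saves y
y.add_(1.0)          # in-place edit bumps y's version counter
y.sum().backward()   # RuntimeError: ... modified by an inplace operation

# Fix: make the modification out of place (or clone first), e.g.
# y = x.sigmoid()
# y = y + 1.0        # out of place, leaves the saved tensor untouched
# y.sum().backward()
```

In this traceback the offending tensor has shape [68], so the likely culprit is an in-place op (add_, +=, indexed assignment, clamp_, etc.) on a tensor of that shape somewhere between the forward pass and fabric.backward(all_loss); running once with anomaly detection enabled should print the forward op that produced it.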

What version are you seeing the problem on?

master

How to reproduce the bug

No response

Error messages and logs

# Error messages and logs here please

Environment

Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):

More info

No response

ASAmbitious added the bug (Something isn't working) and needs triage (Waiting to be triaged by maintainers) labels on May 8, 2024
@LawJarp-A

Without code to reproduce this particular error, it will be difficult to diagnose. Could you provide the model code and the training code?
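Even a stripped-down, self-contained script along these lines would help pin it down. The model, data, and loss here are placeholders, not your real code; the point is only the smallest shape of a runnable report that triggers the same RuntimeError:

```python
import torch
import lightning as L

# Placeholder model, purely to show the shape of a minimal reproduction.
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(68, 68)

    def forward(self, x):
        return self.linear(x).sigmoid()

def main():
    fabric = L.Fabric(accelerator="auto", devices=1)
    fabric.launch()

    model = TinyModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    model, optimizer = fabric.setup(model, optimizer)

    x = fabric.to_device(torch.randn(4, 68))

    out = model(x)
    out.add_(1.0)          # an in-place edit like this reproduces the reported error
    loss = out.mean()
    fabric.backward(loss)  # fails with "modified by an inplace operation"
    optimizer.step()

if __name__ == "__main__":
    main()
```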
