RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [68]] is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True). #19853

Open
ASAmbitious opened this issue May 8, 2024 · 1 comment
Labels: bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.2.x

Comments

@ASAmbitious

Bug description

```
Traceback (most recent call last):
  File "main_train.py", line 61, in <module>
    main(cfg)
  File "main_train.py", line 50, in main
    trainer.fit()
  File "/mnt/inais/data1/syp/wgan/fabric/decalib/trainer.py", line 373, in fit
    self.fabric.backward(all_loss)
  File "/root/miniconda3/envs/DECA_2/lib/python3.7/site-packages/lightning/fabric/fabric.py", line 359, in backward
    self._precision.backward(tensor, module, *args, **kwargs)
  File "/root/miniconda3/envs/DECA_2/lib/python3.7/site-packages/lightning/fabric/plugins/precision/precision.py", line 73, in backward
    tensor.backward(*args, **kwargs)
  File "/root/miniconda3/envs/DECA_2/lib/python3.7/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/root/miniconda3/envs/DECA_2/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [68]] is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
I ran into this error during training. How should I solve it?
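For anyone hitting the same message: it means autograd saved a tensor during the forward pass, but that tensor was modified in place before backward() ran, so its version counter no longer matches. Here is a minimal sketch of the failure pattern and the usual fix; it is purely illustrative and not the code from this issue:

```python
import torch

# Print the forward op whose saved tensor was later modified in place.
torch.autograd.set_detect_anomaly(True)

x = torch.randn(68, requires_grad=True)

y = x.sigmoid()      # sigmoid's backward reuses its output y, so autograd saves y
y.add_(1.0)          # in-place edit bumps y's version counter
y.sum().backward()   # RuntimeError: ... modified by an inplace operation

# Fix: make the modification out of place (or clone first), e.g.
# y = x.sigmoid()
# y = y + 1.0        # out of place, leaves the saved tensor untouched
# y.sum().backward()
```

In this traceback the offending tensor has shape [68], so the likely culprit is an in-place op (add_, +=, indexed assignment, clamp_, etc.) on a tensor of that shape somewhere between the forward pass and fabric.backward(all_loss); running once with anomaly detection enabled should print the forward op that produced it.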

What version are you seeing the problem on?

master

How to reproduce the bug

No response

Error messages and logs

# Error messages and logs here please

Environment

Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):

More info

No response

ASAmbitious added the bug (Something isn't working) and needs triage (Waiting to be triaged by maintainers) labels on May 8, 2024
@LawJarp-A

Without code to reproduce this particular error, it will be difficult to diagnose. Could you provide the model code and the training code?
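Even a stripped-down, self-contained script along these lines would help pin it down. The model, data, and loss here are placeholders, not your real code; the point is only the smallest shape of a runnable report that triggers the same RuntimeError:

```python
import torch
import lightning as L

# Placeholder model, purely to show the shape of a minimal reproduction.
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(68, 68)

    def forward(self, x):
        return self.linear(x).sigmoid()

def main():
    fabric = L.Fabric(accelerator="auto", devices=1)
    fabric.launch()

    model = TinyModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    model, optimizer = fabric.setup(model, optimizer)

    x = fabric.to_device(torch.randn(4, 68))

    out = model(x)
    out.add_(1.0)          # an in-place edit like this reproduces the reported error
    loss = out.mean()
    fabric.backward(loss)  # fails with "modified by an inplace operation"
    optimizer.step()

if __name__ == "__main__":
    main()
```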
