This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Inplace operation error in gradient computation #13

Open
cgsavard opened this issue Sep 28, 2022 · 4 comments

Comments

@cgsavard

cgsavard commented Sep 28, 2022

[Screenshot of the error: Screen Shot 2022-09-28 at 5 53 57 PM]

I came across this error when trying to train. After some searching, it seems that a tensor needed for the gradient computation is being modified in place before the backward pass uses it. I was hoping you could help me locate the error and let me know what I need to fix, as I am not too familiar with PyTorch. I have made no modifications to the utils/nn/tools.py script.
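One way to locate the operation behind this kind of error is PyTorch's anomaly detection, which records the forward-pass stack trace of each operation and reports it when the backward pass fails. The following is a minimal sketch on a toy computation, not code from this repository; the function name `backward_with_anomaly_detection` and the tensors in it are illustrative assumptions.

```python
import torch


def backward_with_anomaly_detection() -> str:
    """Run a toy forward/backward pass and return the error message, or "ok"."""
    x = torch.ones(3, requires_grad=True)
    # detect_anomaly() stores the forward stack trace of every operation, so a
    # failing backward pass also emits a warning pointing at the forward line.
    with torch.autograd.detect_anomaly():
        y = x.exp()  # exp() keeps its output for use in the backward pass
        y *= 2       # in-place update of that saved output
        try:
            y.sum().backward()
        except RuntimeError as err:
            return str(err)
    return "ok"


print(backward_with_anomaly_detection())
```

Anomaly detection slows training noticeably, so it is usually enabled only while debugging. In a real training script, wrapping the forward and backward pass of a single batch in the same context manager should be enough to get the forward traceback.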

@cgsavard
Author

cgsavard commented Sep 30, 2022

I have solved the issue by changing all the in-place operations here and here to out-of-place ones. Essentially, var1 *= var2 was changed to var1 = var1 * var2. Should this be changed in the code permanently to avoid this error in the future?
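The difference between the two forms can be sketched on a toy example. This is an illustration under assumptions, not the code from the repository: `var1` and `var2` mirror the names used above, while `train_step` and the use of `exp()` to produce `var1` are hypothetical.

```python
import torch


def train_step(inplace: bool) -> str:
    """Return "ok" if the backward pass succeeds, otherwise the error text."""
    x = torch.ones(3, requires_grad=True)
    var1 = x.exp()                # autograd saves this output for backward
    var2 = torch.full((3,), 2.0)
    if inplace:
        var1 *= var2              # overwrites the tensor autograd saved
    else:
        var1 = var1 * var2        # new tensor; the saved one is untouched
    try:
        var1.sum().backward()
    except RuntimeError as err:
        return f"failed: {err}"
    return "ok"


print("in-place:    ", train_step(inplace=True))
print("out-of-place:", train_step(inplace=False))
```

An in-place operation only fails when the tensor it overwrites was saved for the backward pass, so whether `*=` is safe depends on which operation produced `var1`. The out-of-place form costs one extra allocation but avoids the question.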

@hqucms
Owner

hqucms commented Sep 30, 2022

Hi @cgsavard -- can you share the PyTorch version? I don't seem to be able to reproduce this error with, e.g., 1.12.1.

@cgsavard
Author

Yes, this occurred after I installed PyTorch with "conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge", because I was working on a different GPU (K40) with a CUDA version newer than 11.6 (11.7). The PyTorch version was the stable 1.12.1, so I think it was actually the newer CUDA version that raised the issue.
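Since the installed `cudatoolkit` package and the system CUDA version can differ, it may help to report what PyTorch itself sees. A minimal sketch follows; the helper name `environment_report` and its dictionary keys are illustrative assumptions, while the `torch` attributes it reads are standard.

```python
import torch


def environment_report() -> dict:
    """Collect the version details that matter when reproducing a bug."""
    cuda_ok = torch.cuda.is_available()
    return {
        "torch": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds
        "cuda_build": torch.version.cuda,
        "cuda_available": cuda_ok,
        # Name of the first visible GPU, if there is one
        "device": torch.cuda.get_device_name(0) if cuda_ok else None,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running `python -m torch.utils.collect_env` prints a fuller report, including the driver version, that can be pasted into an issue.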

@hqucms
Owner

hqucms commented Sep 30, 2022

I tested CUDA 11.6 + PyTorch 1.12.1 and still cannot reproduce this error. Did you change anything else when the problem got solved?
