
The training loss is always nan. #479

Open
LuoXubo opened this issue Feb 17, 2024 · 3 comments

Comments

LuoXubo commented Feb 17, 2024

Hi milesial, thanks for your nice work! However, when I train the U-Net following the instructions in the README, the training loss is always "nan" and the validation Dice score is a very small number, like 8.114e-12. Could you help me solve this problem? Thanks a lot!
[Screenshot of the training log attached]

@yuhanc0205

I am experiencing the same issue. I used all the default settings and the Carvana dataset, but my loss is always nan and the Dice score does not change during training. Did you find a solution?

@binbin395

I found that reverting to the tag v4.0 fixes it; maybe someone can find which commit after that version introduced the problem.
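
In case it helps, a minimal sketch of that revert, assuming a local clone of this repository and the tag name v4.0 mentioned above:

```sh
# Fetch tags and check out the v4.0 tag (detached HEAD), then retrain
git fetch --tags
git checkout v4.0
python train.py
```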

benlin1211 commented Apr 11, 2024

I managed to solve this problem by turning off the mixed-precision flag.
That is, instead of running python train.py --amp,
use python train.py to train the model.
It takes more time and memory during training, but the model then trains successfully.

Similar issue: pytorch/pytorch#40497
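
For context, here is a minimal sketch of what an --amp flag typically toggles in a PyTorch training loop. This is an illustration of the general pattern, not the exact code in train.py; model, images, masks, optimizer and criterion are placeholders, and the scaler would only be created (torch.cuda.amp.GradScaler()) when --amp is passed.

```python
import torch

def train_step(model, images, masks, optimizer, criterion, scaler=None, device="cuda"):
    """One training step; mixed precision is used only when a GradScaler is passed."""
    images, masks = images.to(device), masks.to(device)
    optimizer.zero_grad(set_to_none=True)

    if scaler is not None:
        # --amp path: the forward pass runs in half precision under autocast and
        # the loss is scaled to avoid underflow; numerically unstable half-precision
        # ops here are the usual source of NaN losses.
        with torch.cuda.amp.autocast():
            loss = criterion(model(images), masks)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    else:
        # Full-precision path (no --amp): slower and heavier on memory,
        # but numerically safer.
        loss = criterion(model(images), masks)
        loss.backward()
        optimizer.step()

    # Optional sanity check so a NaN is caught as soon as it appears.
    if torch.isnan(loss):
        raise RuntimeError("Loss became NaN")
    return loss.item()
```

If the full-precision run trains cleanly while the autocast path produces NaN, that points at a numerical-stability problem in the half-precision computation rather than at the data.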
