
The training loss is always nan. #479

Open
LuoXubo opened this issue Feb 17, 2024 · 3 comments

Comments

LuoXubo commented Feb 17, 2024

Hi milesial, thanks for your nice work! However, when I train the U-Net following the instructions in the README, the training loss is always "nan" and the validation Dice score is a very small number, like 8.114e-12. Could you help me solve this problem? Thanks a lot!
[Screenshot of the training log attached]

@yuhanc0205

I am experiencing the same issue. I used all the default settings and the Carvana dataset, but my loss is always nan and the Dice score does not change during training. Did you find a solution?

@binbin395

I found that reverting to the tag v4.0 fixes it; maybe someone can find which commit after that version introduced the problem.
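
In case it helps, a minimal sketch of that revert, assuming a local clone of this repository and the tag name v4.0 mentioned above:

```sh
# Fetch tags and check out the v4.0 tag (detached HEAD), then retrain
git fetch --tags
git checkout v4.0
python train.py
```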

benlin1211 commented Apr 11, 2024

I managed to solve this problem by turning off the mixed-precision flag.
That is, instead of running python train.py --amp,
use python train.py to train the model.
It takes more time and memory during training, but the model then trains successfully.

Similar issue: pytorch/pytorch#40497
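
For context, here is a minimal sketch of what an --amp flag typically toggles in a PyTorch training loop. This is an illustration of the general pattern, not the exact code in train.py; model, images, masks, optimizer and criterion are placeholders, and the scaler would only be created (torch.cuda.amp.GradScaler()) when --amp is passed.

```python
import torch

def train_step(model, images, masks, optimizer, criterion, scaler=None, device="cuda"):
    """One training step; mixed precision is used only when a GradScaler is passed."""
    images, masks = images.to(device), masks.to(device)
    optimizer.zero_grad(set_to_none=True)

    if scaler is not None:
        # --amp path: the forward pass runs in half precision under autocast and
        # the loss is scaled to avoid underflow; numerically unstable half-precision
        # ops here are the usual source of NaN losses.
        with torch.cuda.amp.autocast():
            loss = criterion(model(images), masks)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    else:
        # Full-precision path (no --amp): slower and heavier on memory,
        # but numerically safer.
        loss = criterion(model(images), masks)
        loss.backward()
        optimizer.step()

    # Optional sanity check so a NaN is caught as soon as it appears.
    if torch.isnan(loss):
        raise RuntimeError("Loss became NaN")
    return loss.item()
```

If the full-precision run trains cleanly while the autocast path produces NaN, that points at a numerical-stability problem in the half-precision computation rather than at the data.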
