
`RuntimeError: Magnitude of gradient is bad: -nan` when trying to train frameid #38

Open
DiamondRock opened this issue Dec 27, 2019 · 4 comments



DiamondRock commented Dec 27, 2019

I am trying to train the frameid model, but I get this error at the very beginning of training. I am using the latest version of dynet (2.1). I have ported open-sesame to Python 3 and am using the Python 3 version for training, but even with the Python 2.7 version I still get the same error.

Traceback (most recent call last):
  File "/home/anaconda3/envs/pytorch_dynet_copy/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/anaconda3/envs/pytorch_dynet_copy/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/testframenet/open-sesame/sesame/frameid.py", line 295, in <module>
    trainer.update()
  File "_dynet.pyx", line 6198, in _dynet.Trainer.update
  File "_dynet.pyx", line 6203, in _dynet.Trainer.update
RuntimeError: Magnitude of gradient is bad: -nan

@clingergab

I am getting the same issue at the very first epoch, about 11% of the way through (2107/19391), with best_val_f1 = 0.6202:
Traceback (most recent call last):
  File "/home/gabriel/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/gabriel/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/gabriel/open-sesame/sesame/frameid.py", line 330, in <module>
    trainer.update()
  File "_dynet.pyx", line 6198, in _dynet.Trainer.update
  File "_dynet.pyx", line 6203, in _dynet.Trainer.update
RuntimeError: Magnitude of gradient is bad: -nan

I tried decreasing the learning rate but it hasn't helped.
Any suggestions?


cjcourt commented Jun 14, 2021

I am also observing this exact issue when training the frameid model with Python 3.7, Ubuntu 18.04, and dynet 2.1. I have tried several different trainers (SGDTrainer, AdagradTrainer, and AdamTrainer), each with many different learning rates from 0.1 down to 1e-6.

Any suggestions would be greatly appreciated.


ravy101 commented Dec 30, 2021

I had the same issue, but after some trial and error I found that some loss values were not 'None' yet evaluated to NaN when .scalar_value() was called. I added a NaN check to the frameid.py training code.
Just import math and replace
if trexloss is not None:

with:
if trexloss is not None and not math.isnan(trexloss.scalar_value()):
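For reference, a minimal sketch of how that guarded update could look; safe_update is a hypothetical helper (not part of open-sesame), and the actual training loop in frameid.py may be structured differently, so treat this as an illustration of the check rather than the exact patch.

    import math

    # Hypothetical helper around the dynet update step. `trexloss` is the loss
    # expression for the current training example and `trainer` is the dynet
    # Trainer instance, as in the snippets above.
    def safe_update(trexloss, trainer):
        # Only backpropagate and update when the loss is a real (non-NaN) number,
        # so a bad gradient never reaches trainer.update().
        if trexloss is not None and not math.isnan(trexloss.scalar_value()):
            trexloss.backward()
            trainer.update()
            return trexloss.scalar_value()
        # Otherwise skip this example entirely.
        return 0.0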

Hope this helps.

@JerrisonChang

Thank you @ravy101. I ran into the same issue and the proposed solution helped me.
