NaN error when clipping gradients. #28

Open
LTlitong opened this issue May 26, 2020 · 0 comments

Hi,
Both the vanilla Seq2Seq and HRED models fail with a "NaN tensor error" at the first training step.

The line that fails is `clipped_grads, grad_norm = tf.clip_by_global_norm(self.gradients, params.max_gradient_norm)` in hred_model.py.

How can I solve this problem?

P.S.

  • embedding: random300
  • tensorflow-gpu: 1.12.1
  • 3-turn dataset
  • THRED and TA-Seq2Seq work well

Here is the traceback:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/data/HRED/thred/main.py", line 6, in <module>
        tf.app.run(main=thred_main)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
        _sys.exit(main(argv))
      File "/data/HRED/thred/main.py", line 45, in main
        model.train()
      File "/data/HRED/thred/models/hierarchical_base.py", line 132, in train
        step_result = loaded_train_model.train(train_sess)
      File "/data/HRED/thred/models/hred/hred_model.py", line 446, in train
        self.learning_rate])
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
        run_metadata_ptr)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
        feed_dict_tensor, options, run_metadata)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
        run_metadata)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Found Inf or NaN global norm. : Tensor had NaN values
    [[node hred_graph/VerifyFinite/CheckNumerics (defined at /data/HRED/thred/models/hred/hred_model.py:131) = CheckNumericsT=DT_FLOAT, _class=["loc:@hred_graph/VerifyFinite/control_dependency"], message="Found Inf or NaN global norm.", _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
    [[{{node hred_graph/clip_by_global_norm/mul/_187}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3642_hred_graph/clip_by_global_norm/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
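
For context on the message above: the clipping op first checks that the global norm is finite, so the error probably means at least one gradient already contained NaN or Inf before clipping. The sketch below is plain Python, not TensorFlow, and only mirrors the documented formula `t * clip_norm / max(global_norm, clip_norm)`. The variable names and values are hypothetical, and it assumes each gradient is a flat list of floats. It shows how a single NaN entry poisons the global norm, and one way to list which gradients are non-finite.

```python
import math


def global_norm(grads):
    """Square root of the sum of squared entries over all gradients."""
    return math.sqrt(sum(g * g for grad in grads.values() for g in grad))


def clip_by_global_norm(grads, clip_norm):
    """Scale every gradient by clip_norm / max(global_norm, clip_norm).

    Raises ValueError when the global norm is not finite, loosely
    imitating the check that produced the error in the traceback.
    """
    norm = global_norm(grads)
    if not math.isfinite(norm):
        raise ValueError("Found Inf or NaN global norm.")
    scale = clip_norm / max(norm, clip_norm)
    clipped = {name: [g * scale for g in grad] for name, grad in grads.items()}
    return clipped, norm


def non_finite_gradients(grads):
    """Return the names of gradients containing NaN or Inf, sorted."""
    return sorted(
        name
        for name, grad in grads.items()
        if not all(math.isfinite(g) for g in grad)
    )


# Hypothetical gradients keyed by made-up variable names.
healthy = {"encoder/kernel": [3.0, 0.0], "decoder/kernel": [0.0, 4.0]}
clipped, norm = clip_by_global_norm(healthy, clip_norm=1.0)
print("global norm:", norm)

# A single NaN entry makes the whole global norm NaN.
broken = {"encoder/kernel": [3.0, 0.0], "decoder/kernel": [float("nan"), 4.0]}
print("norm is NaN:", math.isnan(global_norm(broken)))
print("offending gradients:", non_finite_gradients(broken))
```

In the TensorFlow 1.x graph, a comparable per-variable check might be done by wrapping each non-None gradient in `tf.check_numerics` with the variable name as the message before passing the list to `tf.clip_by_global_norm`. Whether that fits the structure of hred_model.py is an assumption.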
