Invalid argument: Nan in summary histogram for: generator/encoder_1/conv2d/kernel/values #190

Open
meihuabo opened this issue Jul 25, 2020 · 3 comments

@meihuabo

Traceback (most recent call last):
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Nan in summary histogram for: generator/encoder_1/conv2d/kernel/values
[[{{node generator/encoder_1/conv2d/kernel/values}}]]
(1) Invalid argument: Nan in summary histogram for: generator/encoder_1/conv2d/kernel/values
[[{{node generator/encoder_1/conv2d/kernel/values}}]]
[[convert_inputs/convert_image/Minimum/_802]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "pix2pix.py", line 815, in <module>
main()
File "pix2pix.py", line 781, in main
results = sess.run(fetches, options=options, run_metadata=run_metadata)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Nan in summary histogram for: generator/encoder_1/conv2d/kernel/values
[[node generator/encoder_1/conv2d/kernel/values (defined at /home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
(1) Invalid argument: Nan in summary histogram for: generator/encoder_1/conv2d/kernel/values
[[node generator/encoder_1/conv2d/kernel/values (defined at /home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[convert_inputs/convert_image/Minimum/_802]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'generator/encoder_1/conv2d/kernel/values':
File "pix2pix.py", line 815, in <module>
main()
File "pix2pix.py", line 709, in main
tf.summary.histogram(var.op.name + "/values", var)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/summary/summary.py", line 179, in histogram
tag=tag, values=values, name=scope)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_logging_ops.py", line 329, in histogram_summary
"HistogramSummary", tag=tag, values=values, name=name)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/home/pro/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()

@meihuabo

This error is usually caused by an optimizer learning rate that is set too high, which lets the weights diverge until they become NaN. Training with a smaller learning rate may solve the problem.
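To illustrate why a learning rate that is too large can end in NaN, here is a minimal sketch in plain Python. It is not the pix2pix training loop: the function `first_nan_step` and the toy objective f(w) = w**2 are assumptions made up for this example. It only shows the general mechanism, where an oversized step makes the weight grow until it overflows to infinity and then turns into NaN.

```python
import math

def first_nan_step(lr, steps=2000, w=1.0):
    """Run plain gradient descent on the toy objective f(w) = w**2.

    Returns the first step at which w becomes NaN, or None if w never
    becomes NaN within `steps` updates. (Hypothetical helper, for
    illustration only.)
    """
    for step in range(1, steps + 1):
        grad = 2.0 * w         # derivative of w**2
        w = w - lr * grad      # standard gradient descent update
        if math.isnan(w):
            return step
    return None

print(first_nan_step(lr=0.1))  # None: the small step size shrinks w toward 0
print(first_nan_step(lr=1.5))  # an integer: w overflows to inf, then inf - inf gives NaN
```

With lr=1.5 each update multiplies w by -2, so the weight doubles in magnitude every step until floating point overflows. A real network diverges for more complicated reasons, but the symptom in the weight histogram is the same.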

@skabbit

skabbit commented Oct 8, 2020

Try TensorFlow v1.14.

@jordan-bird

This happens if you run TF2.0 in compatibility mode, whereas the v1.14 release, as mentioned by @skabbit, works without the error. I can't see any major differences between the compatibility and TF1.14 learning rate implementations, so I don't know where the issue is actually coming from.

I set up TF1.14 in a Conda environment and it hasn't crashed in 200 epochs. If you're using TF2.0 and have no virtual environments, a hacky solution is to loop the python command in a bash script and run that. Just remember to decrease the model save interval so you don't lose too many steps when it crashes and restarts. On the bright side, you can leave it running unattended without having to restart it manually.
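For anyone who prefers Python to bash for the restart loop, something like the following sketch might work. The helper name `run_with_restarts` and its parameters are made up for this example, and it assumes your training command resumes from its latest checkpoint when it is started again. It does nothing about the NaN itself; it only restarts the process after a crash.

```python
import subprocess
import sys
import time

def run_with_restarts(cmd, max_restarts=100, delay_s=1.0):
    """Re-run `cmd` until it exits with code 0 or the restart budget runs out.

    Returns the exit code of the last run. (Hypothetical helper, for
    illustration only.)
    """
    code = subprocess.call(cmd)
    restarts = 0
    while code != 0 and restarts < max_restarts:
        restarts += 1
        print("exit code %d, restart %d of %d" % (code, restarts, max_restarts))
        time.sleep(delay_s)    # short pause before starting the command again
        code = subprocess.call(cmd)
    return code

# Example with a stand-in command that succeeds immediately; replace the
# list with your own training command line, one argument per list element.
exit_code = run_with_restarts([sys.executable, "-c", "print('training finished')"])
print(exit_code)
```

Capping the number of restarts is a deliberate choice here, so a run that fails on every start does not loop forever.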
