
NaN appearing on tf.gradients calculation with tf.where and division by zero on the false branch #20091

Closed
mikefairbank opened this issue Jun 18, 2018 · 3 comments
Assignees
Labels
stat:awaiting tensorflower Status - Awaiting response from tensorflower

Comments


mikefairbank commented Jun 18, 2018

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    Yes, script is below
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Linux 4.15.0-23-generic #25-Ubuntu SMP Wed May 23 18:02:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
    VERSION="18.04 LTS (Bionic Beaver)"
  • TensorFlow installed from (source or binary):
    binary
  • TensorFlow version (use command below):
    v1.8.0-0-g93bc2e2072 1.8.0
  • Python version:
    Python 3.6.5
  • Bazel version (if compiling from source): n/a
  • GCC/Compiler version (if compiling from source): n/a
  • CUDA/cuDNN version: n/a, using CPU version
  • GPU model and memory: n/a using CPU
  • Exact command to reproduce: just run "python3 script.py"

Describe the problem

When tf.where is used with a division by zero in one of its two branches, tf.gradients returns a NaN gradient even when the division by zero is on the branch that is not selected.

This seems similar to #2540, but the workarounds suggested there (e.g. using tf.boolean_mask) did not work here.

Source code / logs

import tensorflow as tf

sess = tf.Session()
W1 = tf.Variable([2.0])
W2 = tf.Variable([0.0])
output = tf.where(W1 > 4, W1 / W2, tf.zeros_like(W1))  # forward value is correct (zero), since W1 > 4 is false
gradient = tf.gradients(output, W2)[0]  # should be zero, but it gives NaN
sess.run(tf.global_variables_initializer())
print(sess.run([output, gradient]))

Program output:

#[array([0.], dtype=float32), array([nan], dtype=float32)]
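[Editor's note, not part of the original report: the NaN comes from how tf.where's gradient composes with the division's local gradient. tf.where routes a zero upstream gradient into the unselected branch, but the division's local gradient -W1/W2² is infinite at W2 = 0, and IEEE arithmetic gives 0 × inf = NaN. A minimal NumPy sketch of that arithmetic:]

```python
import numpy as np

W1 = np.array([2.0])
W2 = np.array([0.0])
g = np.array([1.0])  # upstream gradient flowing into `output`

# tf.where's gradient sends zeros into the branch that was not selected:
g_true = np.where(W1 > 4, g, 0.0)  # -> [0.]

with np.errstate(divide='ignore', invalid='ignore'):
    ddiv_dW2 = -W1 / W2 ** 2   # local gradient of W1/W2 w.r.t. W2 -> [-inf]
    g_W2 = g_true * ddiv_dW2   # 0 * -inf -> [nan], the observed NaN

print(g_true, ddiv_dW2, g_W2)
```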


facaiy commented Jun 18, 2018

I agree that this issue is a duplicate of #2540. Have you tried the workaround below, suggested by @anishathalye?

x = tf.placeholder(tf.float32)
# y = tf.where(x > 0, 0., tf.exp(x))

# trick: we're not using the result of safe_exp when x > 0, so we can
# substitute a safe value for x in that case
# it doesn't really matter what we put in here, as long as the backward pass
# returns some finite value
safe_exp = tf.exp(tf.where(x > 0, 1.0, x))
y = tf.where(x > 0, 0., safe_exp)

I think it should solve your problem.

In fact, I have also proposed implementing a new op in #15706 to fix the issue completely; unfortunately, the TensorFlow team has not replied to it.
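[Editor's note: the inner tf.where in this trick only changes what the unused branch computes, so the forward values of the naive and "safe" versions are identical; only the backward pass differs, because exp is never differentiated at the masked-out inputs. A quick sketch of the forward equivalence, using NumPy as a stand-in for the TF ops:]

```python
import numpy as np

x = np.array([-1.0, 2.0])

naive = np.where(x > 0, 0.0, np.exp(x))
safe = np.where(x > 0, 0.0, np.exp(np.where(x > 0, 1.0, x)))

print(naive, safe)  # identical forward values
```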

@mikefairbank (Author)

Thanks. Yes, it did solve my problem.

import tensorflow as tf

sess = tf.Session()

W1 = tf.Variable([2.0])
W2 = tf.Variable([0.0])

safe_W2 = tf.where(W1 > 4, W2, [1.0])  # substitute a harmless denominator on the unselected branch
output = tf.where(W1 > 4, W1 / safe_W2, tf.zeros_like(W1))
gradient = tf.gradients(output, W2)[0]
sess.run(tf.global_variables_initializer())

print(sess.run([output, gradient]))  # prints [0.], [0.], i.e. the correct answers now
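[Editor's note: the double-where pattern above can be packaged as a small helper. A hypothetical safe_div sketch (the name and signature are the editor's, not from this thread), shown with NumPy for illustration; a TF 1.x version would swap np.where for tf.where:]

```python
import numpy as np

def safe_div(num, den, cond):
    """Compute num/den where cond is True, 0 elsewhere, without ever
    evaluating a division by zero (hypothetical helper, editor's sketch)."""
    safe_den = np.where(cond, den, 1.0)  # harmless denominator on unused lanes
    return np.where(cond, num / safe_den, 0.0)

W1 = np.array([2.0])
W2 = np.array([0.0])
print(safe_div(W1, W2, W1 > 4))  # -> [0.]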

@skye skye added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jul 1, 2018
@tensorflowbutler (Member)

Nagging Assignee @skye: It has been 44 days with no activity and this issue has an assignee. Please update the label and/or status accordingly.
