How to solve 'Input is not Invertible error'? #40
Comments
I'm having this issue as well. I've been able to train on a custom dataset without conditioning on class labels. However, if I set …
I first ran the experiment on an …

I doubt it can be solved. It's prone to bad random initialisations that lead it to …
How about adding "+ tf.eye(shape[3]) * 10e-4" to this line: https://github.com/openai/glow/blob/master/model.py#L451? Does that make any difference?
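Roughly, the idea would look like this (a sketch only; w and shape are assumed to follow invertible_1x1_conv in model.py, and exactly which line gets patched is a guess on my part):

# Sketch: bias the 1x1 conv kernel toward a better-conditioned matrix
# before taking its determinant/inverse. w is the [shape[3], shape[3]] kernel.
w_reg = w + tf.eye(shape[3]) * 10e-4
w_inv = tf.matrix_inverse(w_reg)  # was: tf.matrix_inverse(w)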
I also experienced a similar error.

diff --git a/model.py b/model.py
index b918ab0..68cb3fe 100644
--- a/model.py
+++ b/model.py
@@ -373,7 +373,7 @@ def revnet2d_step(name, z, logdet, hps, reverse):
h = f("f1", z1, hps.width, n_z)
shift = h[:, :, :, 0::2]
# scale = tf.exp(h[:, :, :, 1::2])
- scale = tf.nn.sigmoid(h[:, :, :, 1::2] + 2.)
+ scale = tf.nn.sigmoid(h[:, :, :, 1::2] + 2.) + 1e-10
z2 += shift
z2 *= scale
logdet += tf.reduce_sum(tf.log(scale), axis=[1, 2, 3])
@@ -393,7 +393,7 @@ def revnet2d_step(name, z, logdet, hps, reverse):
h = f("f1", z1, hps.width, n_z)
shift = h[:, :, :, 0::2]
# scale = tf.exp(h[:, :, :, 1::2])
- scale = tf.nn.sigmoid(h[:, :, :, 1::2] + 2.)
+ scale = tf.nn.sigmoid(h[:, :, :, 1::2] + 2.) + 1e-10
z2 /= scale
z2 -= shift
logdet -= tf.reduce_sum(tf.log(scale), axis=[1, 2, 3])
diff --git a/tfops.py b/tfops.py
index d978419..2e7c556 100644
--- a/tfops.py
+++ b/tfops.py
@@ -449,9 +449,9 @@ def gaussian_diag(mean, logsd):
o.sample = mean + tf.exp(logsd) * o.eps
o.sample2 = lambda eps: mean + tf.exp(logsd) * eps
o.logps = lambda x: -0.5 * \
- (np.log(2 * np.pi) + 2. * logsd + (x - mean) ** 2 / tf.exp(2. * logsd))
+ (np.log(2 * np.pi) + 2. * logsd + (x - mean) ** 2 / (tf.exp(2. * logsd) + 1e-10))
o.logp = lambda x: flatten_sum(o.logps(x))
- o.get_eps = lambda x: (x - mean) / tf.exp(logsd)
+ o.get_eps = lambda x: (x - mean) / (tf.exp(logsd) + 1e-10)
return o
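For context on why the 1e-10 floor matters, here is a quick NumPy illustration (treating float32 sigmoid underflow as the failure mode is my assumption):

import numpy as np

# In float32, sigmoid underflows to exactly 0.0 for very negative inputs,
# so tf.log(scale) becomes -inf and z2 /= scale becomes inf.
with np.errstate(over='ignore', divide='ignore'):
    h = np.float32(-120.0)
    scale = np.float32(1.0) / (np.float32(1.0) + np.exp(-(h + np.float32(2.0))))
    print(scale, np.log(scale))               # 0.0, -inf
    print(np.log(scale + np.float32(1e-10)))  # about -23.03, finite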
@tatsuhiko-inoue, thanks for the suggestion. It didn't work for me, though. Those modifications are under the condition …
@omidsakhi, that didn't solve it for me either, I'm afraid. Thanks.
When I run Glow, the gradient of "logsd" in gaussian_diag() can become NaN. I was able to avoid the NaN gradient by computing the gradient of "x / exp(y)" as a single op, as follows:

@tf.custom_gradient
def div_by_exp(x, y):
    # x / exp(y), with a floor on exp(y) and a hand-written gradient
    exp_y = tf.exp(y) + 1e-10
    ret = x / exp_y
    def _grad(dy):
        # d/dx = 1 / exp(y), d/dy = -x / exp(y) = -ret
        return dy / exp_y, dy * -ret
    return ret, _grad

def gaussian_diag(mean, logsd):
    ...
    o.logps = lambda x: -0.5 * (np.log(2 * np.pi) + 2. * logsd + div_by_exp((x - mean) ** 2, 2 * logsd))
    ...
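A quick standalone check (assuming TF 1.x graph mode, as in this repo; the test values are made up): with the naive x / tf.exp(y), exp(y) underflows for large negative y and the quotient overflows to inf, while div_by_exp stays finite:

import tensorflow as tf  # TF 1.x assumed

x = tf.constant(2.0)
y = tf.constant(-100.0)       # exp(-100) is subnormal in float32
naive = x / tf.exp(y)         # overflows to inf
safe = div_by_exp(x, y)       # finite thanks to the 1e-10 floor
grads = tf.gradients(safe, [x, y])
with tf.Session() as sess:
    print(sess.run([naive, safe] + grads))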
I met the same issue. How did you solve it? @tatsuhiko-inoue @nuges01 @arunpatro
Hello guys, I found a solution to this 'not invertible' problem. During training, the weights of the invertible 1x1 conv keep increasing to balance the log-determinant terms generated by the invertible 1x1 conv and the affine coupling layer/actnorm. This can be solved by adding a regularization term for the weights of the invertible 1x1 conv only. In practice I use L2 regularization. It's worth mentioning, though, that after adding the regularization term, slightly more epochs are needed to converge to the same NLL. I discuss this in our recent publication "Generative Model with Dynamic Linear Flow", which improves the performance of flow-based methods significantly and converges faster than Glow. Our code is here.
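A rough sketch of what such a regularizer could look like (the "invconv"/"W" name filter below is a hypothetical way to pick out the 1x1 conv kernels, not the actual Glow or Dynamic Linear Flow code):

import tensorflow as tf  # TF 1.x assumed

# Collect only the invertible 1x1 conv kernels; adapt the (hypothetical)
# name filter to the variable scopes in your graph.
inv_ws = [v for v in tf.trainable_variables()
          if "invconv" in v.name and v.name.endswith("W:0")]
l2_coef = 1e-4  # hypothetical strength; tune per dataset
reg = l2_coef * tf.add_n([tf.nn.l2_loss(w) for w in inv_ws])
total_loss = nll_loss + reg  # nll_loss: the existing bits/dim objective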
I think you mean "+ tf.eye(3) * 10e-4"; shape[3] is not defined.
I am trying to train a Glow mapping on a custom dataset. However, while training I frequently receive a

tensorflow.python.framework.errors_impl.InvalidArgumentError: Input is not invertible

error. Looking at the logs, I see that the training/validation stats have reached either inf or nan.

I then tried to just reproduce your results for CelebA 256x256 qualitatively, but I still face the same issues and am lost as to how to debug. I downloaded the celeba-tfr dataset locally.

Command:

Namespace:

Trace:

I suspected bad learning rates might be making the kernel non-invertible, so I played with low LRs, but to no avail.