
Question about discriminator and backprop #2

Open
3rd3 opened this issue Sep 9, 2016 · 12 comments

@3rd3

3rd3 commented Sep 9, 2016

First of all, thanks for sharing this fantastic and clean code! I'm having trouble understanding this part of your code: here you run the discriminator on the predicted frames to get the real/fake predictions per frame, then pass these back to the generator via the d_scale_preds placeholders, and finally train the generator to bring d_scale_preds closer to a tensor of ones. What I am wondering is how the gradients are backpropagated from the discriminator back to the generator. Can gradients pass through sess.run statements?

@dyelax
Owner

dyelax commented Sep 14, 2016

Hey @3rd3 – good question. Gradients can't pass through sess.run() statements, but that's fine since we aren't trying to train the discriminator there. We just need its forward pass predictions to use in the loss calculation of the generator. There might be a more efficient way to train both the discriminator and generator in the same pass, but the original paper specified that they used a different batch for the discriminator and generator training steps.

@dyelax dyelax closed this as completed Sep 14, 2016
@3rd3
Author

3rd3 commented Sep 15, 2016

Thanks for your answer. Sorry if I am missing something obvious, but the adversarial loss for the generator is log(1 - D(G(z))), so wouldn't the gradient include the derivative of D via the chain rule? How can TensorFlow do the automatic differentiation if the evaluation of D(G(z)) is a constant fed into the loss?
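To make the chain-rule point concrete, here is a tiny numeric sketch (a hypothetical scalar generator and discriminator, not the repo's code): when D stays in the graph, dL/dw flows through D, but when D(G(z)) is evaluated once and fed back in as a constant, the loss no longer depends on the generator weight at all.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy models: generator g(z) = w * z, discriminator d(x) = sigmoid(v * x),
# adversarial generator loss L = -log(d(g(z))).
z, v, w = 1.0, 2.0, 0.5   # fixed input, fixed D weight, current G weight
eps = 1e-6

# Case 1: D stays in the graph, so dL/dw passes through D by the chain rule.
def loss_through_d(w):
    return -math.log(sigmoid(v * (w * z)))

grad_through_d = (loss_through_d(w + eps) - loss_through_d(w - eps)) / (2 * eps)

# Case 2: D(G(z)) is evaluated once and fed back as a constant (the
# placeholder approach). The loss no longer depends on w at all.
d_const = sigmoid(v * (w * z))

def loss_fed_constant(w):
    return -math.log(d_const)  # w is unused: no gradient path back to G

grad_fed = (loss_fed_constant(w + eps) - loss_fed_constant(w - eps)) / (2 * eps)

print(grad_through_d)  # nonzero (about -0.54)
print(grad_fed)        # exactly 0.0
```

The finite-difference gradients stand in for what TensorFlow's autodiff would compute: in case 2 the adversarial term contributes nothing to the generator update.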

@dyelax
Owner

dyelax commented Sep 16, 2016

I think you might be correct – I misunderstood you originally. I'm going to reopen this and create a new branch to explore that. Feel free to contribute if you have some time!

@dyelax dyelax reopened this Sep 16, 2016
@dyelax dyelax added the bug label Sep 16, 2016
@3rd3
Author

3rd3 commented Sep 16, 2016

Thanks for your reply. Would that imply that the contribution of the adversarial loss to the gradient is currently always zero? If so, I am wondering why the GIFs with adversarial training are still visually superior, i.e. without the 'rainbow' artifacts.

@dyelax
Owner

dyelax commented Sep 16, 2016

I'm wondering the same thing. Digging through the code right now to try to figure that out.

@3rd3
Author

3rd3 commented Sep 30, 2016

Did you make progress? I'm really curious about whether & how this will improve the predictions! Unfortunately, I don't have enough time to help out.

@dyelax
Owner

dyelax commented Sep 30, 2016

I'm working on it in the gradient-bug branch. I'll let you know when it's fixed!

@dyelax
Owner

dyelax commented Oct 5, 2016

Hey @3rd3 – I think I have it fixed in the gradient-bug branch. I'm testing right now and tweaking some hyperparameters, but feel free to check it out and let me know if you see anything that's still broken.

@3rd3
Author

3rd3 commented Oct 16, 2016

Looks good so far. I am too busy right now to read the code more carefully, but judging from the code I've looked at, I am not sure whether you are instantiating the discriminator model twice. I think this is necessary to prevent the optimizer from also training the discriminator via the combined loss while training the generator.

This can be done by adding a trainable flag to the define_graph function and passing it as False to the variable declarations in the w or b functions for the generator's instantiation. During the second construction of the graph, you need variable scopes with the reuse flag set to True, so that the variables are shared between the two instantiations.

An alternative, and perhaps easier or more streamlined, way of achieving the same thing would be to create a variable collection for the generator variables and then update via opt.minimize(loss, var_list=<list of variables>). You can perhaps also query the variables from the name scope via tf.get_collection(tf.GraphKeys.VARIABLES, scope='my_scope'). There may be more ways of disabling gradient updates for certain variables or subgraphs that I am not aware of (e.g. something like tf.stop_gradient(input)). The problem I see with the latter approaches is that TF might not allow reusing a graph at all without instantiating it multiple times and sharing the variables.
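The var_list idea above can be sketched without TensorFlow at all: a hand-rolled gradient step that only touches the parameters it is given, freezing everything else. (A toy SGD loop with illustrative names like g_w and d_v, not the repo's code.)

```python
def sgd_step(params, grads, var_list, lr=0.1):
    """Apply a gradient-descent step only to the parameters in var_list.

    Parameters outside var_list keep their current values, which is the
    same effect as passing var_list to opt.minimize() in TensorFlow.
    """
    return {name: (value - lr * grads[name] if name in var_list else value)
            for name, value in params.items()}

params = {'g_w': 1.0, 'd_v': 2.0}   # generator and discriminator weights
grads = {'g_w': 0.5, 'd_v': 0.5}    # gradients of the combined loss

# Generator training step: restrict the update to the generator variable,
# so the discriminator weight is untouched even though it has a gradient.
params = sgd_step(params, grads, var_list=['g_w'])

print(params['g_w'])  # updated (about 0.95)
print(params['d_v'])  # unchanged (2.0)
```

This is only the update-masking half of the story; the variable-scope/reuse approach additionally lets the same weights back two graph instantiations.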

@3rd3
Author

3rd3 commented Oct 16, 2016

I've made some changes to my previous message because I hit 'comment' too early. I am not sure whether these changes made it into the email notification.

@dyelax
Owner

dyelax commented Oct 19, 2016

@3rd3 – I believe I have that covered. In both the generator and discriminator models, I'm passing minimize() a list of variables so that only the model in question is trained. I'm still having trouble getting this new implementation to perform as well as the previous (incorrect) one, though.

@3rd3
Author

3rd3 commented Oct 20, 2016

If it's not a bug, this could be the training instability that adversarial training is known for. Perhaps noise helps: http://www.inference.vc/instance-noise-a-trick-for-stabilising-gan-training/
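For reference, the instance-noise trick in that post amounts to adding noise to both real and generated samples before the discriminator sees them, so the two distributions overlap early in training. A minimal sketch (function name and flattened-batch representation are illustrative, not from the repo):

```python
import random

def add_instance_noise(batch, sigma=0.1):
    """Add i.i.d. Gaussian noise to every value in a flattened batch.

    Applied to both real and generated frames right before the
    discriminator's forward pass.
    """
    return [x + random.gauss(0.0, sigma) for x in batch]

real_frames = [0.2, 0.8, 0.5]
fake_frames = [0.1, 0.9, 0.4]

noisy_real = add_instance_noise(real_frames)
noisy_fake = add_instance_noise(fake_frames)
```

The post suggests annealing sigma toward zero as training progresses, so the discriminator's task starts easy and sharpens over time.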
