Stop actor gradient flowing through the critic #8

Open
danijar opened this issue Jul 31, 2016 · 2 comments

Comments

@danijar

danijar commented Jul 31, 2016

I think you should use tf.stop_gradient() in https://github.com/coreylynch/async-rl/blob/master/a3c.py#L164 so that the actor's gradient does not flow through the critic. Otherwise, after some training, the policy tends to use one action exclusively. It took me a while to figure this out in my own code, too.
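
To illustrate the idea, here is a minimal sketch in TensorFlow 2 eager mode. It is not the repository's code: the toy tensors (`logits`, `value`, `action`, `R`) and the helper `actor_loss` are hypothetical stand-ins, and it assumes the actor loss has the form -log pi(a|s) * (R - V(s)). It compares the gradient that reaches the critic output with and without tf.stop_gradient().

```python
import tensorflow as tf

# Hypothetical toy values standing in for network outputs and targets.
logits = tf.Variable([[0.2, -0.1, 0.4]])  # actor head output for one state
value = tf.Variable([0.5])                # critic head output V(s)
action = tf.constant([2])                 # index of the action that was taken
R = tf.constant([1.0])                    # discounted return used as the target


def actor_loss(stop):
    """Policy-gradient loss; optionally block gradients into the critic."""
    log_probs = tf.nn.log_softmax(logits)
    # Pick out log pi(a|s) for the action that was taken.
    log_prob = tf.reduce_sum(log_probs * tf.one_hot(action, 3), axis=1)
    advantage = R - value
    if stop:
        # Treat the advantage as a constant when differentiating.
        advantage = tf.stop_gradient(advantage)
    return -tf.reduce_sum(log_prob * advantage)


# Without stop_gradient, the actor loss also produces a gradient for the critic.
with tf.GradientTape() as tape:
    loss = actor_loss(stop=False)
leaky_grad = tape.gradient(loss, value)

# With stop_gradient, no path connects the actor loss to the critic output,
# so the tape reports the gradient as None.
with tf.GradientTape() as tape:
    loss = actor_loss(stop=True)
blocked_grad = tape.gradient(loss, value)

print("gradient into critic without stop_gradient:", leaky_grad)
print("gradient into critic with stop_gradient:", blocked_grad)
```

With the advantage wrapped, the critic would be trained only by its own value loss, while the actor still sees the advantage as a weighting on log pi(a|s).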

@coreylynch
Owner

Oh interesting! I will definitely take a look at this. Thank you.

@steveKapturowski

I noticed this as well and believe it's a significant cause of performance degradation. Additionally, you don't seem to be adding the entropy term to the objective, which the paper mentions as useful for improving exploration.
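
As a rough sketch of what adding the entropy term could look like (TensorFlow 2 eager mode, not the repository's code): the helper names `policy_entropy` and `actor_loss_with_entropy` and the weight `BETA = 0.01` are assumptions chosen for illustration. The entropy of the softmax policy is subtracted from the loss, so minimizing the loss favors higher-entropy (more exploratory) policies.

```python
import tensorflow as tf

BETA = 0.01  # assumed entropy weight, for illustration only; tune per task


def policy_entropy(logits):
    """Entropy of the softmax policy for each state in the batch."""
    log_probs = tf.nn.log_softmax(logits)
    probs = tf.exp(log_probs)
    return -tf.reduce_sum(probs * log_probs, axis=1)


def actor_loss_with_entropy(logits, actions, advantages, beta=BETA):
    """Policy-gradient loss minus an entropy bonus, averaged over the batch."""
    num_actions = int(logits.shape[-1])
    log_probs = tf.nn.log_softmax(logits)
    # log pi(a|s) for the actions that were actually taken.
    taken = tf.reduce_sum(log_probs * tf.one_hot(actions, num_actions), axis=1)
    # The advantage is treated as a constant so no gradient reaches the critic.
    pg_loss = -taken * tf.stop_gradient(advantages)
    # Subtracting the entropy rewards policies that keep exploring.
    return tf.reduce_mean(pg_loss - beta * policy_entropy(logits))


# Toy inputs: a uniform policy and a nearly deterministic one.
uniform = tf.constant([[0.0, 0.0, 0.0]])
peaked = tf.constant([[10.0, 0.0, 0.0]])

print("entropy of uniform policy:", policy_entropy(uniform).numpy())
print("entropy of peaked policy:", policy_entropy(peaked).numpy())
```

A policy that has collapsed onto one action has entropy near zero, so the bonus pushes against exactly the failure described in this issue.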


3 participants