Stop actor gradient flowing through the critic #8

danijar · 2016-07-31T10:41:47Z

I think you should use tf.stop_gradient() in https://github.com/coreylynch/async-rl/blob/master/a3c.py#L164 so that the actor's loss does not backpropagate into the critic through the advantage term. Otherwise, after some training the policy tends to select one action exclusively. It took me a while to figure this out in my own code, too.
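To see why this matters, here is a minimal pure-Python sketch. It hand-derives the gradient of a combined actor-critic loss with respect to a scalar value estimate V. It assumes the standard form L = -log π(a|s)·(R - V) + c·½(R - V)², a softmax policy, and an illustrative coefficient c = 0.5. The function names `softmax` and `value_grad` are hypothetical and are not taken from a3c.py. Passing `stop_gradient=True` mimics treating the advantage as a constant, which is what `tf.stop_gradient` does.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def value_grad(logits, action, ret, value, value_coef=0.5, stop_gradient=True):
    """dL/dV for L = -log pi(a|s) * (R - V) + value_coef * 0.5 * (R - V)**2.

    Hypothetical helper for illustration; value_coef is an assumed constant.
    """
    log_pi = math.log(softmax(logits)[action])
    # Value-loss term: d/dV [0.5 * c * (R - V)^2] = c * (V - R)
    grad = value_coef * (value - ret)
    if not stop_gradient:
        # Without stop_gradient the advantage (R - V) is differentiated as well:
        # d/dV [-log pi * (R - V)] = +log pi, which has nothing to do with fitting R.
        grad += log_pi
    return grad

# The critic already matches the return exactly (V == R).
g_stop = value_grad([0.0, 0.0], action=0, ret=1.0, value=1.0)
g_leak = value_grad([0.0, 0.0], action=0, ret=1.0, value=1.0, stop_gradient=False)
```

Here `g_stop` is zero, but `g_leak` equals log 0.5 ≈ -0.693. A gradient-descent step V ← V - lr·g therefore pushes V above R even though the critic was already correct. Because the policy loss is linear in V, this spurious term never vanishes. With a shared network, it also flows into the shared layers.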

coreylynch · 2016-08-01T18:16:57Z

Oh interesting! I will definitely take a look at this. Thank you.

steveKapturowski · 2016-09-10T02:48:32Z

I noticed this as well and believe it's a significant cause of performance degradation. Additionally, you don't seem to add the entropy term to the objective, which the paper mentions as useful for improving exploration.
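As a rough illustration of the entropy term, here is a minimal sketch of a per-step actor loss with entropy regularization, written as a quantity to minimize: -log π(a|s)·A - β·H(π(·|s)). It assumes a softmax policy over discrete actions and an advantage that is already treated as a constant (stop_gradient). The function names and the default β = 0.01 are illustrative assumptions, not code from a3c.py.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy H(pi) = -sum_a pi(a) * log pi(a), in nats.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def policy_loss_with_entropy(logits, action, advantage, beta=0.01):
    """Hypothetical per-step actor loss: -log pi(a|s) * A - beta * H(pi(.|s)).

    `advantage` is assumed to be a constant (no gradient into the critic);
    beta=0.01 is an assumed, illustrative coefficient.
    """
    probs = softmax(logits)
    return -math.log(probs[action]) * advantage - beta * entropy(probs)

uniform = policy_loss_with_entropy([0.0, 0.0, 0.0, 0.0], action=0, advantage=0.0)
peaked = policy_loss_with_entropy([10.0, 0.0, 0.0, 0.0], action=0, advantage=0.0)
```

With a zero advantage, only the entropy term remains, so the uniform policy gets a lower loss than the nearly deterministic one. Minimizing this loss therefore pushes the policy away from collapsing onto a single action. That collapse is the symptom described at the top of this issue.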
