A question about the designHead weights in LstmPolicy and StateActionPredictor classes #23

Open
lihuang3 opened this issue Apr 7, 2018 · 0 comments

lihuang3 commented Apr 7, 2018

Hello, thanks for your great work!

I noticed that in `/src/a3c.py`, lines 271-277,
`self.network = LSTMPolicy(env.observation_space.shape, numaction, designHead)`
is defined within the scope "local", and
`self.ap_network = StatePredictor(env.observation_space.shape, numaction, designHead, unsupType)`
is defined within the scope "predictor", which is nested under "local". As far as I can tell (I checked the same behavior with a simple CNN on MNIST), this means the designHead weights used by the two classes are different variables, even though the designHead architecture is the same, because they are created under different scopes.
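
To make the scoping point concrete, here is a toy sketch in plain Python. It is not TensorFlow and not the repository's code: `VariableStore`, `design_head`, and the variable name `conv1/W` are hypothetical stand-ins that only mimic how scope-qualified names decide whether two calls get the same variable.

```python
from contextlib import contextmanager


class VariableStore:
    """Toy stand-in for a scope-aware variable store (hypothetical, not TF)."""

    def __init__(self):
        self.variables = {}
        self._scopes = []

    @contextmanager
    def scope(self, name):
        # Entering a scope prefixes every variable name created inside it.
        self._scopes.append(name)
        try:
            yield
        finally:
            self._scopes.pop()

    def get_variable(self, name):
        full_name = "/".join(self._scopes + [name])
        # Created on first use; a different full name is a different variable.
        return self.variables.setdefault(full_name, [0.0])


def design_head(store):
    """Hypothetical one-layer head; returns its weight object."""
    return store.get_variable("conv1/W")


store = VariableStore()
with store.scope("local"):
    policy_w = design_head(store)          # analogous to the policy network
    with store.scope("predictor"):
        predictor_w = design_head(store)   # analogous to the predictor network

print(sorted(store.variables))  # ['local/conv1/W', 'local/predictor/conv1/W']
print(policy_w is predictor_w)  # False
```

The same head-building function, called under two different scope paths, ends up with two independent weight objects, which is the situation I believe applies to the two networks here.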

In the LSTMPolicy class, the inputs are fed into the designHead, and its outputs are fed into the LSTM for policy and value function prediction.
In the StatePredictor/StateActionPredictor classes, however, the forward and inverse models are built on a designHead with its own, separate weights, because, as mentioned above, LSTMPolicy and StatePredictor are created in different scopes.

My question is why, at /src/a3c.py lines 271-277, LSTMPolicy and StatePredictor are not placed under the same scope so that their designHead weights are shared. In other words, if they use different weights, it seems that the forward and inverse models are trained independently of the A3C policy and value function, while the A3C policy and value function are still affected by the forward loss through the intrinsic reward.
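
A minimal sketch of the consequence I am asking about, again in plain Python with hypothetical names (`make_encoder_weights`, `sgd_step`) and made-up numbers: an update driven by the predictor's loss changes the policy's encoder only when the two share the same weights.

```python
def make_encoder_weights():
    # Hypothetical single-parameter "encoder", for illustration only.
    return {"W": 1.0}


def sgd_step(weights, grad, lr=0.5):
    # One plain gradient-descent update, applied in place.
    weights["W"] -= lr * grad


# Case 1: separate weights (two scopes, two variables).
policy_enc = make_encoder_weights()
predictor_enc = make_encoder_weights()
sgd_step(predictor_enc, grad=1.0)       # update from the predictor's loss
separate_policy_w = policy_enc["W"]     # policy encoder is untouched

# Case 2: shared weights (same scope, one variable).
shared = make_encoder_weights()
policy_enc = predictor_enc = shared
sgd_step(predictor_enc, grad=1.0)       # same update as above
shared_policy_w = policy_enc["W"]       # policy encoder sees the update

print(separate_policy_w)  # 1.0
print(shared_policy_w)    # 0.5
```

This only illustrates the question. It makes no claim about which of the two setups the authors intended.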

Thank you,

Li
