The old and the new model is effectively the same? #60

Open
yuan1202 opened this issue May 23, 2019 · 0 comments

Hi Simon

I am looking at your implementation of the PPO model.

After going through the code a couple of times, I think that although you create two policy instances, the `reuse` parameter passed to the second instance means both share the same variables, so the model effectively contains two identical policies.
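To illustrate what I mean, here is a toy pure-Python sketch of scope-based variable sharing. It is not TensorFlow and not code from this repository; `VariableStore`, `get_variable`, and `ToyPolicy` are hypothetical names that only mimic how a reused scope hands back existing variables instead of creating new ones.

```python
class VariableStore:
    """Toy stand-in for a graph-level variable store keyed by name."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, name, init, reuse=False):
        if reuse:
            # With reuse, the existing variable is returned; nothing new is created.
            return self._vars[name]
        if name in self._vars:
            raise ValueError(f"Variable {name} already exists")
        self._vars[name] = list(init)
        return self._vars[name]


class ToyPolicy:
    """Hypothetical policy holding a single weight vector."""

    def __init__(self, store, scope, reuse=False):
        self.w = store.get_variable(f"{scope}/w", [0.0, 0.0], reuse=reuse)


store = VariableStore()
pi = ToyPolicy(store, "policy")
old_pi = ToyPolicy(store, "policy", reuse=True)  # same scope, reuse=True

pi.w[0] = 1.5  # simulate a gradient update on the "new" policy

same_object = old_pi.w is pi.w
print(same_object, old_pi.w)
```

If the two policy instances in the repository behave like this, any ratio computed directly from their outputs would always be 1, because the "old" policy changes whenever the "new" one does.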

Furthermore, I have not seen any code that transfers the weights between the two policies, unlike OpenAI's implementation, which does this:
```python
assign_old_eq_new = U.function([], [], updates=[
    tf.assign(oldv, newv)
    for (oldv, newv) in zipsame(oldpi.get_variables(), pi.get_variables())
])
```
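For comparison, this is roughly what I would expect an explicit weight transfer to do, again as a toy pure-Python sketch rather than TensorFlow. The `zipsame` and `assign_old_eq_new` functions here are simplified stand-ins I wrote for illustration, and the variable values are made up.

```python
def zipsame(*seqs):
    """Zip sequences, insisting they all have the same length."""
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs[1:])
    return zip(*seqs)


def assign_old_eq_new(old_vars, new_vars):
    """Copy each new variable's values into the matching old variable, in place."""
    for old_v, new_v in zipsame(old_vars, new_vars):
        old_v[:] = new_v


# Two independent variable sets, as if created under different scopes.
pi_vars = [[0.1, 0.2], [0.3]]
oldpi_vars = [[0.0, 0.0], [0.0]]

assign_old_eq_new(oldpi_vars, pi_vars)  # take a snapshot before the update
pi_vars[0][0] = 9.9                     # simulate an optimizer step on pi

print(oldpi_vars)  # the snapshot is unaffected by the later update
```

Here the old policy keeps the pre-update weights until the next explicit copy, which is the behaviour I could not find in this implementation.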

Could you please confirm whether this is indeed the case? Thanks!
