Including Policy Gradient Techniques #7
Comments
Any updates on Policy Gradient methods? I am considering implementing a human-interpretable policy class in rlpy, and Policy Gradient would likely match my needs. I know this is a mirror of the Bitbucket repository; should we comment on the issue there? It is currently closed.
We don't have plans for adding policy gradient techniques at the moment, but you should be able to extend the framework to support them. Best, Alborz Geramifard
Thanks! If my implementation meets rlpy's quality threshold, would you like a pull request?
Yup. Best, Alborz Geramifard
@smcgregor Any updates on this? I'd like to do policy gradient in rlpy.
@vladfi1 I think it is unlikely that I will be implementing this anytime soon. I've been running experiments that use probabilistic policies on top of RLPy, but we don't currently need RLPy to optimize the policy parameters.
Pierre-Luc Bacon
The project description suggests that RLPy is mainly about value function based algorithms. However, I think it'd be nice to add Will Dabney's implementation of some of the popular policy gradient methods.
https://github.com/amarack/python-rl/blob/master/pyrl/agents/policy_gradient.py
Christoph Dann
We totally agree with you. This is definitely a near-future goal for RLPy. Which specific method would you suggest we address first?
Btw: there is an implementation of Natural Actor-Critic in RLPy, but unfortunately it has seen very little testing so far (cf. the simple example in examples/gridworld/nac.py).
Pierre-Luc Bacon
I think that all of Will's code should be included!
Having an implementation of REINFORCE would also be a useful baseline.
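For readers unfamiliar with the method, a minimal REINFORCE sketch might look like the following. This is not rlpy code or Will Dabney's implementation; it is an illustrative, self-contained example (the bandit setup, function names, and hyperparameters are all hypothetical) showing the core update: nudge action preferences by the return times the gradient of the log-probability of the chosen action.

```python
import math
import random

def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_bandit(n_episodes=2000, alpha=0.1, seed=0):
    """REINFORCE on a toy 2-armed bandit: arm 1 pays reward 1, arm 0 pays 0.

    Each one-step episode applies the update
        theta[b] += alpha * G * (1[b == a] - pi[b]),
    where (1[b == a] - pi[b]) is the gradient of log pi(a) for a
    softmax policy and G is the episode return.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # action preferences
    for _ in range(n_episodes):
        pi = softmax(theta)
        a = 0 if rng.random() < pi[0] else 1  # sample from the policy
        G = 1.0 if a == 1 else 0.0            # return of this episode
        for b in range(2):
            grad_log_pi = (1.0 if b == a else 0.0) - pi[b]
            theta[b] += alpha * G * grad_log_pi
    return softmax(theta)

if __name__ == "__main__":
    pi = reinforce_bandit()
    print(pi[1])  # probability of the rewarding arm; approaches 1
```

A real rlpy agent would of course use the framework's representation and domain interfaces rather than a hand-rolled bandit, but the update rule itself carries over unchanged to episodic policy-gradient agents.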