
Including Policy Gradient Techniques #7

Open · alborzgeramifard opened this issue Jul 18, 2015 · 6 comments

@alborzgeramifard

Pierre-Luc Bacon

The project description suggests that RLPy is mainly about value function based algorithms. However, I think it'd be nice to add Will Dabney's implementation of some of the popular policy gradient methods.
https://github.com/amarack/python-rl/blob/master/pyrl/agents/policy_gradient.py

Christoph Dann

We totally agree with you. This is definitely a near-future goal for RLPy. Which specific method would you suggest we address first?
By the way, there is an implementation of Natural Actor-Critic in RLPy, but unfortunately it has had very little testing so far (cf. the simple example in examples/gridworld/nac.py).

Pierre-Luc Bacon

I think that all of Will's code should be included!
Having an implementation of REINFORCE would also be a useful baseline.
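For reference, here is a minimal, self-contained sketch of what a REINFORCE baseline looks like, independent of RLPy's agent interface. The softmax-linear policy, the `env.reset()`/`env.step()` interface, and the `features()` callable are assumptions made for illustration; this is not Will Dabney's code and not RLPy's API.

```python
import numpy as np

# Minimal REINFORCE sketch with a softmax policy over linear features.
# The environment interface (reset() -> state, step(a) -> (state, reward, done))
# is assumed for illustration and is not part of RLPy.

def softmax_probs(theta, phi_s):
    """Action probabilities for a softmax policy: pi(a|s) proportional to exp(theta_a . phi(s))."""
    prefs = theta @ phi_s            # one preference per action
    prefs -= prefs.max()             # shift for numerical stability
    e = np.exp(prefs)
    return e / e.sum()

def reinforce_episode(env, theta, features, alpha=0.01, gamma=0.99):
    """Run one episode and apply the REINFORCE update to theta in place."""
    states, actions, rewards = [], [], []
    s, done = env.reset(), False
    while not done:
        phi_s = features(s)
        p = softmax_probs(theta, phi_s)
        a = np.random.choice(len(p), p=p)
        s, r, done = env.step(a)
        states.append(phi_s)
        actions.append(a)
        rewards.append(r)

    # Monte-Carlo returns G_t, then gradient ascent on log pi(a_t|s_t) * G_t.
    G = 0.0
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        phi_s, a = states[t], actions[t]
        p = softmax_probs(theta, phi_s)
        # grad log pi(a|s) for a softmax-linear policy:
        #   phi(s) on the chosen action's row, minus pi(b|s) * phi(s) on every row b.
        grad_log = -np.outer(p, phi_s)
        grad_log[a] += phi_s
        theta += alpha * (gamma ** t) * G * grad_log
    return sum(rewards)
```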

@smcgregor
Contributor

Any updates on Policy Gradient methods? I am considering implementing a human-interpretable policy class in rlpy and Policy Gradient would likely match my needs.

I know this is a mirror of the bitbucket repository (https://bitbucket.org/rlpy/rlpy/issues/25/including-policy-gradient-methods); should we comment on the issue there? It is currently closed.

@alborzgeramifard
Author

We don’t have plans for adding policy gradient techniques at the moment, but you should be able to extend the framework to support them.

Best,

Alborz Geramifard
Research Scientist | Amazon Echo
people.csail.mit.edu/agf
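
On "extend the framework": a hypothetical skeleton of how a policy-gradient agent might slot in as an RLPy agent subclass. The import path, constructor arguments, and learn() signature below are assumptions modeled on RLPy's value-based agents and should be checked against the installed version; PolicyGradientAgent and _policy_gradient_step are invented names for illustration.

```python
# Hypothetical skeleton only: the base-class path, constructor arguments, and the
# learn() signature are assumptions about RLPy's agent interface, not verified here.
from rlpy.Agents import Agent  # assumed import path

class PolicyGradientAgent(Agent):
    """Buffers one episode at a time and applies a REINFORCE-style update."""

    def __init__(self, policy, representation, discount_factor, learn_rate=0.01, **kwargs):
        super(PolicyGradientAgent, self).__init__(policy, representation,
                                                  discount_factor, **kwargs)
        self.learn_rate = learn_rate
        self.episode = []  # (state, action, reward) triples for the current episode

    def learn(self, s, p_actions, a, r, ns, np_actions, na, terminal):
        # Value-based RLPy agents update on every step; a Monte-Carlo policy-gradient
        # agent instead buffers the episode and updates once it terminates.
        self.episode.append((s, a, r))
        if terminal:
            self._policy_gradient_step(self.episode)
            self.episode = []

    def _policy_gradient_step(self, episode):
        # Compute returns and ascend grad log pi(a|s) * G; the details depend on the
        # chosen policy parameterization (see the REINFORCE sketch above).
        raise NotImplementedError
```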


@smcgregor
Contributor

Thanks! If my implementation meets rlpy's quality threshold, would you like a pull request?

@alborzgeramifard
Author

Yup.

Best,

Alborz Geramifard
Research Scientist | Amazon Echo
people.csail.mit.edu/agf


@vladfi1

vladfi1 commented Mar 18, 2016

@smcgregor Any updates on this? I'd like to do policy gradient in rlpy.

@smcgregor
Contributor

@vladfi1 I think it is unlikely that I will be implementing this anytime soon. I've been running experiments that use probabilistic policies on top of RLPy, but we don't currently need RLPy to optimize the policy parameters.
