Including Policy Gradient Techniques #7
Comments
Any updates on Policy Gradient methods? I am considering implementing a human-interpretable policy class in rlpy, and Policy Gradient would likely match my needs. I know this is a mirror of the Bitbucket repository; should we comment on the issue there? It is currently closed.
We don't have plans for adding policy gradient techniques at the moment, but you should be able to extend the framework to support them. Best, Alborz Geramifard
Thanks! If my implementation meets rlpy's quality threshold, would you like a pull request?
Yup. Best, Alborz Geramifard
@smcgregor Any updates on this? I'd like to do policy gradient in rlpy.
@vladfi1 I think it is unlikely that I will be implementing this anytime soon. I've been running experiments that use probabilistic policies on top of RLPy, but we don't currently need RLPy to optimize the policy parameters.
Pierre-Luc Bacon
The project description suggests that RLPy is mainly about value function based algorithms. However, I think it'd be nice to add Will Dabney's implementation of some of the popular policy gradient methods.
https://github.com/amarack/python-rl/blob/master/pyrl/agents/policy_gradient.py
Christoph Dann
We totally agree with you. This is definitely a near-future goal for RLPy. Which specific method would you suggest we address first?
Btw: there is an implementation of Natural Actor-Critic in RLPy, but unfortunately it has seen very little testing so far (cf. the simple example in examples/gridworld/nac.py).
Pierre-Luc Bacon
I think that all of Will's code should be included!
Having an implementation of REINFORCE would also be a useful baseline.
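For readers unfamiliar with the method, a minimal REINFORCE sketch might look like the following. This is not rlpy code or Will Dabney's implementation; it is an illustrative, self-contained example (the bandit setup, function names, and hyperparameters are all hypothetical) showing the core update: nudge action preferences by the return times the gradient of the log-probability of the chosen action.

```python
import math
import random

def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_bandit(n_episodes=2000, alpha=0.1, seed=0):
    """REINFORCE on a toy 2-armed bandit: arm 1 pays reward 1, arm 0 pays 0.

    Each one-step episode applies the update
        theta[b] += alpha * G * (1[b == a] - pi[b]),
    where (1[b == a] - pi[b]) is the gradient of log pi(a) for a
    softmax policy and G is the episode return.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # action preferences
    for _ in range(n_episodes):
        pi = softmax(theta)
        a = 0 if rng.random() < pi[0] else 1  # sample from the policy
        G = 1.0 if a == 1 else 0.0            # return of this episode
        for b in range(2):
            grad_log_pi = (1.0 if b == a else 0.0) - pi[b]
            theta[b] += alpha * G * grad_log_pi
    return softmax(theta)

if __name__ == "__main__":
    pi = reinforce_bandit()
    print(pi[1])  # probability of the rewarding arm; approaches 1
```

A real rlpy agent would of course use the framework's representation and domain interfaces rather than a hand-rolled bandit, but the update rule itself carries over unchanged to episodic policy-gradient agents.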