Adding TRPO #435

Jackory · 2023-11-30T03:12:35Z

Description

TRPO is a representative policy gradient algorithm in reinforcement learning. Although it is no longer widely used in practice, its ideas and mathematical principles are still worth studying. I haven't yet seen a single-file implementation of TRPO, so this PR adds one to help beginners understand the algorithm.
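
For readers new to the method, the core idea can be sketched as the constrained optimization below. This is the standard textbook form of the TRPO update, not something taken from the code in this PR. The symbols are assumptions for illustration: $\pi_\theta$ is the new policy, $\pi_{\theta_\text{old}}$ the policy that collected the data, $A^{\pi_{\theta_\text{old}}}$ its advantage function, and $\delta$ the trust-region size.

```math
\begin{aligned}
\max_{\theta}\quad & \mathbb{E}_{s,a \sim \pi_{\theta_\text{old}}}\!\left[ \frac{\pi_\theta(a \mid s)}{\pi_{\theta_\text{old}}(a \mid s)}\, A^{\pi_{\theta_\text{old}}}(s,a) \right] \\
\text{subject to}\quad & \mathbb{E}_{s \sim \pi_{\theta_\text{old}}}\!\left[ D_{\mathrm{KL}}\!\left( \pi_{\theta_\text{old}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s) \right) \right] \le \delta
\end{aligned}
```

In practice the constraint is usually handled approximately, for example with a conjugate-gradient step followed by a backtracking line search. Whether this PR follows that exact recipe should be checked against the submitted file.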

Types of changes

Bug fix
New feature
New algorithm
Documentation

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the tests accordingly (if applicable).
I have updated the documentation and previewed the changes via mkdocs serve.
- I have explained note-worthy implementation details.
- I have explained the logged metrics.
- I have added links to the original paper and related papers.

If you need to run benchmark experiments for performance-impacting changes:

I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture_video.
I have performed RLops with python -m openrlbenchmark.rlops.
- For new feature or bug fix:
  - I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
- For new algorithm:
  - I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
- I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

vercel · 2023-11-30T03:12:39Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
cleanrl	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Dec 6, 2023 1:06pm

vwxyzjn · 2023-12-18T15:35:22Z

Hi this is some cool stuff! Feel free to run some benchmarks with mujoco to see how it performs.

Jackory added 2 commits November 30, 2023 10:43

Add a implementation of trpo_continous_action

9bb4216

autoflake

42a469a

vercel bot deployed to Preview November 30, 2023 03:13 View deployment

Jackory mentioned this pull request Nov 30, 2023

Adding TRPO implementation #245

Closed

Merge branch 'vwxyzjn:master' into master

2b513f2

vercel bot deployed to Preview December 6, 2023 13:06 View deployment
