PyTorch implementation of PPO

NOTE: This repository is not maintained. I recommend using the implementation here instead; it is much more full-featured and better tested.

This is a PyTorch implementation of Proximal Policy Optimization.
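As a rough illustration of what PPO optimizes, the clipped surrogate objective can be written in a few lines of PyTorch. This is a minimal sketch and not the code in `main.py`. The function name `ppo_clip_loss` and the default `clip_eps=0.2` (a commonly used value) are assumptions made for this example.

```python
import torch


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective, returned as a loss to minimize.

    Hypothetical helper for illustration; not taken from this repository.
    """
    # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log space
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Unclipped and clipped surrogate terms
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Element-wise minimum (pessimistic bound), averaged over the batch;
    # negated so that gradient descent maximizes the objective
    return -torch.min(surr1, surr2).mean()
```

Taking the element-wise minimum means the policy gains nothing from pushing the ratio beyond `1 ± clip_eps` in the direction the advantage favors, which keeps each update close to the policy that collected the data.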

This code is mostly ported from the OpenAI baselines implementation, but it currently does not optimize each batch for several epochs. I will add this soon.
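The missing step, reusing each collected batch for several epochs of shuffled minibatch updates, could look roughly like the sketch below. It is not code from this repository: `ppo_update`, `log_prob_fn`, the toy linear-Gaussian policy, and the defaults `num_epochs=10`, `minibatch_size=64`, and `clip_eps=0.2` are all hypothetical choices made only to keep the example self-contained and runnable.

```python
import torch


def ppo_update(log_prob_fn, optimizer, states, actions, old_log_probs,
               advantages, num_epochs=10, minibatch_size=64, clip_eps=0.2):
    """Run several epochs of minibatch PPO updates over one collected batch.

    Hypothetical helper; log_prob_fn(states, actions) must return the current
    policy's log-probabilities of the given actions, one value per sample.
    """
    batch_size = states.shape[0]
    for _ in range(num_epochs):
        # Reshuffle every epoch so the minibatches differ between passes
        perm = torch.randperm(batch_size)
        for start in range(0, batch_size, minibatch_size):
            idx = perm[start:start + minibatch_size]
            new_log_probs = log_prob_fn(states[idx], actions[idx])
            # Clipped surrogate loss on this minibatch
            ratio = torch.exp(new_log_probs - old_log_probs[idx])
            surr1 = ratio * advantages[idx]
            surr2 = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages[idx]
            loss = -torch.min(surr1, surr2).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    # Loss of the last minibatch, for logging
    return loss.item()


# Usage with a toy linear-Gaussian policy (hypothetical, for illustration only)
torch.manual_seed(0)
mean_layer = torch.nn.Linear(3, 1)
log_std = torch.nn.Parameter(torch.zeros(1))


def log_prob_fn(s, a):
    dist = torch.distributions.Normal(mean_layer(s), log_std.exp())
    # Sum over action dimensions to get one log-probability per sample
    return dist.log_prob(a).sum(dim=-1)


optimizer = torch.optim.Adam(list(mean_layer.parameters()) + [log_std], lr=3e-4)
states = torch.randn(256, 3)
actions = torch.randn(256, 1)
with torch.no_grad():
    # Log-probabilities under the policy that "collected" the batch, held fixed
    old_log_probs = log_prob_fn(states, actions)
advantages = torch.randn(256)
ppo_update(log_prob_fn, optimizer, states, actions, old_log_probs, advantages)
```

Holding `old_log_probs` fixed across all epochs is what makes the ratio, and therefore the clipping, meaningful: as the policy drifts during the repeated passes, samples whose ratio leaves the clip range stop contributing gradient.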

Usage

python main.py --env-name Walker2d-v1

Contributions

Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.

Todo

Add multiple epochs per batch
Compare test results against the baselines code

vedipen/pytorch-ppo-modified

Folders and files

Latest commit

History

Repository files navigation

PyTorch implementation of PPO

Usage

Contributions

Todo

About

Resources

License

Stars

Watchers

Forks

Languages