
Distributed Proximal Policy Optimization (DPPO)

MIT License
This is a PyTorch implementation of Emergence of Locomotion Behaviours in Rich Environments [1]. The project is based on Alexis David Jacq's DPPO project, but it has been rewritten and contains several modifications that appear to improve learning in some environments. The Running Mean Filter has been revised, which leads to better performance (for example on Walker2d). The code has also been restructured so that the Actor Network and Critic Network are separate; this allows asymmetric actor-critic setups for tasks where information available at training time is not available at run time. Finally, actions are sampled from a Beta distribution, which leads to better training speed and performance in a large number of tasks.
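For intuition, the sketch below (written against a recent PyTorch, not the repo's actual code) illustrates two of the ideas described above: a running mean/std filter that normalises observations, and an actor whose head parameterises a Beta distribution over actions. All class and variable names here are illustrative.

import numpy as np
import torch.nn as nn
from torch.distributions import Beta

class RunningMeanFilter:
    """Track a running mean/variance and normalise incoming observations."""
    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = eps

    def __call__(self, x):
        # Welford-style incremental update of the mean and variance.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.var += (delta * (x - self.mean) - self.var) / self.count
        return (x - self.mean) / (np.sqrt(self.var) + 1e-8)

class BetaActor(nn.Module):
    """Actor that outputs Beta(alpha, beta) parameters per action dimension."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, hidden), nn.Tanh())
        self.alpha_head = nn.Linear(hidden, act_dim)
        self.beta_head = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.body(obs)
        # softplus(.) + 1 keeps both parameters above 1, giving a unimodal density.
        alpha = nn.functional.softplus(self.alpha_head(h)) + 1.0
        beta = nn.functional.softplus(self.beta_head(h)) + 1.0
        return Beta(alpha, beta)

Beta samples live in [0, 1], so an action is obtained by rescaling, e.g. action = low + (high - low) * dist.sample(), where low and high are the environment's action bounds.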

Requirements

  • python 3.5.2
  • OpenAI Gym
  • mujoco-py
  • pytorch 0.3.1 (an update to 0.4.1 is planned for August)
  • pyro

Instructions for running the code

Train your models

cd /root-of-this-code/
python train_network.py

You can also try other MuJoCo environments. A pre-trained model for one environment, Walker2d-v1, is already included, so feel free to try the code on your favourite task!
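Under the hood, DPPO parallelises training across worker processes. The sketch below is a heavily simplified, Hogwild-style illustration of that pattern (shared parameters updated by several workers), not the repo's actual training loop; the real code computes the clipped PPO loss from rollouts collected in each worker's own environment.

import torch
import torch.multiprocessing as mp
import torch.nn as nn

def worker(shared_model, rank, steps=100):
    # Each worker owns its optimiser; opt.step() writes into the shared weights.
    opt = torch.optim.Adam(shared_model.parameters(), lr=3e-4)
    for _ in range(steps):
        # Placeholder loss; real DPPO uses the PPO objective on rollout data.
        x = torch.randn(8, 4)
        loss = shared_model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

if __name__ == '__main__':
    model = nn.Linear(4, 2)
    model.share_memory()  # place parameters in shared memory for all workers
    procs = [mp.Process(target=worker, args=(model, r)) for r in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()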

Test your models

cd /root-of-this-code/
python demo.py
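If you want to roll your own evaluation loop, the sketch below shows roughly what a demo script does, assuming the classic gym API of this project's era; the checkpoint path is purely hypothetical, and this is not the repo's actual demo.py.

import gym
import torch

env = gym.make('Walker2d-v1')
actor = torch.load('saved_models/Walker2d-v1.pt')  # hypothetical checkpoint path
obs, done, episode_reward = env.reset(), False, 0.0
while not done:
    env.render()
    with torch.no_grad():
        dist = actor(torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0))
        a01 = dist.mean.squeeze(0).numpy()  # deterministic rollout: use the mean
    low, high = env.action_space.low, env.action_space.high
    obs, reward, done, _ = env.step(low + (high - low) * a01)  # rescale from [0, 1]
    episode_reward += reward
print('episode reward:', episode_reward)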

Results

Training Curve

[image: training curve]

Demo: Walker2d-v1

[image: Walker2d-v1 demo]

Acknowledgement

This project builds on Alexis David Jacq's DPPO implementation.

Reference

[1] N. Heess et al., "Emergence of Locomotion Behaviours in Rich Environments," arXiv:1707.02286, 2017.
