This is a PyTorch implementation of Proximal Policy Optimization (PPO). In this code, actions can also be sampled from a Beta distribution, which can improve performance. The relevant paper is: The Beta Policy for Continuous Control Reinforcement Learning.
- python 3.5.2
- openai-gym
- mujoco-1.50.1.56
- pytorch-0.4.0
Install OpenAI Baselines (OpenAI Baselines updates quickly, so please use the older version as below; this will be resolved in the future):
```shell
# clone the openai baselines
git clone https://github.com/openai/baselines.git
cd baselines
git checkout 366f486
pip install -e .
```
The `--dist` flag selects the action distribution: `gauss` or `beta`.
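Roughly, a Beta policy outputs two positive shape parameters per action dimension, samples a value in (0, 1), and rescales it to the environment's action bounds. A minimal stdlib sketch of that idea (the function name and parameters are illustrative, not taken from this repo):

```python
import random

def sample_beta_action(alpha, beta, low, high):
    """Sample one action from a Beta(alpha, beta) policy head.

    Beta samples lie in (0, 1), so they are rescaled linearly to the
    environment's [low, high] action range. Using alpha, beta > 1 keeps
    the density unimodal, as suggested in the Beta-policy paper.
    """
    x = random.betavariate(alpha, beta)  # in (0, 1)
    return low + (high - low) * x
```

Because the Beta distribution has bounded support, the policy never proposes actions outside the valid range, unlike a Gaussian, which must be clipped.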
```shell
# add --cuda only if you have a GPU
python train_atari.py --lr-decay --cuda
python demo_atari.py
```
```shell
# add --cuda only if you have a GPU
python train_mujoco.py --env-name='Walker2d-v2' --num-workers=1 --nsteps=2048 --clip=0.2 --batch-size=32 --epoch=10 --lr=3e-4 --ent-coef=0 --total-frames=1000000 --vloss-coef=1 --cuda
python demo_mujoco.py
```
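The `--clip=0.2` flag above is the epsilon of PPO's clipped surrogate objective, which takes the minimum of the unclipped and clipped policy-gradient terms. A minimal per-sample sketch (function name is illustrative; the repo's training loop applies this over batches of log-probability ratios):

```python
def ppo_clip_objective(ratio, advantage, clip=0.2):
    """Clipped surrogate from the PPO paper:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r is the new/old policy probability ratio and A the advantage.
    """
    clipped_ratio = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)
```

The clipping removes the incentive to push the ratio far from 1, which keeps each policy update close to the data-collecting policy.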
Please download the pre-trained models from Google Drive, then put the `saved_models` folder under the current directory.
Note: the new version of openai-gym has rendering problems, so I use the Walker2d-v1 demo.
Tip: while watching the demo, you can press TAB to switch the camera in MuJoCo.