
Proximal Policy Optimization (PPO)

This is a PyTorch implementation of Proximal Policy Optimization (PPO). In this code, actions can also be sampled from a Beta distribution, which can improve performance. The relevant paper is: The Beta Policy for Continuous Control Reinforcement Learning
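As a rough sketch of how a Beta policy produces bounded actions (the parameter names and action bounds below are illustrative, not taken from this repository): the network outputs two positive shape parameters, a sample is drawn from Beta(α, β) on [0, 1], and the sample is rescaled to the environment's action range.

```python
import torch
from torch.distributions import Beta

# Hypothetical raw network outputs for one action dimension.
raw_alpha = torch.tensor([0.5])
raw_beta = torch.tensor([1.2])

# softplus + 1 keeps the shape parameters > 1, which makes the
# Beta distribution unimodal (as suggested in the Beta-policy paper).
alpha = torch.nn.functional.softplus(raw_alpha) + 1.0
beta = torch.nn.functional.softplus(raw_beta) + 1.0

dist = Beta(alpha, beta)
x = dist.sample()                 # sample lies in [0, 1]
low, high = -1.0, 1.0             # assumed environment action bounds
action = low + (high - low) * x   # rescale to [low, high]
log_prob = dist.log_prob(x)       # needed later for the PPO ratio
```

The advantage over an unbounded Gaussian is that no probability mass falls outside the valid action range, so no clipping bias is introduced at the action bounds.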

Requirements

  • python 3.5.2
  • openai-gym
  • mujoco-1.50.1.56
  • pytorch-0.4.0

Installation

Install OpenAI Baselines (OpenAI Baselines updates quickly, so please use the older version pinned below; this will be resolved in the future.)

# clone the openai baselines
git clone https://github.com/openai/baselines.git
cd baselines
git checkout 366f486
pip install -e .

Instruction to run the code

The --dist flag selects the policy distribution: gauss or beta.

Train the Network with Atari games:

python train_atari.py --lr-decay --cuda (add the --cuda flag only if you have a GPU)

Test the Network with Atari games

python demo_atari.py

Train the Network with Mujoco:

python train_mujoco.py --env-name='Walker2d-v2' --num-workers=1 --nsteps=2048 --clip=0.2 --batch-size=32 --epoch=10 --lr=3e-4 --ent-coef=0 --total-frames=1000000 --vloss-coef=1 --cuda (add the --cuda flag only if you have a GPU)
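For reference, the PPO objective that the --clip, --vloss-coef, and --ent-coef flags configure can be sketched as follows (a minimal illustration of the clipped surrogate loss, not this repository's exact code):

```python
import torch

def ppo_loss(new_log_prob, old_log_prob, advantage, values, returns,
             clip=0.2, vloss_coef=1.0, ent_coef=0.0, entropy=None):
    """Clipped surrogate objective from the PPO paper.

    clip       -> --clip       (clipping range epsilon)
    vloss_coef -> --vloss-coef (value-loss weight)
    ent_coef   -> --ent-coef   (entropy-bonus weight)
    """
    # Probability ratio between the new and old policies.
    ratio = torch.exp(new_log_prob - old_log_prob)
    # Unclipped and clipped surrogate terms; take the pessimistic minimum.
    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1.0 - clip, 1.0 + clip) * advantage
    policy_loss = -torch.min(surr1, surr2).mean()
    # Mean-squared value-function error.
    value_loss = (returns - values).pow(2).mean()
    total = policy_loss + vloss_coef * value_loss
    # Optional entropy bonus to encourage exploration.
    if entropy is not None:
        total = total - ent_coef * entropy.mean()
    return total
```

With the defaults shown in the command above (clip=0.2, vloss_coef=1, ent_coef=0), the ratio is clipped to [0.8, 1.2] and no entropy bonus is applied.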

Test the Network with Mujoco

python demo_mujoco.py

Download the Pre-trained Model

Please download them from Google Drive, then put the saved_models folder under the current directory.

Results

Training Performance

Training_Curve

Demo: Walker2d-v1

Note: the new version of openai-gym has rendering problems, so the demo uses Walker2d-v1.
Tip: while watching the demo, you can press TAB to switch the camera in MuJoCo.
Demo