This is a PyTorch implementation of Proximal Policy Optimization (PPO). In this code, actions can also be sampled from a Beta distribution, which can improve performance. The relevant paper is: The Beta Policy for Continuous Control Reinforcement Learning.
- python 3.5.2
- openai-gym
- mujoco-1.50.1.56
- pytorch-0.4.0
Install OpenAI Baselines (OpenAI Baselines updates quickly, so please use the older version as below; this will be resolved in the future):
```shell
# clone the openai baselines
git clone https://github.com/openai/baselines.git
cd baselines
git checkout 366f486
pip install -e .
```
The `--dist` flag selects the action distribution: `gauss` or `beta`.
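Roughly, a Beta policy outputs two positive shape parameters per action dimension, samples a value in (0, 1), and rescales it to the environment's action bounds. A minimal stdlib sketch of that idea (the function name and parameters are illustrative, not taken from this repo):

```python
import random

def sample_beta_action(alpha, beta, low, high):
    """Sample one action from a Beta(alpha, beta) policy head.

    Beta samples lie in (0, 1), so they are rescaled linearly to the
    environment's [low, high] action range. Using alpha, beta > 1 keeps
    the density unimodal, as suggested in the Beta-policy paper.
    """
    x = random.betavariate(alpha, beta)  # in (0, 1)
    return low + (high - low) * x
```

Because the Beta distribution has bounded support, the policy never proposes actions outside the valid range, unlike a Gaussian, which must be clipped.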
```shell
# add --cuda only if you have a GPU
python train_atari.py --lr-decay --cuda
python demo_atari.py
```
```shell
# add --cuda only if you have a GPU
python train_mujoco.py --env-name='Walker2d-v2' --num-workers=1 --nsteps=2048 --clip=0.2 --batch-size=32 --epoch=10 --lr=3e-4 --ent-coef=0 --total-frames=1000000 --vloss-coef=1 --cuda
python demo_mujoco.py
```
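The `--clip=0.2` flag above is the epsilon of PPO's clipped surrogate objective, which takes the minimum of the unclipped and clipped policy-gradient terms. A minimal per-sample sketch (function name is illustrative; the repo's training loop applies this over batches of log-probability ratios):

```python
def ppo_clip_objective(ratio, advantage, clip=0.2):
    """Clipped surrogate from the PPO paper:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r is the new/old policy probability ratio and A the advantage.
    """
    clipped_ratio = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)
```

The clipping removes the incentive to push the ratio far from 1, which keeps each policy update close to the data-collecting policy.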
Please download the pre-trained models from Google Drive, then put the `saved_models` folder under the current directory.
Note: the new version of openai-gym has rendering problems, so I use the Walker2d-v1 demo.
Tip: while watching the demo, you can press TAB to switch the camera in MuJoCo.