PPO With Stein Control Variate

In this work, we propose a control variate method, motivated by Stein's identity, that effectively reduces the variance of policy gradient methods.
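
To illustrate the idea, here is a toy sketch, not the code in this repository. It assumes a one-dimensional Gaussian policy a = mu + sigma * xi. It compares the plain score-function gradient estimator with one that subtracts an action-dependent control variate phi(a) and adds back the correction term given by Stein's identity. The functions q_value, phi, phi_grad and gradient_samples are hypothetical and chosen only for this example.

```python
import numpy as np


def q_value(a):
    # Hypothetical action-value function, chosen only for this toy example.
    return -(a - 2.0) ** 2


def phi(a):
    # Hypothetical, deliberately imperfect control variate approximating q_value.
    return -(a - 1.5) ** 2


def phi_grad(a):
    # Derivative of phi with respect to the action.
    return -2.0 * (a - 1.5)


def gradient_samples(mu=0.0, sigma=1.0, n=200000, seed=0):
    """Per-sample estimates of d/dmu E[q_value(a)] for a ~ N(mu, sigma^2)."""
    rng = np.random.RandomState(seed)
    xi = rng.standard_normal(n)
    a = mu + sigma * xi              # reparameterized action, so da/dmu = 1
    score = (a - mu) / sigma ** 2    # d/dmu log N(a; mu, sigma^2)
    plain = score * q_value(a)       # plain score-function estimator
    # Subtract phi inside the score term, add back its action gradient.
    stein = score * (q_value(a) - phi(a)) + phi_grad(a) * 1.0
    return plain, stein


plain, stein = gradient_samples()
print("plain: mean %.3f, variance %.3f" % (plain.mean(), plain.var()))
print("stein: mean %.3f, variance %.3f" % (stein.mean(), stein.var()))
```

Both estimators target the same gradient, which is -2 * (mu - 2) = 4 at mu = 0 in this setup. The second one should show a noticeably lower sample variance, even though phi is not an exact copy of q_value.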

This repository contains the code for Proximal Policy Optimization (PPO) with Stein control variates for MuJoCo environments.

The code is based on the excellent implementation of PPO.

Dependencies

Python 3.5
MuJoCo
TensorFlow 1.3
Gym - Installation instructions.

Running Experiments

You can run the following commands to reproduce our results:

cd optimization

# For MinVar optimization
python train.py HalfCheetah-v1 -b 10000 -ps large -po MinVar -p 500 
python train.py Walker2d-v1 -b 10000 -ps large -po MinVar -p 500 
python train.py Hopper-v1 -b 10000 -ps large -po MinVar -p 500 
 
python train.py Ant-v1 -b 10000 -ps small -po MinVar -p 500 
python train.py Humanoid-v1 -b 10000 -ps small -po MinVar -p 500 
python train.py HumanoidStandup-v1 -b 10000 -ps small -po MinVar -p 500 


# For FitQ optimization
python train.py HalfCheetah-v1 -b 10000 -ps large -po FitQ -p 500 
python train.py Walker2d-v1 -b 10000 -ps large -po FitQ -p 500 
python train.py Hopper-v1 -b 10000 -ps large -po FitQ -p 500 

python train.py Ant-v1 -b 10000 -ps small -po FitQ -p 500 
python train.py Humanoid-v1 -b 10000 -ps small -po FitQ -p 500 
python train.py HumanoidStandup-v1 -b 10000 -ps small -po FitQ -p 500


# For baseline PPO
python train.py HalfCheetah-v1 -b 10000 -ps large -c 0
python train.py Walker2d-v1 -b 10000 -ps large -c 0
python train.py Hopper-v1 -b 10000 -ps large -c 0

python train.py Ant-v1 -b 10000 -ps small -c 0
python train.py Humanoid-v1 -b 10000 -ps small -c 0
python train.py HumanoidStandup-v1 -b 10000 -ps small -c 0
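
The commands above follow a regular pattern: three environments use the large policy size and three use the small one, and only the trailing flags change between MinVar, FitQ and baseline PPO. As a convenience, the following sketch regenerates the same command lines. The names POLICY_SIZE and build_commands, and the mode label "baseline", are hypothetical helpers introduced here and are not part of this repository. Only flags that appear in the commands above are used.

```python
# Environment and policy size pairs, copied from the commands listed above.
POLICY_SIZE = [
    ("HalfCheetah-v1", "large"),
    ("Walker2d-v1", "large"),
    ("Hopper-v1", "large"),
    ("Ant-v1", "small"),
    ("Humanoid-v1", "small"),
    ("HumanoidStandup-v1", "small"),
]


def build_commands(mode):
    """Return the train.py command lines for 'MinVar', 'FitQ' or 'baseline'."""
    if mode not in ("MinVar", "FitQ", "baseline"):
        raise ValueError("unknown mode: %r" % mode)
    commands = []
    for env, size in POLICY_SIZE:
        cmd = "python train.py %s -b 10000 -ps %s" % (env, size)
        if mode == "baseline":
            cmd += " -c 0"                    # baseline PPO, as listed above
        else:
            cmd += " -po %s -p 500" % mode    # MinVar or FitQ optimization
        commands.append(cmd)
    return commands


# Print the commands; run them from the optimization directory.
for line in build_commands("MinVar"):
    print(line)
```

Printing the commands rather than launching them keeps the sketch free of side effects; the output can be pasted into a shell or redirected into a script.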

The log files are written to optimization/dartml_data. We also provide two shell scripts in the scripts folder for tuning the hyperparameters of the Stein control variates.

For evaluation of PPO with and without the Stein control variate, please see here.

Citations

If you find Stein control variates helpful, please cite the following paper:

Sample-efficient Policy Optimization with Stein Control Variate. Hao Liu*, Yihao Feng*, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu (*: equal contribution). Preprint 2017

Feedback

If you have any questions about the code or the paper, please feel free to contact us.
