DartML/PPO-Stein-Control-Variate

PPO With Stein Control Variate

In this work, we propose a control variate method, motivated by Stein's identity, that effectively reduces the variance of policy gradient methods.

This repository contains the code for Proximal Policy Optimization (PPO) with Stein control variates on MuJoCo environments.

The code is based on the excellent implementation of PPO.
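
To make the variance-reduction idea concrete, here is a minimal toy sketch; it is not the repository's implementation. It assumes a one-dimensional Gaussian "policy" a ~ N(mu, sigma^2), a toy return f(a) = a^2, and a hypothetical imperfect baseline phi(a) = 0.8 a^2. All of these names and choices are illustrative assumptions. By Stein's identity, subtracting phi inside the score-function term and adding back its derivative leaves the expected gradient with respect to mu unchanged (2 * mu here), while the variance of the estimate can drop substantially.

```python
import random


def mean_and_var(xs):
    """Sample mean and unbiased sample variance of a list of floats."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, v


def gradient_estimates(mu=0.5, sigma=1.0, n=100_000, seed=0):
    """Per-sample estimates of d/dmu E[f(a)] for a ~ N(mu, sigma^2).

    Returns two lists: the plain score-function estimates and the
    estimates that use a Stein-style, action-dependent control variate.
    """
    rng = random.Random(seed)

    def f(a):          # toy return (assumption for illustration)
        return a * a

    def phi(a):        # hypothetical imperfect baseline, roughly tracks f
        return 0.8 * a * a

    def dphi(a):       # derivative of phi with respect to the action
        return 1.6 * a

    plain, stein = [], []
    for _ in range(n):
        a = rng.gauss(mu, sigma)
        score = (a - mu) / sigma ** 2          # d/dmu log N(a; mu, sigma^2)
        plain.append(score * f(a))
        # Subtract phi in the score term, add back d(phi)/da * da/dmu (= dphi).
        stein.append(score * (f(a) - phi(a)) + dphi(a))
    return plain, stein


if __name__ == "__main__":
    plain, stein = gradient_estimates()
    print("plain: mean=%.3f var=%.3f" % mean_and_var(plain))
    print("stein: mean=%.3f var=%.3f" % mean_and_var(stein))
```

Both estimators target the same gradient; the closer phi tracks f, the less variance remains in the score-function term.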

Dependencies

Running Experiments

You can run the following commands to reproduce our results:

cd optimization

# For MinVar optimization
python train.py HalfCheetah-v1 -b 10000 -ps large -po MinVar -p 500 
python train.py Walker2d-v1 -b 10000 -ps large -po MinVar -p 500 
python train.py Hopper-v1 -b 10000 -ps large -po MinVar -p 500 
 
python train.py Ant-v1 -b 10000 -ps small -po MinVar -p 500 
python train.py Humanoid-v1 -b 10000 -ps small -po MinVar -p 500 
python train.py HumanoidStandup-v1 -b 10000 -ps small -po MinVar -p 500 


# For FitQ optimization
python train.py HalfCheetah-v1 -b 10000 -ps large -po FitQ -p 500 
python train.py Walker2d-v1 -b 10000 -ps large -po FitQ -p 500 
python train.py Hopper-v1 -b 10000 -ps large -po FitQ -p 500 

python train.py Ant-v1 -b 10000 -ps small -po FitQ -p 500 
python train.py Humanoid-v1 -b 10000 -ps small -po FitQ -p 500 
python train.py HumanoidStandup-v1 -b 10000 -ps small -po FitQ -p 500


# For baseline PPO
python train.py HalfCheetah-v1 -b 10000 -ps large -c 0
python train.py Walker2d-v1 -b 10000 -ps large -c 0
python train.py Hopper-v1 -b 10000 -ps large -c 0

python train.py Ant-v1 -b 10000 -ps small -c 0
python train.py Humanoid-v1 -b 10000 -ps small -c 0
python train.py HumanoidStandup-v1 -b 10000 -ps small -c 0

The log files are in optimization/dartml_data. Further, we provide two shell scripts for tuning the hyperparameters of Stein control variates in the scripts folder.
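
The commands above follow a regular pattern, so they can also be generated programmatically. The sketch below is a hypothetical helper and not part of the repository: the names `POLICY_SIZE`, `build_command`, and `all_commands` are assumptions for illustration, and only the flags and values shown in the commands above are used. It prints the commands rather than running them.

```python
# Policy-size flag per environment, copied from the commands listed above.
POLICY_SIZE = {
    "HalfCheetah-v1": "large",
    "Walker2d-v1": "large",
    "Hopper-v1": "large",
    "Ant-v1": "small",
    "Humanoid-v1": "small",
    "HumanoidStandup-v1": "small",
}

MODES = ("MinVar", "FitQ", "baseline")


def build_command(env, mode):
    """Return the train.py argument list for one environment and mode."""
    cmd = ["python", "train.py", env, "-b", "10000", "-ps", POLICY_SIZE[env]]
    if mode == "baseline":
        cmd += ["-c", "0"]                   # baseline PPO, as listed above
    elif mode in ("MinVar", "FitQ"):
        cmd += ["-po", mode, "-p", "500"]    # Stein control variate runs
    else:
        raise ValueError("unknown mode: %r" % (mode,))
    return cmd


def all_commands():
    """All 18 commands: three modes times six environments."""
    return [build_command(env, mode) for mode in MODES for env in POLICY_SIZE]


if __name__ == "__main__":
    # Dry run: print each command; run them from the optimization directory.
    for cmd in all_commands():
        print(" ".join(cmd))
```

To launch the runs instead of printing them, each argument list could be passed to `subprocess.run(cmd, check=True)` from inside the optimization directory.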

For evaluation of PPO with/without Stein control variate, please see here.

Citations

If you find Stein control variates helpful, please cite the following paper:

Sample-efficient Policy Optimization with Stein Control Variate. Hao Liu*, Yihao Feng*, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu (*: equal contribution). Preprint 2017

Feedback

If you have any questions about the code or the paper, please feel free to contact us.
