Dragon-Zhuang/BPPO

Behavior Proximal Policy Optimization

Author's PyTorch implementation of the ICLR 2023 paper Behavior Proximal Policy Optimization (BPPO). BPPO uses the loss function from Proximal Policy Optimization (PPO) to improve the behavior policy estimated by behavior cloning.

The difference between BPPO and PPO

Compared with the loss function of PPO, BPPO does not introduce any extra constraint or regularization. The only difference is how the advantage is approximated, which corresponds to the code difference between lines 88-89 of ppo.py and lines 151-155 of bppo.py.
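
To make the shared loss concrete, here is a minimal, framework-free sketch of the standard PPO clipped surrogate loss, written in plain Python so it runs without PyTorch. It is not the code in ppo.py or bppo.py. The function names, the `clip_ratio` default, and in particular the `offline_advantage` helper (which assumes the advantage is taken as Q(s, a) - V(s) from critics fitted on the offline dataset) are illustrative assumptions; the actual approximation is in the lines of bppo.py cited above.

```python
import math


def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_ratio=0.2):
    """Negative PPO clipped surrogate objective, averaged over the samples."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio to [1 - clip_ratio, 1 + clip_ratio].
        clipped = max(1.0 - clip_ratio, min(ratio, 1.0 + clip_ratio))
        # Pessimistic (minimum) choice between unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # The objective is maximized, so the loss is its negative mean.
    return -total / len(advantages)


def offline_advantage(q_values, state_values):
    """ASSUMPTION for illustration: A(s, a) = Q(s, a) - V(s) from fitted critics."""
    return [q - v for q, v in zip(q_values, state_values)]


if __name__ == "__main__":
    adv = offline_advantage([3.0, 1.0], [1.0, 1.5])
    print(clipped_surrogate_loss([0.1, -0.2], [0.0, 0.0], adv))
```

The loss function takes the advantage as an input, so swapping the advantage estimate changes nothing else in the update. This mirrors the point made above that the loss itself is unchanged.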

Overview of the Code

The code consists of 7 Python scripts. The file main.py contains the parameter settings, which are explained in our paper.

Requirements

  • torch 1.12.0
  • mujoco 2.2.1
  • mujoco-py 2.1.2.14
  • d4rl 1.1
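
As a convenience, the pinned versions above can be checked against the current environment with a short script. This is a sketch, not part of the repository: it assumes the installed distribution names match the names listed here, and the helper names `installed_version` and `check_requirements` are hypothetical.

```python
from importlib import metadata

# Versions copied from the list above. ASSUMPTION: the installed
# distribution names match the names used in this README.
REQUIRED = {
    "torch": "1.12.0",
    "mujoco": "2.2.1",
    "mujoco-py": "2.1.2.14",
    "d4rl": "1.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_requirements(required=REQUIRED, lookup=installed_version):
    """Map each package name to (required, installed, matches)."""
    report = {}
    for name, wanted in required.items():
        found = lookup(name)
        # Builds may carry a local suffix such as "+cu116"; ignore it.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_requirements().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: required {wanted}, installed {found} [{status}]")
```

A package that is not installed is reported with `installed None` and counted as a mismatch.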

Running the code

  • python main.py: trains the network, storing checkpoints along the way.
  • Example:
python main.py --env hopper-medium-v2
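
To train on several datasets in sequence, the command above can be wrapped in a small launcher. The sketch below uses only the `--env` flag shown in the example. The helper names are hypothetical, and the two extra dataset names are standard D4RL identifiers that main.py is assumed, not confirmed, to accept.

```python
import subprocess
import sys


def build_command(env_name, script="main.py"):
    """Build the command line for one training run."""
    return [sys.executable, script, "--env", env_name]


def run_all(env_names, dry_run=True):
    """Run (or, with dry_run=True, only print) one training command per dataset."""
    commands = [build_command(name) for name in env_names]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sweep if a run exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # hopper-medium-v2 is from the example above; the other two names are
    # ASSUMED to be accepted by main.py.
    run_all(["hopper-medium-v2", "walker2d-medium-v2", "halfcheetah-medium-v2"])
```

With the default `dry_run=True` the script only prints the commands; pass `dry_run=False` to launch them one after another.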

Citation

If you use BPPO, please cite our paper as follows:

@article{zhuang2023behavior,
  title={Behavior proximal policy optimization},
  author={Zhuang, Zifeng and Lei, Kun and Liu, Jinxin and Wang, Donglin and Guo, Yilang},
  journal={arXiv preprint arXiv:2302.11312},
  year={2023}
}
