Behavior Proximal Policy Optimization

The authors' PyTorch implementation of the ICLR 2023 paper Behavior Proximal Policy Optimization (BPPO). BPPO uses the loss function of Proximal Policy Optimization (PPO) to improve a behavior policy estimated by behavior cloning.

The difference between BPPO and PPO

Compared with the loss function of PPO, BPPO introduces no extra constraint or regularization. The only difference is the advantage approximation, which corresponds to the code difference between lines 88-89 of ppo.py and lines 151-155 of bppo.py.
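
For readers unfamiliar with the PPO loss referred to here, the following is a minimal sketch of the standard clipped surrogate objective in PyTorch. It is not copied from ppo.py or bppo.py: the function name ppo_clip_loss, its argument names, and the default clip_ratio of 0.2 are illustrative assumptions. The BPPO-specific advantage approximation is not reproduced; the advantage is simply passed in as a tensor.

```python
import torch


def ppo_clip_loss(new_log_prob, old_log_prob, advantage, clip_ratio=0.2):
    """Standard PPO clipped surrogate loss (a quantity to minimize).

    Hypothetical helper for illustration; not the repository's code.
    """
    # Probability ratio between the policy being updated and the reference policy.
    ratio = torch.exp(new_log_prob - old_log_prob)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantage
    # Keep the pessimistic (smaller) objective, then negate it to obtain a loss.
    return -torch.min(unclipped, clipped).mean()


# Toy batch with made-up numbers, only to show the call shape.
new_lp = torch.tensor([-1.0, -0.5, -2.0])
old_lp = torch.tensor([-1.2, -0.5, -1.0])
adv = torch.tensor([1.0, -1.0, 0.5])
loss = ppo_clip_loss(new_lp, old_lp, adv)
```

Under the description in this section, switching between PPO and BPPO would change only how the advantage tensor is computed, not the shape of this loss.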

Overview of the Code

The code consists of 7 Python scripts. The file main.py contains the parameter settings, which are explained in our paper.

Requirements

torch 1.12.0
mujoco 2.2.1
mujoco-py 2.1.2.14
d4rl 1.1
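
A quick way to compare an existing environment against the versions listed above is sketched below. It uses only the standard library. The distribution names are assumed to match the names in the list, and the helper check_requirements is hypothetical, not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the list above. The distribution names are
# assumed to be identical to the names used there.
PINNED = {
    "torch": "1.12.0",
    "mujoco": "2.2.1",
    "mujoco-py": "2.1.2.14",
    "d4rl": "1.1",
}


def check_requirements(pinned=PINNED):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Local build tags such as "+cu116" are ignored in the comparison.
        matches = installed is not None and installed.split("+")[0] == expected
        report[name] = (expected, installed, matches)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_requirements().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```

A package that is not installed is reported with None as its installed version rather than raising an error.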

Running the code

python main.py: trains the network, storing checkpoints along the way.
Example:

python main.py --env hopper-medium-v2
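
To train on several datasets in sequence, the example command can be wrapped in a small launcher. The sketch below uses only the --env flag shown above. Only hopper-medium-v2 appears in this README; the other two dataset names are assumptions, and the helpers build_command and run_all are hypothetical. By default the commands are printed rather than executed.

```python
import subprocess
import sys

# Only hopper-medium-v2 is taken from the README; the other names are
# assumed D4RL dataset names and may need adjusting.
ENVS = ["hopper-medium-v2", "halfcheetah-medium-v2", "walker2d-medium-v2"]


def build_command(env_name):
    """Build the command line for one training run."""
    return [sys.executable, "main.py", "--env", env_name]


def run_all(envs=ENVS, dry_run=True):
    """Print (dry_run=True) or execute (dry_run=False) one run per dataset."""
    commands = [build_command(env_name) for env_name in envs]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sequence if a run exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling run_all(dry_run=False) from the repository root would launch the runs one after another.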

Citation

If you use BPPO, please cite our paper as follows:

@article{zhuang2023behavior,
  title={Behavior proximal policy optimization},
  author={Zhuang, Zifeng and Lei, Kun and Liu, Jinxin and Wang, Donglin and Guo, Yilang},
  journal={arXiv preprint arXiv:2302.11312},
  year={2023}
}

