Source code for the paper *Truly Proximal Policy Optimization*. The original code was forked from OpenAI baselines.
The method is tested on MuJoCo continuous-control tasks and Atari discrete-action games in OpenAI Gym. Networks are trained with TensorFlow 1.10 and Python 3.6.
```bash
git clone --recursive https://github.com/wangyuhuix/TrulyPPO
cd TrulyPPO
pip install -r requirements.txt
```
- `env`: Gym environment ID
- `seed`: random seed
- `num_timesteps`: number of training timesteps
```bash
python -m baselines.ppo2_AdaClip.run --alg=trulyppo --env=InvertedPendulum-v2 --seed=0
```
You can also try `--alg=pporb` for PPO-RB and `--alg=trppo` for TR-PPO.
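The `--alg` options differ in the policy surrogate they optimize. As a rough NumPy sketch (not the repo's TensorFlow implementation; `eps` and `alpha` values here are illustrative), the PPO-RB rollback surrogate replaces PPO's flat clipped region with a negatively sloped one:

```python
import numpy as np

def ppo_clip(ratio, adv, eps=0.2):
    """Standard PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    return np.minimum(ratio * adv, np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv)

def ppo_rb(ratio, adv, eps=0.2, alpha=0.3):
    """PPO-RB (rollback) surrogate sketch: outside the clipping range the
    objective slopes downward with rate alpha instead of going flat, so the
    gradient actively pushes the ratio back toward [1-eps, 1+eps]."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # The (1 + alpha) * clipped * adv term keeps the surrogate continuous
    # at the clipping boundary.
    rollback = -alpha * ratio * adv + (1.0 + alpha) * clipped * adv
    return np.minimum(ratio * adv, rollback)
```

Inside the clipping range the two surrogates agree; outside it (e.g. `ratio=1.5` with a positive advantage and `eps=0.2`) the clipped surrogate goes flat while the rollback surrogate already slopes downward. TR-PPO instead triggers the restriction by a KL-divergence condition rather than by the ratio itself.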
```bash
python -m baselines.ppo2_AdaClip.run --alg=trulyppo --env=BeamRiderNoFrameskip-v4 --seed=0 --isatari
```