
PPO-PyTorch

Minimal PyTorch implementation of Proximal Policy Optimization (PPO) with the clipped surrogate objective, for OpenAI Gym environments.
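
For reference, the clipped surrogate objective looks roughly like the sketch below. This is an illustrative standalone function, not a copy of this repository's code; the name `ppo_clipped_loss` and the default `eps_clip=0.2` are assumptions (0.2 is the value suggested in the PPO paper).

```python
import torch

def ppo_clipped_loss(ratios, advantages, eps_clip=0.2):
    """Clipped surrogate loss from the PPO paper (Schulman et al., 2017).

    ratios:     pi_theta(a|s) / pi_theta_old(a|s) for each sampled transition
    advantages: advantage estimates for the same transitions
    """
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1.0 - eps_clip, 1.0 + eps_clip) * advantages
    # PPO maximizes the clipped surrogate, so the training loss is its negation
    return -torch.min(surr1, surr2).mean()
```

Clipping the probability ratio keeps each policy update close to the data-collecting policy, which is what makes PPO stable without a trust-region constraint.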

Usage

  • To test a pre-trained network: run test.py or test_continuous.py.
  • To train a new network: run PPO.py or PPO_continuous.py.
  • All hyperparameters are set in PPO.py or PPO_continuous.py.
  • If you train on an environment where the action dimension is 1, check the tensor dimensions in the update function of the PPO class, since torch.squeeze() is used several times there. torch.squeeze() removes every dimension of length 1 from a tensor (see the sketch after this list).
  • The number of actors collecting experience is 1. This can be increased by creating multiple instances of the ActorCritic network in the PPO class and using them to collect experience (as in A3C and the standard PPO setup).
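
A minimal illustration of the torch.squeeze() pitfall mentioned above (the shapes are hypothetical, not taken from this code):

```python
import torch

actions = torch.randn(32, 1)            # batch of 32 actions, action dim = 1
print(torch.squeeze(actions).shape)     # torch.Size([32]) -- the action dim vanishes

actions_2d = torch.randn(32, 2)         # action dim = 2
print(torch.squeeze(actions_2d).shape)  # torch.Size([32, 2]) -- unchanged
```

Because the action dimension silently disappears only when it equals 1, downstream shape assumptions in the update function can break for such environments. Passing an explicit dimension, e.g. tensor.squeeze(-1), limits which axis can be removed.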

Dependencies

Trained and tested on:

Python 3.6
PyTorch 1.0
NumPy 1.15.3
gym 0.10.8
Pillow 5.3.0

Results

PPO Discrete, LunarLander-v2 (1200 episodes) [training-curve figure]
PPO Continuous, BipedalWalker-v2 (4000 episodes) [training-curve figure]
