Deep Deterministic Policy Gradient on PyTorch
Overview
======
This is an implementation of Deep Deterministic Policy Gradient (DDPG) using PyTorch. Some of the utility functions, such as the replay buffer and the random process, are from the keras-rl repo. Contributions are very welcome.
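For readers unfamiliar with the replay buffer mentioned above, here is a minimal sketch of the idea. It is not the keras-rl code used in this repo: the `ReplayBuffer` and `Transition` names, the `capacity` parameter, and uniform sampling are assumptions made for illustration only.

```python
import random
from collections import deque, namedtuple

# Hypothetical container for one environment step (not the repo's own type).
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=None):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def append(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling without replacement decorrelates consecutive steps.
        batch = self.rng.sample(list(self.memory), batch_size)
        # Regroup a list of transitions into one Transition of tuples.
        return Transition(*zip(*batch))

    def __len__(self):
        return len(self.memory)
```

In a DDPG-style loop, one would typically append a transition after every environment step and start sampling mini-batches once the buffer holds at least `batch_size` entries.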
Dependencies
======
* Python 3.4
* PyTorch 0.1.9
* OpenAI Gym
Training: results for two environments and their training curves:
- Pendulum-v0
$ ./main.py --debug
- MountainCarContinuous-v0
$ ./main.py --env MountainCarContinuous-v0 --validate_episodes 100 --max_episode_length 2500 --ou_sigma 0.5 --debug
Testing:
$ ./main.py --mode test --debug
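The `--ou_sigma` flag in the MountainCarContinuous-v0 command presumably sets the noise scale of an Ornstein-Uhlenbeck random process, which is the usual exploration noise for DDPG. The sketch below shows how such a process can be written with the standard library only. It is an illustration, not the code in this repo: the class name, the `theta`, `mu`, and `dt` defaults, and the `noisy_action` helper with its action bounds are all assumptions.

```python
import math
import random


class OrnsteinUhlenbeckNoise:
    """Temporally correlated noise.

    Discretized update (Euler-Maruyama):
        x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
    """

    def __init__(self, size, theta=0.15, mu=0.0, sigma=0.5, dt=1e-2, seed=None):
        # theta, mu and dt are illustrative defaults, not values from this repo.
        self.size = size
        self.theta = theta
        self.mu = mu
        self.sigma = sigma
        self.dt = dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Restart from the long-run mean, e.g. at the start of an episode.
        self.x = [self.mu] * self.size

    def sample(self):
        # Advance every action dimension by one step and return a copy.
        self.x = [
            x
            + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.x
        ]
        return list(self.x)


def noisy_action(action, noise, low=-1.0, high=1.0):
    # Hypothetical helper: add exploration noise, then clip to assumed bounds.
    return [min(high, max(low, a + n)) for a, n in zip(action, noise.sample())]
```

A larger `sigma` produces wider exploration, which may be why the MountainCarContinuous-v0 run above raises it to 0.5.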
Add batch normalization