Deep Reinforcement Learning Algorithms

This repository implements classic deep reinforcement learning algorithms. Its aim is to provide clear code for people learning deep reinforcement learning. More algorithms will be added in the future, and the existing code will continue to be maintained.

Deep Q-Learning Network (DQN)
Double DQN (DDQN)
Dueling Network Architecture (Dueling DQN)
Deep Deterministic Policy Gradient (DDPG)
Advantage Actor-Critic (A2C)
Trust Region Policy Optimization (TRPO)
Proximal Policy Optimization (PPO)
Actor Critic using Kronecker-Factored Trust Region (ACKTR)
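
For readers new to the value-based entries in this list, the following sketch may help show how DQN, Double DQN, and the dueling architecture differ. It is not taken from this repository: the function names are hypothetical, and Q-values are plain Python lists rather than PyTorch tensors. It follows the standard formulations from the papers listed below.

```python
# Illustrative only: function names are hypothetical and Q-values are plain lists.

def dqn_target(reward, done, next_q_target, gamma=0.99):
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def dueling_q_values(state_value, advantages):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, .)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


if __name__ == "__main__":
    q_online = [1.0, 3.0, 2.0]   # online network's estimates for the next state
    q_target = [4.0, 0.5, 1.0]   # target network's estimates for the next state
    print(dqn_target(1.0, False, q_target, gamma=0.9))
    print(double_dqn_target(1.0, False, q_online, q_target, gamma=0.9))
    print(dueling_q_values(2.0, [1.0, 0.0, -1.0]))
```

In the example, the two targets differ because the online network prefers action 1 while the target network rates action 0 highest. Decoupling selection from evaluation in this way is what Double DQN uses to reduce overestimation.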

Update Information

2018-10-17 - In this update, most of the algorithms have been improved, and more experiments with plots have been added (except for DDPG). PPO now supports Atari games and MuJoCo environments. TRPO is much more stable and achieves better results.

TODO List

Add prioritized experience replay.
In the future, stop relying on the pre-processing functions from openai-baselines.
Improve DDPG.
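
Prioritized experience replay is not implemented in this repository yet. As a rough idea of what the first TODO item involves, here is a minimal sketch of proportional prioritization. The class name, method names, and default values are all assumptions made for illustration. It uses simple O(N) list-based sampling instead of a sum tree and needs Python 3.6+ for `random.choices`.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay with list-based O(N) sampling.

    Hypothetical helper for illustration; not part of this repository.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha    # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps        # keeps every priority strictly positive
        self.storage = []
        self.priorities = []
        self.next_idx = 0

    def __len__(self):
        return len(self.storage)

    def add(self, transition):
        # New transitions receive the current maximum priority,
        # which makes them likely to be sampled soon.
        priority = max(self.priorities, default=1.0)
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
            self.priorities.append(priority)
        else:
            # Buffer is full: overwrite the oldest slot.
            self.storage[self.next_idx] = transition
            self.priorities[self.next_idx] = priority
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability P(i) = p_i^alpha / sum_k p_k^alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        indices = random.choices(range(len(self.storage)), weights=probs, k=batch_size)
        # Importance-sampling weights (N * P(i))^(-beta), normalized here
        # by the largest weight in the batch.
        n = len(self.storage)
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.storage[i] for i in indices]
        return batch, indices, weights

    def update_priorities(self, indices, td_errors):
        # Priority is the absolute TD error plus a small constant.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

A training loop would typically multiply each sample's loss by its importance weight and call `update_priorities` with the new TD errors after every update step.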

Requirements

python-3.5.2
openai-gym
mujoco-py-1.50.1.56
pytorch-0.4.0
openai-baselines

Installation

Install PyTorch

Please go to the official website to install it: https://pytorch.org/

We recommend using an Anaconda virtual environment to manage your packages.

Install openai-baselines (openai-baselines changes quickly, so please use the older version shown below; this will be resolved in the future.)

# clone the openai baselines
git clone https://github.com/openai/baselines.git
cd baselines
git checkout 366f486
pip install -e .
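
After installation, a quick check like the one below can confirm that the packages can be found in the active environment. This is only a sketch: the import names (`torch`, `gym`, `mujoco_py`, `baselines`) are assumed to correspond to the packages listed under Requirements, and the helper `find_missing` is hypothetical.

```python
import importlib.util

# Import names assumed to correspond to the packages listed under Requirements.
REQUIRED_MODULES = ["torch", "gym", "mujoco_py", "baselines"]


def find_missing(modules):
    """Return the top-level modules that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only locates the modules without importing them, so it does not verify versions or that MuJoCo itself is set up correctly.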

Instructions

Select a suitable algorithm

cd <the-rl-algorithm>

All of the parameters are defined in arguments.py, so you can train your model with suitable hyper-parameters.
Train the networks (TRPO does not support GPU, so omit --cuda when training TRPO)

python train_network.py --env-name=<env-name> --cuda --<other-flags>

Test the networks

python demo.py --env-name=<env-name>

Download the pre-trained models
Please download them from Google Drive, then put the saved_models folder under the corresponding algorithm's folder.
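
The `--env-name` and `--cuda` flags used in the commands above are defined in each algorithm's arguments.py. As a rough illustration of how such a file might be organized with `argparse`, consider the sketch below. Only `--env-name` and `--cuda` come from this README. The function name `get_args`, the other flags, and every default value are assumptions, so check the actual arguments.py for the real options.

```python
import argparse


def get_args(argv=None):
    """Parse training options.

    Only --env-name and --cuda appear in the README; every other flag
    and all default values here are hypothetical.
    """
    parser = argparse.ArgumentParser(description="RL training options (sketch)")
    parser.add_argument("--env-name", type=str, default="PongNoFrameskip-v4",
                        help="Gym environment id (default is an assumption)")
    parser.add_argument("--cuda", action="store_true",
                        help="train on the GPU")
    parser.add_argument("--gamma", type=float, default=0.99,
                        help="discount factor (hypothetical flag)")
    parser.add_argument("--lr", type=float, default=1e-4,
                        help="learning rate (hypothetical flag)")
    parser.add_argument("--seed", type=int, default=123,
                        help="random seed (hypothetical flag)")
    # argv=None makes argparse read sys.argv, as a training script would.
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = get_args(["--env-name=BreakoutNoFrameskip-v4", "--cuda"])
    print(args.env_name, args.cuda, args.gamma)
```

Because `--cuda` is a `store_true` flag, leaving it out keeps training on the CPU, which matches the note that TRPO should be run without it.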

Performance of the algorithms

Deep Q Network (DQN)

Double DQN

Dueling Network

Advantage Actor Critic (A2C)

Trust Region Policy Optimization (TRPO)

Proximal Policy Optimization (PPO)

Acknowledgement:

Papers Related to the Deep Reinforcement Learning

[1] A Brief Survey of Deep Reinforcement Learning
[2] The Beta Policy for Continuous Control Reinforcement Learning
[3] Playing Atari with Deep Reinforcement Learning
[4] Deep Reinforcement Learning with Double Q-learning
[5] Dueling Network Architectures for Deep Reinforcement Learning
[6] Continuous control with deep reinforcement learning
[7] Continuous Deep Q-Learning with Model-based Acceleration
[8] Asynchronous Methods for Deep Reinforcement Learning
[9] Trust Region Policy Optimization
[10] Proximal Policy Optimization Algorithms
[11] Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

