Reinforcement Learning

This repository is an ongoing record of my efforts to understand the algorithms used to train reinforcement learning (RL) agents. It is meant to be a gentle guide for others who wish to explore RL as well. Following the content in the order listed below should be the fastest way to get up to speed. Once done, see here for a medium-scale project which combines several RL concepts to solve an interesting problem.

Content

1. Bandits

This section uses the k-armed bandit problem discussed in [1] to introduce several important RL concepts. I recreate some of the experiments to show that I can obtain similar results, and I solve some of the exercises. Concepts covered include:

Value functions
Epsilon-greedy action selection
Balancing exploration and exploitation
Upper confidence bounds
Stationary vs non-stationary problems
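
To make the concepts above concrete, here is a minimal sketch of an epsilon-greedy agent on a k-armed bandit, plus an upper-confidence-bound selection rule. It is not the code from this repository: the function names (run_bandit, ucb_action), the Gaussian reward model with unit variance, and the default parameters are all assumptions chosen for illustration.

```python
import numpy as np

def run_bandit(true_values, epsilon=0.1, steps=1000, step_size=None, seed=0):
    """Run one epsilon-greedy agent on a k-armed Gaussian bandit.

    step_size=None uses sample averages (1/n). A constant step size weights
    recent rewards more heavily, which suits non-stationary problems.
    """
    rng = np.random.default_rng(seed)
    k = len(true_values)
    q_est = np.zeros(k)       # estimated value of each arm
    counts = np.zeros(k)      # number of times each arm was pulled
    rewards = np.empty(steps)

    for t in range(steps):
        if rng.random() < epsilon:
            action = int(rng.integers(k))      # explore: pick a random arm
        else:
            action = int(np.argmax(q_est))     # exploit: pick the greedy arm
        # Assumed reward model: unit-variance Gaussian around the true value.
        reward = rng.normal(true_values[action], 1.0)
        counts[action] += 1
        alpha = 1.0 / counts[action] if step_size is None else step_size
        q_est[action] += alpha * (reward - q_est[action])  # incremental update
        rewards[t] = reward
    return q_est, counts, rewards

def ucb_action(q_est, counts, t, c=2.0):
    """Upper-confidence-bound selection; untried arms are chosen first."""
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:
        return int(untried[0])
    bonus = c * np.sqrt(np.log(t) / counts)    # shrinks as an arm is pulled more
    return int(np.argmax(q_est + bonus))
```

Calling run_bandit([0.0, 0.5, 2.0], epsilon=0.1, steps=2000) returns the value estimates, pull counts, and reward history. Passing a constant step_size such as 0.1 switches from sample averages to an exponential recency-weighted average.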

2. Temporal Difference

This section covers temporal difference methods for RL and demonstrates their performance using examples and exercises from [1].

SARSA
Q-Learning
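
The two methods differ only in how they form the bootstrap target. The sketch below shows the tabular updates side by side. It assumes Q is a NumPy array indexed by [state, action] and that rows for terminal states are kept at zero; the function names and default step size are illustrative, not taken from this repository.

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=1.0):
    """On-policy: bootstrap from the action actually taken in the next state."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=1.0):
    """Off-policy: bootstrap from the greedy action in the next state."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```

Because SARSA uses the action the behaviour policy really selects, its estimates reflect the cost of exploration, while Q-learning estimates the value of the greedy policy regardless of how actions were chosen.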

3. CartPole

This section introduces some of the newer, state-of-the-art techniques for solving RL problems. To compare them, I've opted to use the CartPole environment from OpenAI's Gym for its simplicity.

Q-Learning with a neural network
Deep Q-Networks
- Experience Replay
- Target Networks
Double Deep Q-Networks
Prioritized Experience Replay
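
As a rough illustration of how these pieces fit together, the sketch below shows a uniform experience replay buffer and the training targets for DQN and Double DQN. It uses plain NumPy rather than CNTK, so it is not the implementation in this repository. The class and function names are hypothetical, and the Q-value arrays are assumed to come from an online network and a target network evaluated on the next states.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-size buffer of transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first

    def __len__(self):
        return len(self.buffer)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (np.array(states), np.array(actions),
                np.array(rewards, dtype=float), np.array(next_states),
                np.array(dones, dtype=float))

def dqn_targets(rewards, dones, q_next_target, gamma=0.99):
    """r + gamma * max_a Q_target(s', a); no bootstrap on terminal steps."""
    dones = np.asarray(dones, dtype=float)
    return rewards + gamma * (1.0 - dones) * q_next_target.max(axis=1)

def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Online network picks the action; target network evaluates it."""
    dones = np.asarray(dones, dtype=float)
    best = q_next_online.argmax(axis=1)
    chosen = q_next_target[np.arange(len(best)), best]
    return rewards + gamma * (1.0 - dones) * chosen
```

Decoupling action selection from action evaluation is what lets Double DQN reduce the overestimation that the max operator introduces in plain DQN. Prioritized experience replay would replace the uniform random.sample call with sampling weighted by TD error, which is not shown here.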

Dependencies

Python 3.5
numpy
matplotlib
pandas
OpenAI Gym
CNTK**

** I've chosen to implement the more complex algorithms using CNTK because few such implementations exist and doing so forces me to understand the small details.

Resources

Textbooks:

[1] Reinforcement Learning: An Introduction by R. Sutton and A. Barto

frankibem/reinforcement-learning

Folders and files

Latest commit

History

Repository files navigation

Reinforcement Learning

Content

1. Bandits

2. Temporal Difference

3. CartPole

Dependencies

Resources

Textbooks:

Papers:

Articles

About

Topics

Resources

License

Stars

Watchers

Forks

Languages