Reinforcement-Learning-Notebooks

A collection of Reinforcement Learning algorithms from Sutton and Barto's book and other research papers implemented in Python.

I wrote these notebooks in March 2017 while taking the COMP 767: Reinforcement Learning [5] class taught by Prof. Doina Precup at McGill, Montréal. I highly recommend going through the class notes and the references to all the papers the instructors have posted on the course website.

These notebooks are meant to be used alongside the book, and the referenced papers take you beyond it. I would suggest watching David Silver's videos and reading the book at the same time. Once you are done with a few chapters, start implementing the algorithms. They follow a common pattern and are mostly variants of each other. I have tried my best to explain each notebook's results and possible future directions.
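As a rough illustration of that shared pattern (a minimal sketch, not code taken from the notebooks), the one-step tabular TD control methods can be written as the same update rule with a different bootstrap term. The function names, the `method` strings, and the epsilon-greedy target policy used for Expected SARSA are all assumptions made for this example, and terminal states are not handled.

```python
import numpy as np

def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for a single state."""
    n_actions = len(q_row)
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(q_row)] += 1.0 - epsilon
    return probs

def td_target(method, Q, reward, next_state, next_action, gamma, epsilon):
    """One-step TD target; only the bootstrap term differs between methods."""
    q_next = Q[next_state]
    if method == "sarsa":
        # Bootstrap from the action actually taken in the next state.
        bootstrap = q_next[next_action]
    elif method == "q_learning":
        # Bootstrap from the greedy action in the next state.
        bootstrap = np.max(q_next)
    elif method == "expected_sarsa":
        # Bootstrap from the expectation under the (assumed) epsilon-greedy policy.
        bootstrap = np.dot(epsilon_greedy_probs(q_next, epsilon), q_next)
    else:
        raise ValueError(f"unknown method: {method}")
    return reward + gamma * bootstrap

def td_update(Q, state, action, target, alpha):
    """Shared update: move Q[state, action] a step of size alpha toward the target."""
    Q[state, action] += alpha * (target - Q[state, action])
    return Q
```

Swapping the `method` argument is the only change needed to move between the three algorithms in this sketch; the update step itself stays the same.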

Disclaimer: The code is a little messy. I wrote it before I was a Pythonista. If you would like to clean it up and turn it into a nice interface, feel free to contact me; I will be very pleased to collaborate. If you use the notebooks, please cite the source and mention the credits listed below. Also, email me with suggestions for improvement, and let me know if you find any bugs.

Feel free to reach me at pulkit.khandelwal@mail.mcgill.ca or see my website here

Special Credits:

[1] Denny Britz

[2] Monica Patel

[3] Sutton and Barto

[4] David Silver

[5] Doina Precup's course

Notebooks:

Emphatic Temporal-Difference Learning

Q(sigma) and multi-step bootstrapping methods

Real Time Dynamic Programming

TD Control methods - Expected SARSA

Temporal-Difference Learning by Harm van Seijen
