Reinforcement Learning 2nd Edition - Notes and Codes

Reinforcement Learning - An Introduction, 2nd Edition, written by Richard S. Sutton and Andrew G. Barto, is widely regarded as the bible of reinforcement learning. It is required reading for students and researchers who want proper context for the rapidly developing fields of RL and AI.

Links to get or rent a hardcover or ebook: MIT Press, Amazon (the paperback version is generally not recommended because of its poor printing quality).

Motivation of this project:

Although the authors have made the book extremely clear and friendly to readers at every level, it can still be intimidating to RL or ML beginners because of its dense concepts, abstract examples and algorithms, and sheer volume. Therefore, as an RL researcher, I'm trying to extract the key points and implement the examples and exercises in the book to help more people better understand the valuable knowledge the book generously provides.

My work mainly consists of:

  • Turning examples into code and plots that are as close to those in the book as possible;
  • Implementing algorithms in Python and testing them with RL playground packages like Gymnasium;
  • Taking notes and organizing them as one PDF file per chapter.

Snapshot of chapters:

Chapter 2: Multi-armed Bandits     🔗 link

This chapter starts with the simple bandit algorithm and introduces strategies like $\varepsilon$-greedy, Upper-Confidence-Bound, and Gradient Bandit to improve the algorithm's performance.

  • A k-armed bandit testbed:

  • Parameter study (algorithm comparison) - stationary environment
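
As a small, self-contained illustration of the $\varepsilon$-greedy strategy on a stationary testbed, here is a rough sketch (my own simplified code, not the repository's implementation; function and variable names are illustrative):

```python
# Minimal sketch of epsilon-greedy action selection on a k-armed bandit.
import numpy as np

def run_bandit(k=10, steps=1000, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    q_true = rng.normal(0.0, 1.0, k)     # true action values q*(a)
    Q = np.zeros(k)                      # sample-average estimates
    N = np.zeros(k)                      # action counts
    rewards = np.zeros(steps)
    for t in range(steps):
        # explore with probability eps, otherwise act greedily
        a = rng.integers(k) if rng.random() < eps else int(np.argmax(Q))
        r = rng.normal(q_true[a], 1.0)   # reward ~ N(q*(a), 1)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]        # incremental sample-average update
        rewards[t] = r
    return rewards

print(run_bandit().mean())
```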

Chapter 3: Finite Markov Decision Process     🔗 link

This chapter introduces the fundamentals of finite Markov Decision Processes, such as the agent-environment interaction, goals and rewards, returns and episodes, and policies and value functions. It helps build a basic understanding of the components of reinforcement learning.

  • Optimal solution to the gridworld example:
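
As a quick reference, two central definitions from this chapter are the discounted return and the state-value function for a policy $\pi$:

$$
G_t \doteq R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1},
\qquad
v_\pi(s) \doteq \mathbb{E}_\pi\left[\, G_t \mid S_t = s \,\right]
$$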

Chapter 4: Dynamic Programming     🔗 link

The dynamic programming (DP) methods introduced in this chapter include policy iteration, which consists of policy evaluation and policy improvement, and value iteration, which can be considered a concise and efficient version of policy iteration. The chapter highlights how the evaluation and improvement processes compete with each other yet also cooperate to find the optimal value function and an optimal policy.

  • Jack's Car Rental example
  • Gambler's problem
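
For orientation, here is a rough sketch of value iteration on a generic finite MDP (my own illustration, not the repository's code; the transition-model format `P[s][a] = [(prob, next_state, reward, done), ...]` is an assumed interface):

```python
# Rough sketch of value iteration with Bellman optimality backups.
import numpy as np

def value_iteration(P, gamma=0.9, theta=1e-8):
    n_states, n_actions = len(P), len(P[0])
    V = np.zeros(n_states)

    def q_value(s, a):
        # expected one-step return of taking action a in state s
        return sum(p * (r + gamma * (0.0 if done else V[s2]))
                   for p, s2, r, done in P[s][a])

    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_value(s, a) for a in range(n_actions))
            delta = max(delta, abs(best - V[s]))
            V[s] = best                       # Bellman optimality backup
        if delta < theta:
            break
    # derive a deterministic greedy policy from the converged values
    policy = [int(np.argmax([q_value(s, a) for a in range(n_actions)]))
              for s in range(n_states)]
    return V, policy

# Tiny 2-state example: action 0 stays (reward 0), action 1 terminates (reward 1).
P = [
    [[(1.0, 0, 0.0, False)], [(1.0, 1, 1.0, True)]],   # state 0
    [[(1.0, 1, 0.0, True)],  [(1.0, 1, 0.0, True)]],   # state 1 (terminal)
]
V, policy = value_iteration(P)
print(V, policy)   # V[0] ~ 1.0, policy[0] == 1
```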

Chapter 5: Monte Carlo Methods     🔗 link

Monte Carlo methods can be used to learn optimal behavior directly from interaction with the environment, with no model of the environment's dynamics. The chapter introduces on-policy MC methods, such as first-visit Monte Carlo prediction and Monte Carlo control with/without Exploring Starts, and off-policy MC methods based on ordinary/weighted importance sampling.

  • The infinite variance of ordinary importance sampling
  • Racetrack
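
As a bare-bones sketch of first-visit Monte Carlo prediction (my own simplification, not the repository's code), the idea is to average the returns following the first visit to each state; `sample_episode` below is a hypothetical callable supplied by the caller:

```python
# First-visit Monte Carlo prediction (illustrative only).
# `sample_episode` is assumed to return a list of (S_t, R_{t+1}) pairs
# for one complete episode generated under the policy being evaluated.
from collections import defaultdict

def first_visit_mc_prediction(sample_episode, n_episodes, gamma=1.0):
    returns_sum = defaultdict(float)
    returns_cnt = defaultdict(int)
    V = {}
    for _ in range(n_episodes):
        episode = sample_episode()
        # index of the first visit to each state in this episode
        first_visit = {}
        for i, (s, _) in enumerate(episode):
            first_visit.setdefault(s, i)
        # walk backwards through the episode, accumulating the return G
        G = 0.0
        for i in range(len(episode) - 1, -1, -1):
            s, r = episode[i]
            G = gamma * G + r
            if first_visit[s] == i:           # only count the first visit
                returns_cnt[s] += 1
                returns_sum[s] += G
                V[s] = returns_sum[s] / returns_cnt[s]
    return V
```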

Chapter 6: Temporal-Difference Learning     🔗 link

This chapter introduces temporal-difference (TD) learning and shows how it can be applied to the reinforcement learning problem. The TD control methods are classified according to whether they deal with the need for exploration using an on-policy (SARSA, Expected SARSA) or off-policy (Q-learning) approach. The chapter also discusses using double learning to avoid the maximization bias problem.

  • Comparison of TD(0) and MC on Random Walk environment

  • Interim and Asymptotic Performance of TD methods
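
As a concrete example of the off-policy approach, here is a rough tabular Q-learning sketch written against the Gymnasium reset/step API (the hyperparameters and loop structure are illustrative assumptions, not the repository's code):

```python
# Sketch of tabular Q-learning with an epsilon-greedy behavior policy.
import numpy as np

def q_learning(env, n_episodes=500, alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(n_episodes):
        s, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy behavior policy
            a = env.action_space.sample() if rng.random() < eps else int(np.argmax(Q[s]))
            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # off-policy TD target uses the greedy (max) action value
            target = r + gamma * (0.0 if terminated else np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q

# e.g. Q = q_learning(gymnasium.make("FrozenLake-v1"))
```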

Chapter 7: n-step Bootstrapping     🔗 link

In progress
