Reinforcement Learning Resources

Books

Reinforcement Learning - An Introduction, Second Edition, Richard Sutton, Andrew Barto, 2020
Reinforcement Learning and Optimal Control, Dimitri Bertsekas, 2019 (Draft)
Deep Reinforcement Learning, Aske Plaat, 2023
Reinforcement Learning and Stochastic Optimization, Warren Powell, 2019
Optimal Learning, Warren Powell, Ilya Ryzhov, 2018
Reinforcement Learning - State of the Art, Marco Wiering, Martijn van Otterlo, 2012
The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning, RY Rubinstein, DP Kroese, 2004

Articles and tutorials

Structure in Reinforcement Learning: A Survey and Open Problems, Aditya Mohan et al, Institute of Artificial Intelligence, Leibniz Uni, 2023
An Introduction to Deep Reinforcement Learning, Vincent Francois-Lavet et al, McGill, 2018
Discovering faster matrix multiplication algorithms with reinforcement learning, Fawzi A. et al, 2022, Nature
Solving the Rubik's Cube with Deep Reinforcement Learning and Search, Agostinelli et al, Nature MI, 2019
Faster sorting algorithms discovered using deep reinforcement learning, Mankowitz et al, 2023, Nature
On Actor-Critic Algorithms, Vijay Konda, John Tsitsiklis, MIT, SIAM, 2003
Proximal Policy Optimization Algorithms, John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, OpenAI, 2017
Learning to Predict by the Method of Temporal Differences, R. Sutton, 1988
Simple statistical gradient-following algorithms for connectionist reinforcement learning, Ronald J. Williams, Northeastern U., 1992
An Analysis of Actor/Critic Algorithms using Eligibility Traces: Reinforcement Learning with Imperfect Value Functions, H. Kimura et al, Tokyo Institute of Technology, 1998
Policy Gradient Methods for Reinforcement Learning with Function Approximation, Richard Sutton et al, AT&T, 1999
Gradient Descent for General Reinforcement Learning, Leemon Baird and Andrew Moore, 1999
Playing Atari with Deep Reinforcement Learning, V. Mnih, 2013
Continuous Control With Deep Reinforcement Learning, T. Lillicrap, et al, 2015
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games, Heinrich, UCL, 2016
A Definition of Continual Reinforcement Learning, David Abel et al, DeepMind, 2023
A Model of Mood as Integrated Advantage, Bennett, D., Davidson, G., Niv, Y., 2023
Emotions as Computations, Emanuel, A., Eldar, E., 2022
Mood as Representation of Momentum, Eldar, E. et al, 2016
When Should Agents Explore?, Pislar et al, 2022
Planning with Diffusion for Flexible Behavior Synthesis, Michael Janner, Yilun Du, Joshua Tenenbaum, Sergey Levine, 2022
On Neural Consolidation For Transfer In Reinforcement Learning, Valentin Guillet et al, Universite de Toulouse, 2022
What Can Learned Intrinsic Rewards Capture?, Zheng et al, 2020
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, Levine S. et al, 2020
Fast Computation of Nash Equilibria in Imperfect Information Games, Munos et al, 2020
Expressing Non-Markov Reward to a Markov Agent, Abel et al, 2020
Making Sense of Reinforcement Learning and Probabilistic Inference, O'Donoghue et al, ICLR, 2020
A Bayesian Approach to Robust Reinforcement Learning, Derman et al, 2019
Foolproof Cooperative Learning, Jacq, Perolat et al, 2019
World Discovery Models, Azar et al, 2019
Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement, Barreto et al, 2019
Optimizing Agent Behavior Over Long Time Scales By Transporting Value, Hung, Lillicrap, 2019
Statistics and Samples in Distributional Reinforcement Learning, Rowland et al, 2019
The Termination Critic, Harutyunyan, 2019
Policy Optimization with Linear Temporal Logic Constraints, C. Voloshin et al, Caltech, 2022
Generative Adversarial Self-Imitation Learning, Guo, Oh, et al, 2018
Deep Reinforcement Learning and the Deadly Triad, van Hasselt et al, 2018
Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications, T. Nguyen et al, 2019
DeepTrader: A Deep Reinforcement Learning Approach for Risk-Return Balanced Portfolio Management with Market Conditions Embedding, Z. Wang et al, AAAI, 2021
Fast deep reinforcement learning using online adjustments from the past, Hansen et al, 2018
A Generalised Method for Empirical Game Theoretic Analysis, K. Tuyls, J. Perolat, et al, 2018
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games, Heinrich, Silver, 2016
Gradient Estimation Using Stochastic Computation Graphs, J. Schulman, N. Heess, 2016
Understanding Sampling-Based Adversarial Search Methods, Ramanujan, PhD Thesis, 2012
Reinforcement Learning and Simulation-Based Search in Computer Go, David Silver, PhD Thesis 2009
Dynamic Programming and Markov Processes, Ronald Howard, 1960
Deep Reinforcement Learning slides, David Silver, DeepMind, 2015
Deep Reinforcement Learning slides, Pieter Abbeel, UC Berkeley (NIPS 2016)
Deep Reinforcement Learning slides, Pieter Abbeel, UC Berkeley (August 23rd, 2016)
Reinforcement Learning and Optimal Control: A Selective Overview, Dimitri Bertsekas
Emergence of Locomotion Behaviours in Rich Environments, Nicolas Heess, David Silver, et al, DeepMind, 2017
Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning, Liu, 2023

...More Reinforcement Learning articles on DeepMind site

Partially Observable Markov Processes articles

A Survey of Partially Observable Markov Decision Processes: Theory, Models and Algorithms, Monahan G., 2014
POMDP Solution Methods, D. Braziunas, University of Toronto, 2003
Optimal Control of Markov Processes with Incomplete State Information, Astrom, JMAA, 1965
Dynamic Programming Conditions for Partially Observable Stochastic Systems, M.H.A. Davis et al, 1971
The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, R. Smallwood, E. Sondik, 1971
The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs, Sondik, ORSA, 1973
Optimal Control For Partially Observable Markov Decision Processes Over an Infinite Horizon, Sawaki, 1978
Semi-Markov Decision Processes with Incomplete State Observation, Discounted Cost Criterion, Kazuyoshi Wakuta, 1982

Multi-Agent Reinforcement Learning (MARL) articles

Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications, Nguyen et al, 2019
A Review of Cooperative Multi-Agent Deep Reinforcement Learning, Oroojlooy et al, SAS Institute, 2021
Cooperative Multi-Agent Learning for Navigation via Structured State Abstraction, Abdel-Aziz et al, 2023
Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning, Johanson, Hughes, Timbers, Leibo, 2022
Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning, Liu, Suri, Zhou, Finn, 2023
RODE: Learning Roles To Decompose Multi Agent Tasks, Wang, Gupta, Mahajan, Bei Peng, 2020
The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games, Yu et al, Tsinghua U., 2022
Offline Multi-Agent Reinforcement Learning with Knowledge Distillation, Tseng, MIT-CSAIL, 2022

Reinforcement Learning in Supply Chain Management

Introductory material on relevant algorithms by Wouter van Heeswijk

Why Hasn’t Reinforcement Learning Conquered The World (Yet)?, Wouter van Heeswijk, Medium

The Four Policy Classes of Reinforcement Learning, Wouter van Heeswijk, Medium

Policy Gradients In Reinforcement Learning Explained, Wouter van Heeswijk, Medium

Proximal Policy Optimization (PPO) Explained, Wouter van Heeswijk, Medium

Dynamic Pricing with Contextual Bandits: Learning by Doing, Massimiliano Costacurta, Medium

A Unified Framework for Stochastic Optimization, Warren B. Powell, Princeton, 2017

Tutorial on Stochastic Optimization in Energy II: An energy storage illustration, Warren B. Powell, 2015

Challenges of Real World Reinforcement Learning, Gabriel Dulac-Arnold, 2019

From Reinforcement Learning to Optimal Control: A unified framework for sequential decisions, Warren B. Powell, Princeton, 2019

Sequential Decision Analytics for the Truckload Industry, Warren B. Powell, Optimal Dynamics, 2022

Stochastic Optimization, James C. Spall, Johns Hopkins U., 2012

How to Improve your Supply Chain with Deep Reinforcement Learning with Christian Hubbs, Medium

Deep reinforcement learning for supply chain and price optimization, Ilya Katsov, 2020, blog

A Deep Q-Network for the Beer Game: Deep Reinforcement Learning for Inventory Optimization, Oroojlooyjadid et al, 2020

A Deep Reinforcement Learning Approach to Supply Chain Inventory Management, Francesco Stranieri, 2022

Optimization of Apparel Supply Chain Using Deep Reinforcement Learning, JW. Chong et al, IEEE, 2022

Reinforcement learning for supply chain optimization, L. Kemmer et al, 2018

Deep Reinforcement Learning hands-on for Optimized Ad Placement with NandaKishore Joshi

related repo: ad placement example

link to the book Reinforcement Learning in Action

Online Algorithms and solving them with Reinforcement Learning

The k-server problem: Researchers Refute a Widespread Belief About Online Algorithms, Quanta Magazine, 2023

The Randomized k-Server Conjecture Is False!, S. Bubeck et al, 2023

The Online K-Server Problem, Aris Floratos, Ravi Boppana, Courant Institute, NYU

Decision Transformers - Reinforcement Learning via Sequence Modeling

Decision Transformer: Reinforcement Learning via Sequence Modeling, Lili Chen et al, UC Berkeley, 2021

Decision Transformer: Reinforcement Learning via Sequence Modeling (Research Paper Explained), Yannic Kilcher, 2022, YouTube video

Stanford CS25: V1 I Decision Transformer: Reinforcement Learning via Sequence Modeling, 2021, YouTube video

Decision Transformer: Reinforcement Learning via Sequence Modeling, Youseff Fathi, CS 885: Reinforcement Learning, U. of Waterloo, 2022

Online Decision Transformer, Q. Zheng et al, 2022

Offline Reinforcement Learning as one Big Sequence Modeling Problem, M. Janner et al, 2021

Reinforcement Learning Upside Down: Don't Predict Rewards - Just Map Them To Actions, Juergen Schmidhuber, Tech Report, 2020

Training Agents using Upside Down Reinforcement Learning, R. Srivastava et al, 2021

RvS: What is Essential for Offline RL via Supervised Learning, Scott Emmons et al, 2022

Reinforcement Learning in Large Language Models and related algorithms

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs, A. Ahmadian et al, 2024

Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, Yevgen Chebotar et al, DeepMind, 2023

related repo: https://qtransformer.github.io/

Basics of Reinforcement Learning for LLMs with Cameron Wolfe, Medium

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, Evan Hubinger et al, Anthropic, 2024

Reinforcement Learning from Human Feedback (RLHF)

Deep Reinforcement Learning from Human Preferences, Paul Christiano et al, OpenAI, 2017

Training Language Models to Follow Instructions With Human Feedback, L. Ouyang et al, OpenAI, 2022

Fine-Tuning Language Models from Human Preferences, Daniel M. Ziegler et al, OpenAI, 2020

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Y. Bai et al, Anthropic, 2022

Learning to Summarize from Human Feedback, Nisan Stiennon et al, OpenAI, 2022

Illustrating Reinforcement Learning from Human Feedback (RLHF), Hugging Face article, 2022, Nathan Lambert, Louis Castricato, Leandro von Werra, Alex Havrilla

Learning from human preferences, Dario Amodei, OpenAI blog, 2017

Reinforcement Learning from Human Feedback, Wikipedia

Human-like Reasoning via Reinforcement Learning and Representation Learning

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, Eric Zelikman et al, Stanford U., 2024
STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning, E. Zelikman et al, 2022

Adaptive Reinforcement Learning, RL applied to Bayesian Networks

AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning, B. Huang et al, CMU, 2022

AdaRL repo: https://github.com/Adaptive-RL/AdaRL-code

Swarm-based Reinforcement Learning and its applications in Robotics

Deep Reinforcement Learning for Swarm Systems, Maximilian Hüttenrauch et al, U. of Lincoln, 2019

Using Reinforcement Learning to Herd a Robotic Swarm to a Target Distribution, Zahi Kakish et al, ASU, 2021

Reinforcement learning for swarm robotics: An overview of applications, algorithms and simulators, M.-A. Blais, Moulay A. Akhloufi, 2023

Maximum Diffusion Reinforcement Learning

Maximum diffusion reinforcement learning, Thomas Berrueta et al, Northwestern U., 2023

Deep Reinforcement Learning for Physical Applications

Predicting disruptive instabilities in controlled fusion plasmas through deep learning, Julian Kates-Harbeck et al, Harvard U, 2019

Magnetic Control of Tokamak Plasmas through Deep Reinforcement Learning, Jonas Degrave et al, 2022

First Nuclear Plasma Control with Digital Twin, Sabine Hossenfelder, Feb 2024, YouTube video

Online tutorials and short readings

OpenAI resources:

Spinning Up

https://spinningup.openai.com/en/latest/user/introduction.html

DeepMind resources:

Deep Reinforcement Learning blog

https://www.deepmind.com/blog/deep-reinforcement-learning
DeepMind Reinforcement Learning articles:

https://www.deepmind.com/research?tag=Reinforcement+learning

Computational Neuroscience Lab's resources:

Demystifying Deep Reinforcement Learning

https://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/

Richard Sutton's online posts

The Bitter Lesson, Richard Sutton, 2019

http://www.incompleteideas.net/IncIdeas/BitterLesson.html

Andrej Karpathy's blog

Deep Reinforcement learning: Pong From Pixels

http://karpathy.github.io/2016/05/31/rl/

Medium

The Fundamentals of Reinforcement Learning with Ruben Winastwan

Markov Decision Process, Policy, Value Function, Bellman Equation, Dynamic Programming Implementation

https://towardsdatascience.com/the-fundamentals-of-reinforcement-learning-177dd8626042
Reinforcement Learning: Markov Decision Process (Part 1) with Tan Alvin

https://pub.towardsai.net/reinforcement-learning-markov-decision-process-part-1-e376991cafbe
Reinforcement Learning: Dynamic Programming and Monte Carlo (Part 2) with Tan Alvin

https://pub.towardsai.net/reinforcement-learning-dynamic-programming-and-monte-carlo-part-2-190c1d86532c
Reinforcement Learning: SARSA and Q Learning (Part 3) with Tan Alvin

https://pub.towardsai.net/reinforcement-learning-sarsa-and-q-learning-part-3-871bedbeaec0
Multi-Armed Bandits with Steve Roberts (Part 1): Mathematical Framework and Terminology

https://towardsdatascience.com/multi-armed-bandits-part-1-b8d33ab80697
Multi-Armed Bandits with Steve Roberts (Part 2): The Bandit Framework

https://towardsdatascience.com/multi-armed-bandits-part-2-5834cb7aba4b
Multi-Armed Bandits with Steve Roberts (Part 3): Bandit Algorithms

https://towardsdatascience.com/bandit-algorithms-34fd7890cb18
Multi-Armed Bandits with Steve Roberts (Part 4): The Upper Confidence Bound Bandit Algorithm

https://towardsdatascience.com/the-upper-confidence-bound-ucb-bandit-algorithm-c05c2bf4c13f
Multi-Armed Bandits with Steve Roberts (Part 5): Thompson Sampling

https://towardsdatascience.com/thompson-sampling-fc28817eacb8
Multi-Armed Bandits with Steve Roberts (Part 6): A Comparison of Bandit Algorithms

https://towardsdatascience.com/a-comparison-of-bandit-algorithms-24b4adfcabb
An Introduction to Reinforcement Learning with Steve Roberts (Part 1): State Values and Policy Evaluation

https://towardsdatascience.com/state-values-and-policy-evaluation-ceefdd8c2369
An Introduction to Reinforcement Learning with Steve Roberts (Part 2): Markov Decision Processes and Bellman Equations

https://towardsdatascience.com/markov-decision-processes-and-bellman-equations-45234cce9d25
An Introduction to Reinforcement Learning with Steve Roberts (Part 3): Policy and Value Iteration

https://towardsdatascience.com/policy-and-value-iteration-78501afb41d2
Introduction to Reinforcement Learning with Markel Ausin (Part 1): Multi-armed bandit problem

https://markelsanz14.medium.com/introduction-to-reinforcement-learning-part-1-multi-armed-bandit-problem-618e8cbf9d4b
Introduction to Reinforcement Learning with Markel Ausin (Part 2): Q-Learning

https://markelsanz14.medium.com/introduction-to-reinforcement-learning-part-2-q-learning-4d93f9f37e3e
Introduction to Reinforcement Learning with Markel Ausin (Part 3): Q-Learning with Neural Networks, Algorithm DQN

https://markelsanz14.medium.com/introduction-to-reinforcement-learning-part-3-q-learning-with-neural-networks-algorithm-dqn-1e22ee928ecd
Introduction to Reinforcement Learning with Markel Ausin (Part 4): Double DQN and Dueling DQN

https://markelsanz14.medium.com/introduction-to-reinforcement-learning-part-4-double-dqn-and-dueling-dqn-b349c9a61ea1
Introduction to Reinforcement Learning with Markel Ausin (Part 5): Policy Gradient

https://markelsanz14.medium.com/introduction-to-reinforcement-learning-part-5-policy-gradient-algorithms-862960f7b0dc
Introduction to Reinforcement Learning with Sagi Shaier (Part 1): The four main subelements of a reinforcement learning system

https://towardsdatascience.com/introduction-to-reinforcement-learning-rl-part-1-introduction-c0d55c1240a3
Introduction to Reinforcement Learning with Sagi Shaier (Part 2): Multi-arm bandits

https://towardsdatascience.com/introduction-to-reinforcement-learning-rl-part-2-multi-arm-bandits-be5efb2e83ea
Introduction to Reinforcement Learning with Sagi Shaier (Part 3): Finite Markov Decision Processes

https://towardsdatascience.com/introduction-to-reinforcement-learning-rl-part-3-finite-markov-decision-processes-51e1f8d3ddb7
Introduction to Reinforcement Learning with Sagi Shaier (Part 4): Dynamic Programming

https://towardsdatascience.com/introduction-to-reinforcement-learning-rl-part-4-dynamic-programming-6af57e575b3d
Introduction to Reinforcement Learning with Sagi Shaier (Part 5): Monte Carlo Methods

https://towardsdatascience.com/introduction-to-reinforcement-learning-rl-part-5-monte-carlo-methods-25067003bb0f
Introduction to Reinforcement Learning with Sagi Shaier (Part 6): Temporal Difference (TD) Learning

https://towardsdatascience.com/introduction-to-reinforcement-learning-rl-part-6-temporal-difference-td-learning-2a12f0aba9f9
Introduction to Reinforcement Learning with Sagi Shaier (Part 7): N-step Bootstrapping

https://towardsdatascience.com/introduction-to-reinforcement-learning-rl-part-7-n-step-bootstrapping-6c3006a13265
Reinforcement Learning with Dan Lee (Part 1): A Brief Introduction

https://medium.com/ai%C2%B3-theory-practice-business/reinforcement-learning-part-1-a-brief-introduction-a53a849771cf
Reinforcement Learning with Dan Lee (Part 2): Introducing Markov Process

https://medium.com/ai%C2%B3-theory-practice-business/reinforcement-learning-part-2-introducing-markov-process-d3586d4003e0
Reinforcement Learning with Dan Lee (Part 3): The Markov Decision Process

https://medium.com/ai%C2%B3-theory-practice-business/reinforcement-learning-part-3-the-markov-decision-process-9f5066e073a2
Reinforcement Learning with Dan Lee (Part 4): Optimal Policy Search with MDP

https://medium.com/ai%C2%B3-theory-practice-business/reinforcement-learning-part-4-optimal-policy-search-with-mdp-7fc96158ea8a
Reinforcement Learning with Dan Lee (Part 5): Monte-Carlo and Temporal-Difference Learning

https://medium.com/@Adline125/reinforcement-learning-part-5-monte-carlo-and-temporal-difference-learning-889053aba07d
Reinforcement Learning with Dan Lee (Part 6): TD(lambda) and Q-learning

https://medium.com/ai%C2%B3-theory-practice-business/reinforcement-learning-part-6-td-%CE%BB-q-learning-99cdfdf4e76a
Reinforcement Learning with Dan Lee (Part 7): A Brief Introduction to Deep Q Networks

https://medium.com/ai%C2%B3-theory-practice-business/reinforcement-learning-part-7-a-brief-introduction-to-deep-q-networks-aa45314a2ae
Simple Reinforcement Learning with Tensorflow (Part 0), Arthur Juliani: Q-Learning with Tables and Neural Networks

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0
Simple Reinforcement Learning in Tensorflow (Part 1), Arthur Juliani: Two-armed bandit

https://awjuliani.medium.com/super-simple-reinforcement-learning-tutorial-part-1-fd544fab149
Simple Reinforcement Learning with Tensorflow (Part 1.5), Arthur Juliani: Contextual bandits

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-1-5-contextual-bandits-bff01d1aad9c
Simple Reinforcement Learning with Tensorflow (Part 2), Arthur Juliani: Policy-based agents

https://awjuliani.medium.com/super-simple-reinforcement-learning-tutorial-part-2-ded33892c724
Simple Reinforcement Learning with Tensorflow (Part 3), Arthur Juliani: Model-based RL

https://awjuliani.medium.com/simple-reinforcement-learning-with-tensorflow-part-3-model-based-rl-9a6fe0cce99
Simple Reinforcement Learning with Tensorflow (Part 4), Arthur Juliani: Deep Q Networks and beyond

https://awjuliani.medium.com/simple-reinforcement-learning-with-tensorflow-part-4-deep-q-networks-and-beyond-8438a3e2b8df
Simple Reinforcement Learning with Tensorflow (Part 5), Arthur Juliani: Visualizing an Agent's thoughts and actions

https://awjuliani.medium.com/simple-reinforcement-learning-with-tensorflow-part-5-visualizing-an-agents-thoughts-and-actions-4f27b134bb2a#.kdgfgy7k8
Simple Reinforcement Learning with Tensorflow (Part 6), Arthur Juliani: Partial Observability and Deep Recurrent Q-Networks

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-6-partial-observability-and-deep-recurrent-q-68463e9aeefc#.gi4xdq8pk
Simple Reinforcement Learning with Tensorflow (Part 7), Arthur Juliani: Action-Selection Strategies for exploration

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-7-action-selection-strategies-for-exploration-d3a97b7cceaf
Simple Reinforcement Learning with Tensorflow (Part 8), Arthur Juliani: Asynchronous Actor-Critic Agents (A3C)

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2#.hg13tn9zw
Q Learning Algorithm: How to Successfully Teach an Intelligent Agent to Play A Game with Saul Dobilas

https://towardsdatascience.com/q-learning-algorithm-how-to-successfully-teach-an-intelligent-agent-to-play-a-game-933595fd1abf
Reinforcement Learning with Sthanikam Santhosh (Part 1): Deep Q Learning using Tensorflow2

https://medium.com/@sthanikamsanthosh1994/deep-q-learning-using-tensorflow2-a5eabc1a8d82
Reinforcement Learning with Sthanikam Santhosh (Part 2): Policy Gradient (Reinforce) using Tensorflow2

https://medium.com/@sthanikamsanthosh1994/reinforcement-learning-part-2-policy-gradient-reinforce-using-tensorflow2-a386a11e1dc6
Reinforcement Learning with Sthanikam Santhosh (Part 3): Dueling DQN using Tensorflow2

https://medium.com/@sthanikamsanthosh1994/reinforcement-learning-part-3-dueling-dqn-using-tensorflow-45b024c5b7d9
Reinforcement Learning with Sthanikam Santhosh (Part 4): Dueling Double Deep Q Learning

https://medium.com/@sthanikamsanthosh1994/reinforcement-learning-part-4-dueling-double-deep-q-learning-with-tensorflow-3f46e65fb644
Reinforcement Learning with Sthanikam Santhosh (Part 5): Soft Actor-Critic (SAC) Network

https://medium.com/@sthanikamsanthosh1994/reinforcement-learning-part-5-soft-actor-critic-sac-network-using-tensorflow2-697917b4b752
Reinforcement Learning with Sthanikam Santhosh (Part 6): Deep Deterministic Policy Gradient (DDPG) using Tensorflow2

https://medium.com/@sthanikamsanthosh1994/reinforcement-learning-part-6-deep-deterministic-policy-gradient-ddpg-using-tesorflow2-fcdccf8f1172
Reinforcement Learning with Sthanikam Santhosh (Part 7): Twin Delayed Deep Deterministic Policy Gradient (TD3) in Tensorflow2

https://medium.com/@sthanikamsanthosh1994/reinforcement-learning-part-7-twin-delayed-deep-deterministic-policy-gradient-td3-in-tensorflow2-726fb9a53ae6
Reinforcement Learning with Sthanikam Santhosh (Part 8): Proximal Policy Optimization (PPO) for trading environment (Tensorflow)

https://medium.com/@sthanikamsanthosh1994/reinforcement-learning-part-8-proximal-policy-optimization-ppo-for-trading-9f1c3431f27d
Foundational RL with Rahul Bhadani: Value Iteration and Policy Iteration

https://medium.com/mlearning-ai/foundational-rl-value-iteration-and-policy-iteration-76251e47581b
Foundational RL with Rahul Bhadani: Dynamic Programming

https://towardsdatascience.com/foundational-rl-dynamic-programming-28f96f6fb40e
Foundational RL with Rahul Bhadani: Solving Markov Decision Processes

https://towardsdatascience.com/foundational-rl-solving-markov-decision-process-d90b7e134c0b
Cross-entropy method for Reinforcement Learning with Avishree Khare

https://towardsdatascience.com/cross-entropy-method-for-reinforcement-learning-2b6de2a4f3a0
Temporal Difference Learning in Reinforcement Learning

https://medium.com/nerd-for-tech/temporal-difference-learning-in-reinforcement-learning-cf13ed159fcb

https://medium.com/@violante.andre/simple-reinforcement-learning-temporal-difference-learning-e883ea0d65b0
Dynamic Pricing with Reinforcement Learning from Scratch: Q-Learning with Nicolo Albanese

https://towardsdatascience.com/dynamic-pricing-with-reinforcement-learning-from-scratch-q-learning-fb3fb764da49
Reinforcement Learning: Q-Learning

A step-by-step guide to implementing the Q-Learning algorithm using OpenAI Gym for Taxi-V3

https://towardsdev.com/reinforcement-learning-q-learning-38146880ca49
Q-Learning: Utilizing Reinforcement learning algorithm to trace optimal path

https://medium.com/@amos.eda/q-learning-utilising-reinforcement-learning-algorithm-to-trace-optimal-path-aa443e307443
A/B Optimization with Policy Gradient Reinforcement Learning

A step by step visual explanation of the Policy Gradient method

https://towardsdatascience.com/a-b-optimization-with-policy-gradient-reinforcement-learning-b4a3527f849
Making Sense of the Bias / Variance trade-off in (Deep) Reinforcement Learning

What goes into a stable accurate reinforcement signal?

https://blog.mlreview.com/making-sense-of-the-bias-variance-trade-off-in-deep-reinforcement-learning-79cf1e83d565
Monte Carlo Methods for Reinforcement Learning with Shivam Mohan

Introduction

https://medium.com/nerd-for-tech/monte-carlo-methods-for-reinforcement-learning-d30d874dd817
n-step Bootstrapping in Reinforcement Learning with Shivam Mohan

Introduction

https://medium.com/@shivamohan07/n-step-bootstrapping-in-reinforcement-learning-fa87cbd0584a
A brief overview of Eligibility Traces in Reinforcement Learning with Shivam Mohan

https://medium.com/nerd-for-tech/a-brief-overview-of-eligibility-traces-in-reinforcement-learning-c0a8326fa9f7
Temporal Difference Learning in Reinforcement Learning with Shivam Mohan

Introduction

https://medium.com/nerd-for-tech/temporal-difference-learning-in-reinforcement-learning-cf13ed159fcb
Reinforcement Learning - Generalization In Continuous State Space

https://towardsdatascience.com/reinforcement-learning-generalisation-in-continuous-state-space-df943b04ebfa

Function Approximation with Random Walk Example
Q vs V in Reinforcement Learning, the Easy Way

https://zsalloum.medium.com/q-vs-v-in-reinforcement-learning-the-easy-way-9350e1523031
6 Reinforcement Learning Algorithms Explained

https://towardsdatascience.com/6-reinforcement-learning-algorithms-explained-237a79dbd8e
Gambler's Problem - When inaction is in fact optimal

https://borundev.medium.com/gamblers-problem-when-inaction-is-infact-optimal-1d8348b69c4f
Win at Blackjack with Reinforcement Learning

https://medium.com/the-power-of-ai/blackjack-with-reinforcement-learning-95f588dd670c
Learn to Win Games with Monte Carlo Reinforcement Learning

https://medium.com/the-power-of-ai/monte-carlo-reinforcement-learning-for-simple-games-71dc8f4ffda4
Build your own unbeatable TicTacToe with Reinforcement Learning

https://medium.com/@artem.a.arutyunov/build-your-own-unbeatable-tictactoe-ai-with-reinforcement-learning-411502c54c22
The Actor-Critic Reinforcement Learning Algorithm with Dhanoop Karunakaran

https://medium.com/intro-to-artificial-intelligence/the-actor-critic-reinforcement-learning-algorithm-c8095a655c14

related link: The idea behind Actor-Critic, Sergios Karagiannakos, blog, 2018

related link: Going Deeper Into Reinforcement Learning: Fundamentals of Policy Gradients, Seita's place (blog), 2017

related link: Actor-Critic Algorithms, Sergey Levine, CS 294-112, 2017
Implement Policy Iteration In Python - A Minimal Working Example with Wouter van Heeswijk

https://towardsdatascience.com/implement-policy-iteration-in-python-a-minimal-working-example-6bf6cc156ca9
Implement Value Iteration In Python - A Minimal Working Example with Wouter van Heeswijk

https://towardsdatascience.com/implement-value-iteration-in-python-a-minimal-working-example-f638907f3437
A Minimal Working Example for Discrete Policy Gradients in TensorFlow 2.0 with Wouter van Heeswijk

https://towardsdatascience.com/a-minimal-working-example-for-discrete-policy-gradients-in-tensorflow-2-0-d6a0d6b1a6d7

related paper: link
The Five Building Blocks of Markov Decision Processes with Wouter van Heeswijk

https://towardsdatascience.com/the-five-building-blocks-of-markov-decision-processes-997dc1ab48a7
Walking Off The Cliff with Off-Policy Reinforcement Learning with Wouter van Heeswijk

https://towardsdatascience.com/walking-off-the-cliff-with-off-policy-reinforcement-learning-7fdbcdfe31ff
Trust Region Policy Optimization (TRPO) Explained with Wouter van Heeswijk

https://towardsdatascience.com/trust-region-policy-optimization-trpo-explained-4b56bd206fc2
Policy Gradients in Reinforcement Learning Explained with Wouter van Heeswijk

https://towardsdatascience.com/policy-gradients-in-reinforcement-learning-explained-ecec7df94245
A Deep Dive into Problem States with Wouter van Heeswijk

https://towardsdatascience.com/a-deep-dive-into-problem-states-498ad0746c98
Deep Deterministic Policy Gradients Explained with Wouter van Heeswijk

https://towardsdatascience.com/deep-deterministic-policy-gradients-explained-4643c1f71b2e
Solving The Taxi Environment With Q-Learning - A Tutorial with Wouter van Heeswijk

https://towardsdatascience.com/solving-the-taxi-environment-with-q-learning-a-tutorial-c76c22fc5d8f
When Stochastic Policies Are Better Than Deterministic Ones with Wouter van Heeswijk

https://towardsdatascience.com/when-stochastic-policies-are-better-than-deterministic-ones-b950cd0d60f4
Three Fundamental Flaws in Common Reinforcement Learning Algorithms (And How To Fix Them) with Wouter van Heeswijk

https://towardsdatascience.com/three-fundamental-flaws-in-common-reinforcement-learning-algorithms-and-how-to-fix-them-951160b7a207
Seven Exploration Strategies In Reinforcement Learning You Should Know with Wouter van Heeswijk

https://towardsdatascience.com/seven-exploration-strategies-in-reinforcement-learning-you-should-know-8eca7dec503b
Why Reinforcement Learning Does Not Need Bellman's Equation with Wouter van Heeswijk

https://towardsdatascience.com/why-reinforcement-learning-doesnt-need-bellman-s-equation-c9c2e51a0b7
The Alberta Plan: Sutton's Research Vision for Artificial Intelligence with Wouter van Heeswijk

https://towardsdatascience.com/the-alberta-plan-suttons-research-vision-for-artificial-intelligence-a1763088da04
The Four Policy Classes of Reinforcement Learning with Wouter van Heeswijk

https://towardsdatascience.com/the-four-policy-classes-of-reinforcement-learning-38185daa6c8a

related papers:

A Unified Framework for Stochastic Optimization, Warren B. Powell, Princeton, 2017

Tutorial on Stochastic Optimization in Energy II: An energy storage illustration, Warren B. Powell, 2015
Natural Policy Gradients In Reinforcement Learning Explained with Wouter van Heeswijk

https://towardsdatascience.com/natural-policy-gradients-in-reinforcement-learning-explained-2265864cf43c

related literature: Why Natural Gradient? S. Amari, S.C. Douglas, 1998

Efficient Natural Gradient Descent Methods for Large-Scale PDE-Based Optimization Problems, L. Nurbekyan et al

New Insights and Perspectives on the Natural Gradient Method, James Martens, DeepMind, 2020

Natural Gradient Methods: Perspectives, Efficient-Scalable Approximations, and Analysis, R. Shrestha, 2023

Natural Gradient Descent with Agustinus Kristiadi

Trust Region Policy Optimization, OpenAI
Understand Policy Gradient by Building Cross Entropy from Scratch

https://towardsdatascience.com/understand-policy-gradient-by-building-cross-entropy-from-scratch-75ca18b53e94
A Deep Dive into Reinforcement Learning: Q-Learning and Deep Q-Learning on a 10x10 FrozenLake Environment with Nandan Grover

https://medium.com/mlearning-ai/a-deep-dive-into-reinforcement-learning-q-learning-and-deep-q-learning-on-a-10x10-frozenlake-c76d56810a46

example code: https://github.com/nandangrover/reinforcement_frozenlake
Reinforcement Learning - TD(lambda) Introduction with Jeremy Zhang (Part 1)

Apply offline-lambda on Random Walk

code for RL, an Introduction : https://github.com/ShangtongZhang/reinforcement-learning-an-introduction

https://towardsdatascience.com/reinforcement-learning-td-%CE%BB-introduction-686a5e4f4e60
Reinforcement Learning - TD(lambda) Introduction with Jeremy Zhang (Part 2)

TD(lambda) with eligibility trace

code for RL, an Introduction : https://github.com/ShangtongZhang/reinforcement-learning-an-introduction

https://meatba11.medium.com/reinforcement-learning-td-%CE%BB-introduction-2-f0ea427cd395
Reinforcement Learning - TD(lambda) Introduction with Jeremy Zhang (Part 3)

Extend TD(lambda) on Q function with Sarsa(lambda)

code for RL, an Introduction : https://github.com/ShangtongZhang/reinforcement-learning-an-introduction

https://towardsdatascience.com/reinforcement-learning-td-%CE%BB-introduction-3-f329bdbf872a
Reinforcement Learning - Generalization of Continuing Tasks

Server Access Example implementation

https://towardsdatascience.com/reinforcement-learning-generalisation-on-continuing-tasks-ffb9a89d57d0
The exploration-exploitation tradeoff: intuitions and strategies with Joseph Rocca

Understanding ε-greedy, optimistic initialization, UCB and Thompson sampling strategies

https://towardsdatascience.com/the-exploration-exploitation-dilemma-f5622fbe1e82
RL - Exploration with Deep Learning with Jonathan Hui

https://jonathan-hui.medium.com/rl-exploration-9d7c15c5bf79
RL — Tips on Reinforcement Learning with Jonathan Hui

https://jonathan-hui.medium.com/rl-tips-on-reinforcement-learning-fbd121111775
Deep Reinforcement Learning - Deep Deterministic Policy Gradient (DDPG) algorithm with Markus Buchholz

https://markus-x-buchholz.medium.com/deep-reinforcement-learning-deep-deterministic-policy-gradient-ddpg-algoritm-5a823da91b43
Breaking down DeepMind's AlphaTensor

https://pub.towardsai.net/breaking-down-deepminds-alphatensor-15534303cde2
Batched Bandit Problems with Sean Smith

Multi-armed Bandits with delayed rewards in successive trials

https://towardsdatascience.com/batched-bandit-problems-ea73dba5da7a
A Lesson on Applied Reinforcement Learning in Production with Bill Zhu

https://medium.com/@zheqing.zhu/a-lesson-on-applied-reinforcement-learning-in-production-2390011994b3
Deep Reinforcement Learning for Network Design in Marine Transportation with Timothe Boulet

https://medium.com/instadeep/deep-reinforcement-learning-for-network-design-in-marine-transportation-c3e31a1aba0
The Emotional Lives of RL Agents

https://awjuliani.medium.com/the-emotional-lives-of-rl-agents-12e2c8ee36af
Curious Agents: An Introduction with Dries Smith

https://medium.com/@dries.epos/curious-agents-ebfee02ef024

(code of this series: https://github.com/DriesSmit/CuriousAgents)
Curious Agents II: Solving MountainCar without Rewards

https://medium.com/@dries.epos/curious-agents-ii-solving-mountaincar-without-rewards-c49ae2177819

(code of this series: https://github.com/DriesSmit/CuriousAgents)
Curious Agents III: BYOL-Explore

https://medium.com/@dries.epos/curious-agents-iii-byol-explore-93f34fa6146a

(code of this series: https://github.com/DriesSmit/CuriousAgents)
Curious Agents IV: BYOL-Hindsight

https://medium.com/@dries.epos/curious-agents-iv-byol-hindsight-318c559175f0

(code of this series: https://github.com/DriesSmit/CuriousAgents)
Understanding the World Through Action: RL as a Foundation for Scalable Self-Supervised Learning with Sergey Levine

https://medium.com/@sergey.levine/understanding-the-world-through-action-rl-as-a-foundation-for-scalable-self-supervised-learning-636e4e243001
How Robots Can Learn End-to-End from Data with Sergey Levine

https://medium.com/@sergey.levine/how-robots-can-learn-end-to-end-from-data-3d879b0a2ba1
Decisions from Data: How Offline Reinforcement Learning Will Change How We Use Machine Learning with Sergey Levine

https://medium.com/@sergey.levine/decisions-from-data-how-offline-reinforcement-learning-will-change-how-we-use-ml-24d98cb069b0
An Ecological Perspective on Reinforcement Learning

https://medium.com/@sergey.levine/an-ecological-perspective-on-reinforcement-learning-de697f3d6516
Function Approximation in Reinforcement Learning

https://towardsdatascience.com/function-approximation-in-reinforcement-learning-85a4864d566
A Gentle Introduction to Deep Reinforcement Learning with Jordi Torres

https://towardsdatascience.com/drl-01-a-gentle-introduction-to-deep-reinforcement-learning-405b79866bf4

(Github repo: here)
Formalization of a Reinforcement Learning Problem with Jordi Torres

https://towardsdatascience.com/drl-02-formalization-of-a-reinforcement-learning-problem-108b52ebfd9a

(Github repo: here)
Deep Learning Basics: Basic Concepts for Beginners with Jordi Torres

https://towardsdatascience.com/deep-learning-basics-1d26923cc24a

(Github repo: here)
Deep Learning with PyTorch: First contact with PyTorch for beginners with Jordi Torres

https://towardsdatascience.com/deep-learning-with-pytorch-a93b09bdae96

(Github repo: here)
PyTorch Performance Analysis with TensorBoard: How to run TensorBoard for PyTorch inside Colab

https://towardsdatascience.com/pytorch-performance-analysis-with-tensorboard-7c61f91071aa

(Github repo: here)
Solving a Reinforcement Learning Problem Using Cross-Entropy Method: Agent Creation Using Deep Neural Networks with Jordi Torres

https://towardsdatascience.com/solving-a-reinforcement-learning-problem-using-cross-entropy-method-23d9726a737

(Github repo: here)
Cross-Entropy Method Performance Analysis: Implementation of Cross Entropy Training Loop with Jordi Torres

https://towardsdatascience.com/cross-entropy-method-performance-analysis-161a5faef5fc

(Github repo: here)
The Bellman Equation: V-function and Q-function explained with Jordi Torres

https://towardsdatascience.com/the-bellman-equation-59258a0d3fa7

(Github repo: here)
The Value Iteration Algorithm: Estimation of Transitions and Rewards from the Agent's Experience with Jordi Torres

https://torres.ai/deep-reinforcement-learning-explained-series/

(Github repo: here)
Value Iteration for V-function: V-function in Practice for Frozen-Lake Environment with Jordi Torres

https://towardsdatascience.com/value-iteration-for-v-function-d7bcccc1ec24

(Github repo: here)
Value Iteration for Q-function: Frozen Lake code for Q function with Jordi Torres

https://towardsdatascience.com/value-iteration-for-q-function-ac9e508d85bd

(Github repo: here)
Reviewing Essential Concepts: Mathematical Notation Updated with Jordi Torres

https://towardsdatascience.com/reviewing-essential-concepts-from-part-1-e28234ee7f4f

(Github repo: here)
Monte Carlo methods: Exploration-Exploitation Dilemma with Jordi Torres

https://towardsdatascience.com/monte-carlo-methods-9b289f030c2e

(Github repo: here)
MC Control and Temporal Difference Methods: Constant-α MC Control, Sarsa, Q-Learning with Jordi Torres

https://towardsdatascience.com/mc-control-methods-50c018271553

(Github repo: here)
Deep Q-Network (DQN)-I: OpenAI Gym Pong and Wrappers with Jordi Torres

https://towardsdatascience.com/deep-q-network-dqn-i-bce08bdf2af

(Github repo: here)
Deep Q-Network (DQN)-II: Experience Replay and Target Networks with Jordi Torres

https://towardsdatascience.com/deep-q-network-dqn-ii-b6bf911b6b2c

(Github repo: here)
Deep Q-Network (DQN)-III: Performance and Use with Jordi Torres

https://towardsdatascience.com/deep-q-network-dqn-iii-c5a83b0338d2

(Github repo: here)
Policy-Based Methods: Hill Climbing Algorithm with Jordi Torres

https://towardsdatascience.com/policy-based-methods-8ae60927a78d

(Github repo: here)
Policy-Gradient Methods: REINFORCE algorithm with Jordi Torres

https://towardsdatascience.com/policy-gradient-methods-104c783251e0

(Github repo: here)
Model Compression in Reinforcement Learning with Kartik G (Part 1)

https://medium.com/@kartikganapathi/model-compression-in-reinforcement-learning-part-1-91970a84a24a
Model Compression in Reinforcement Learning with Kartik G (Part 2)

https://medium.com/@kartikganapathi/model-compression-in-reinforcement-learning-part-2-8e57269c8386
Reinforcement Learning Frameworks: Solving CartPole Environment using RLlib on Ray framework with Jordi Torres

https://towardsdatascience.com/reinforcement-learning-frameworks-e349de4f645a

(Github repo: here)
Deep Laplacian-based Options for Temporally-Extended Exploration with Marlos Machado

https://medium.com/@marlos.cholodovskis/deep-laplacian-based-options-for-temporally-extended-exploration-7bf8dd469838

links to the papers - here and here.
Multi-Agent Reinforcement Learning (MARL) algorithms with Mehul Gupta

https://medium.com/data-science-in-your-pocket/multi-agent-reinforcement-learning-marl-algorithms-4156f2a0d448
Multi-agent Reinforcement Learning Paper Reading - RODE: Learning Roles to Decompose Multi-agent Tasks with Christian Lin

https://medium.com/@crlc112358/multi-agent-reinforcement-learning-paper-reading-rode-learning-roles-to-decompose-multi-agent-31e54f196425

link to the paper here
Multi-agent Reinforcement Learning Paper Reading - The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games with Christian Lin

https://medium.com/@crlc112358/multi-agent-reinforcement-learning-paper-reading-trust-region-policy-optimization-inmulti-agent-f52e55a1c060

link to the papers here and here
Multi-agent Reinforcement Learning Paper Reading - Offline Reinforcement Learning with Knowledge Distillation with Christian Lin

https://medium.com/@crlc112358/multi-agent-reinforcement-learning-paper-reading-offline-multi-agent-reinforcement-learning-with-6718f3cbf6fc

link to the paper here
Reinforcement Learning/RL with diffusion - I with Ayush Mangal

https://ayushtues.medium.com/rl-with-diffusion-i-d64c6e96d5ed

link to the paper here

link to web page here

link to github repo here

colab link here
Can Reinforcement Learning Generalize Beyond Its Training with John Morrow

https://towardsdatascience.com/can-reinforcement-learning-generalize-beyond-its-training-3b9012d8e4cf

link to the paper here
Dynamic Model Selection using Reinforcement Learning with Monimoy Purkayastha

https://medium.com/@juniper.cto.aiml.2021/dynamic-model-selection-using-reinforcement-learning-e872453cb97b

related papers:

Adaptive Model Selection Network: Application to Airline Pricing, Shukla et al, 2019

Introduction to Multi-Armed Bandits, A. Slivkins, Microsoft Research, 2022

Model Selection for Contextual Bandits, D.J. Foster et al, MIT, Microsoft Research, 2019

Online and Scalable Model Selection with Contextual Bandits, Xie et al, 2021
Reinforcement Learning in the Warehousing Industry with Chris Mahoney

https://ai.plainenglish.io/reinforcement-learning-in-the-warehousing-industry-a5e7f1c28422

related papers:

A Reinforcement Learning Approach for a Decision Support System for Logistics Network, M. Rabe, F. Dross, 2015

Reinforcement Learning Approach to Product Allocation and Storage, M. Andra M.S. Thesis
How to apply reinforcement learning to order-pick routing in warehouses (including Python code) with SMLC

https://ai.plainenglish.io/how-to-apply-reinforcement-learning-to-order-pick-routing-in-warehouses-including-python-code-e9f208e53350
Evolving Reinforcement Learning Agents Using Genetic Algorithms with Mohamed Abdin

https://levelup.gitconnected.com/evolving-reinforcement-learning-agents-using-genetic-algorithms-409e213562a5

related paper: Deep Neuroevolution: Genetic Algorithms are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning, Felipe Such et al, 2018

related github repo: https://github.com/mohdabdin/Evolving-RL-Agents
Understanding Zero-Shot Learning — Making ML More Human with Ekin Tiu

https://towardsdatascience.com/understanding-zero-shot-learning-making-ml-more-human-4653ac35ccab

related paper: Simulation to Scaled City: Zero-Shot Policy Transfer for Traffic Control via Autonomous Vehicles, Kathy Jang et al, UC Berkeley, 2019

related paper: Learning Transferable Visual Models From Natural Language Supervision, Alec Radford et al, OpenAI, 2021

related paper: Zero-Shot Learning and its Applications from Autonomous Vehicles to COVID-19 Diagnosis: A Review, M. Rezaei et al, U Leeds, 2020
Policy Based Reinforcement Learning — A Detailed Study, Part 1, with NandaKishore Joshi

https://nandakishorej8.medium.com/part-1-policy-based-reinforcement-learning-a-detailed-study-1d4e9b8b5239

link to book: here
Policy Based Reinforcement Learning — OpenAI’s Cartpole with REINFORCE algorithm, Part 2, with NandaKishore Joshi

https://nandakishorej8.medium.com/part-2-policy-based-reinforcement-learning-openais-cartpole-with-reinforce-algorithm-18de8cb5efa4
RLAIF: Reinforcement Learning from AI Feedback with Cameron R. Wolfe, Jan, 2024

https://towardsdatascience.com/rlaif-reinforcement-learning-from-ai-feedback-d7dbdae8f093

related paper: RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback, Harrison Lee et al, 2023

related paper: Constitutional AI: Harmlessness from AI Feedback, Y. Bai, 2022

related paper: PaLM: Scaling Language Modeling with Pathways, A. Chowdhery et al, 2022

related paper: PaLM 2 Technical Report, Google, 2023

related paper: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Wei et al, Google Research, 2022

related paper: Self-Consistency Improves Chain of Thought Reasoning in Language Models, Wang et al, Google Research, ICLR 2023

related paper: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Bai et al, Anthropic, 2022

related paper: A General Language Assistant as a Laboratory for Alignment, A. Askell et al, Anthropic, 2021

related paper: Learning to summarize from human feedback, N. Stiennon et al, OpenAI, 2022

Python-based tools, techniques and design patterns for Reinforcement Learning projects

Learning with Stable-Baselines3: Reinforcement learning without the boilerplate code

https://towardsdatascience.com/convenient-reinforcement-learning-with-stable-baselines3-dccf466b7585
Quickly Generate Combinatorial State Spaces in Python with Wouter van Heeswijk

https://medium.com/codex/quickly-generate-combinatorial-state-spaces-in-python-c53decab2bdd
Quickly Generate Combinatorial Action Spaces in Python with Wouter van Heeswijk

https://medium.com/codex/quickly-generate-combinatorial-action-spaces-15962118e508
Speed Up Your Simulations By Deploying These Sampling Strategies with Wouter van Heeswijk

https://towardsdatascience.com/speed-up-your-simulations-by-deploying-these-sampling-strategies-372993703ec5
Reinforcement Learning for Combinatorial Optimization with Or Rivlin

https://towardsdatascience.com/reinforcement-learning-for-combinatorial-optimization-d1402e396e91

related repo: Minimum vertex cover with Deep Reinforcement Learning

related paper: Learning Heuristics over Large Graphs via Deep Reinforcement Learning, S. Manchanda et al, IIT Delhi, 2020
Inverse Reinforcement Learning with Jonathan Hui

https://jonathan-hui.medium.com/rl-inverse-reinforcement-learning-56c739acfb5a

PyLessons online tutorials using OpenAI Gym environment

Introduction to Reinforcement Learning: CartPole game
Solving the Cartpole with Double Deep Q Network
Solving the Cartpole with Dueling Double Deep Q Network
Epsilon Greedy in Deep Q Learning
D3QN Agent with Prioritized Experience Replay
DQN PER with Convolutional Neural Networks
A.I. learns to play Pong with Deep Q Network
Introduction to Reinforcement Learning Policy Gradient
Introduction to Advantage Actor-Critic method (A2C)
Asynchronous Advantage Actor-Critic (A3C) algorithm
Proximal Policy Optimization (PPO)
LunarLander-v2 with Proximal Policy Optimization
BipedalWalker-v3 with Continuous Proximal Policy Optimization

mlq.ai resources:

Fundamentals of Reinforcement Learning: Estimating the Action-Value Function by Peter Foy, 2020

StudyWolf's resources:

Reinforcement Learning Part 1: Q-Learning and exploration

https://studywolf.wordpress.com/2012/11/25/reinforcement-learning-q-learning-and-exploration/
Reinforcement Learning Part 2: Sarsa versus Q learning

https://studywolf.wordpress.com/2013/07/01/reinforcement-learning-sarsa-vs-q-learning/

example code: https://github.com/studywolf/blog/tree/master/RL/SARSA%20vs%20Qlearn%20cliff

ccm suite code: https://github.com/tcstewar/ccmsuite
Reinforcement Learning Part 3: Egocentric learning

https://studywolf.wordpress.com/2015/03/29/reinforcement-learning-part-3-egocentric-learning/

example code: https://github.com/studywolf/blog/tree/master/RL/Egocentric
Reinforcement Learning Part 4: Combining Egocentric and Allocentric

https://studywolf.wordpress.com/2015/04/09/reinforcement-learning-part-4-combining-egocentric-and-allocentric/

example code: https://github.com/studywolf/blog/tree/master/RL/combination%20allo%20and%20ego
Deep Learning for control using augmented hessian-free optimization

https://studywolf.wordpress.com/2016/04/04/deep-learning-for-control-using-augmented-hessian-free-optimization/

example code: https://github.com/studywolf/blog/blob/master/train_AHF/train_hf.py

Jeff Bradberry's blog

Introduction to Monte Carlo Tree Search

online lecture videos

How ChatGPT is Trained with Ari Seff (February 2023)
Reinforcement Learning from Human Feedback: Progress and Challenges with John Schulman (Berkeley EECS Colloquium, April 19, 2023)
Control as Inference and Soft Deep RL with Sergey Levine (NIPS 2018)
Tutorial: Introduction to Reinforcement Learning with Function Approximation (NIPS Tutorials 2015)
Deep Reinforcement Learning: John Schulman, OpenAI, Berkeley (MLSS Cadiz, 2016)

Lecture 1

Lecture 2

Lecture 3

Lecture 4
Deep Reinforcement Learning, David Silver, Dept. of Computer Science, University College London (July 2015) Note: lecture slides included under section "Articles and tutorials"
Reinforcement Learning: From Basic Concepts to Deep Q Networks (Deep Learning Summer School McGill 2016)
Deep Reinforcement Learning, Pieter Abbeel, Dept. of Electrical Engineering and Computer Sciences, UC Berkeley, August 2016 Note: lecture slides included under section "Articles and tutorials"
Reinforcement Learning, Emma Brunskill, Stanford CS234, Winter 2019
Decision Transformer: Reinforcement Learning via Sequence Modeling (Research Paper Explained), Yannic Kilcher, 2022, YouTube video
Stanford CS25: V1 I Decision Transformer: Reinforcement Learning via Sequence Modeling, 2021, youtube video
RLHF: Reinforcement Learning from Human Feedback with Ms Aerin

related paper: Training language models to follow instructions with human feedback, Ouyang et al, OpenAI, 2022

related python code: https://github.com/lucidrains/PaLM-rlhf-pytorch/tree/main

Game Theory Resources

Books

Theory of Games and Statistical Decisions, David Blackwell, 1954
Algorithmic Game Theory, Tim Roughgarden, 2007
N-Person Game Theory: Concepts and Applications by A. Rapoport, 1970
Mathematical Foundations of Game Theory, Rida Laraki, Jerome Renault, Sylvain Sorin, 2010
Game Theory: Decisions, Interactions and Evolution, James N. Webb, 2000

Articles

Equilibrium Points in N-Person Games, John Nash, 1950
Notes on the N-Person Game - II: The Value of N-Person Game, L.S. Shapley, 1951
A Theory of Individual Choice Behavior, R. Duncan Luce, Columbia U., 1957
The Hide-and-Seek Game of von Neumann, Merrill Flood, 1968
N-Person Game Theory, L.S. Shapley, 1968
The Expected Outcome Model of Two-Player Games, Bruce Abramson, Columbia U, 1987
Paradoxical Behaviour of Mechanical and Electrical Networks, J. Cohen, P. Horowitz, Harvard U, 1991
Braess' Paradox in a Loss Network, N.G. Bean et al, 1995
Statistical Mechanics of Systems with Heterogeneous Agents: Minority Games, D. Challet et al, 1999
Cooperative Games: Core and Shapley Values, R. Serrano, Brown U., 2007
Student of Games: A unified learning algorithm for both perfect and imperfect information games, Martin Schmid et al, DeepMind, 2023

online lecture videos

Algorithmic Game Theory, Tim Roughgarden, Stanford CS364A, Fall 2013
Game Theory Through the Computational Lens, Tim Roughgarden, LSE Events

Online tutorials and short readings

Medium

NashPy: Strategic Interactions in Python

https://medium.com/@agbonorino/nashpy-strategic-interactions-in-python-aac937c916a5
