-
Reinforcement Learning - An Introduction, Second Edition, Richard Sutton, Andrew Barto, 2020
-
Reinforcement Learning and Optimal Control, Dimitri Bertsekas, 2019 (Draft)
-
Reinforcement Learning and Stochastic Optimization, Warren Powell, 2019
-
Reinforcement Learning - State of the Art, Marco Wiering, Martijn van Otterlo, 2012
-
An Introduction to Deep Reinforcement Learning, Vincent Francois-Lavet et al, McGill, 2018
-
On Actor-Critic Algorithms, Vijay Konda, John Tsitsiklis, MIT, SIAM, 2003
-
Learning to Predict by the Method of Temporal Differences, R. Sutton, 1988
-
Gradient Descent for General Reinforcement Learning, Leemon Baird and Andrew Moore, 1999
-
Playing Atari with Deep Reinforcement Learning, V. Mnih, 2013
-
Continuous Control With Deep Reinforcement Learning, T. Lillicrap, et al, 2015
-
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games, Heinrich, UCL, 2016
-
A Definition of Continual Reinforcement Learning, David Abel et al, DeepMind, 2023
-
A Model of Mood as Integrated Advantage, Bennett, D., Davidson, G., Niv, Y., 2023
-
What Can Learned Intrinsic Rewards Capture?, Zheng et al, 2020
-
Fast Computation of Nash Equilibria in Imperfect Information Games, Munos et al, 2020
-
Expressing Non-Markov Reward to a Markov Agent, Abel et al, 2020
-
Making Sense of Reinforcement Learning and Probabilistic Inference, O'Donoghue et al, ICLR, 2020
-
A Bayesian Approach to Robust Reinforcement Learning, Derman et al, 2019
-
Optimizing Agent Behavior Over Long Time Scales By Transporting Value, Hung, Lillicrap, 2019
-
Statistics and Samples in Distributional Reinforcement Learning, Rowland et al, 2019
-
Policy Optimization with Linear Temporal Logic Constraints, C. Voloshin et al, Caltech, 2022
-
Generative Adversarial Self-Imitation Learning, Guo, Oh, et al, 2018
-
Deep Reinforcement Learning and the Deadly Triad, van Hasselt et al, 2018
-
Fast deep reinforcement learning using online adjustments from the past, Hansen et al, 2018
-
A Generalised Method for Empirical Game Theoretic Analysis, K. Tuyls, J. Perolat, et al, 2018
-
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games, Heinrich, Silver, 2016
-
Gradient Estimation Using Stochastic Computation Graphs, J. Schulman, N. Heess, 2016
-
Understanding Sampling-Based Adversarial Search Methods, Ramanujan, PhD Thesis, 2012
-
Reinforcement Learning and Simulation-Based Search in Computer Go, David Silver, PhD Thesis 2009
-
Dynamic Programming and Markov Processes, Ronald Howard, 1960
-
Deep Reinforcement Learning slides, David Silver, DeepMind, 2015
-
Deep Reinforcement Learning slides, Pieter Abbeel, UC Berkeley (NIPS 2016)
-
Deep Reinforcement Learning slides, Pieter Abbeel, UC Berkeley (August 23rd, 2016)
-
Reinforcement Learning and Optimal Control A Selective Overview, Dimitri Bertsekas
-
Simple Embodied Language Learning as a Byproduct of Meta-reinforcement Learning, Liu 2023
...More Reinforcement Learning articles on DeepMind site
-
POMDP Solution Methods, D. Braziunas, University of Toronto, 2003
-
Optimal Control of Markov Processes with Incomplete State Information, Astrom, JMAA, 1965
-
Dynamic Programming Conditions for Partially Observable Stochastic Systems, M.H.A. Davis et al, 1971
-
Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning, Liu, Suri, Zhou, Finn, 2023
-
RODE: Learning Roles To Decompose Multi Agent Tasks, Wang, Gupta, Mahajan, Bei Peng, 2020
-
The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games, Yu et al, Tsinghua U., 2022
-
Offline Multi-Agent Reinforcement Learning with Knowledge Distillation, Tseng, MIT-CSAIL, 2022
Why Hasn’t Reinforcement Learning Conquered The World (Yet)?, Wouter van Heeswijk, Medium
The Four Policy Classes of Reinforcement Learning, Wouter van Heeswijk, Medium
Policy Gradients In Reinforcement Learning Explained, Wouter van Heeswijk, Medium
Proximal Policy Optimization (PPO) Explained, Wouter van Heeswijk, Medium
Dynamic Pricing with Contextual Bandits: Learning by Doing, Massimiliano Costacurta, Medium
related repo: github.com/massi82/contextual_bandits
related docs: contextual-bandits.readthedocs.io
related video: PyData Tel Aviv Meetup: Contextual Bandit for Pricing - Daniel Hen & Uri Goren
A Unified Framework for Stochastic Optimization, Warren B. Powell, Princeton, 2017
Challenges of Real World Reinforcement Learning, Gabriel Dulac-Arnold, 2019
Sequential Decision Analytics for the Truckload Industry, Warren B. Powell, Optimal Dynamics, 2022
Stochastic Optimization, James C. Spall, Johns Hopkins U., 2012
How to Improve your Supply Chain with Deep Reinforcement Learning with Christian Hubbs, Medium
Deep reinforcement learning for supply chain and price optimization, Ilya Katsov, 2020, blog
Optimization of Apparel Supply Chain Using Deep Reinforcement Learning, JW. Chong et al, IEEE, 2022
Reinforcement learning for supply chain optimization, L. Kemmer et al, 2018
Deep Reinforcement Learning hands-on for Optimized Ad Placement with NandaKishore Joshi
related repo: ad placement example
link to the book Reinforcement Learning in Action
The Randomized k-Server Conjecture Is False!, S. Bubeck et al, 2023
The Online K-Server Problem, Aris Floratos, Ravi Boppana, Courant Institute, NYU
Online Decision Transformer, Q. Zheng et al, 2022
Offline Reinforcement Learning as one Big Sequence Modeling Problem, M. Janner et al, 2021
Training Agents using Upside Down Reinforcement Learning, R. Srivastava et al, 2021
RvS: What is Essential for Offline RL via Supervised Learning, Scott Emmons et al, 2022
related repo: https://qtransformer.github.io/
Basics of Reinforcement Learning for LLMs with Cameron Wolfe, Medium
related paper: An Elementary Proof that Q Learning Converges Almost Surely, Matthew T. Regehr, Alex Ayoub, U of Alberta, 2021
related paper: Deep Reinforcement Learning for Autonomous Driving: A Survey, BR Kiran et al, 2021
related paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, J. Devlin et al, Google, 2019
related paper: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Y. Bai et al, Anthropic, 2022
related paper: Playing Atari with Deep Reinforcement Learning, V. Mnih et al, DeepMind, 2013
related paper: Distilling the Knowledge in a Neural Network, G. Hinton et al, Google, 2015
related paper: Llama 2: Open Foundation and Fine-Tuned Chat Models, H. Touvron et al, MetaAI, 2023
Deep Reinforcement Learning from Human Preferences, Paul Christiano et al, OpenAI, 2017
Training Language Models to Follow Instructions With Human Feedback, L. Ouyang et al, OpenAI, 2022
Fine Tuning Language Models from Human Preferences, Daniel M. Ziegler et al, OpenAI, 2020
Learning to Summarize from Human Feedback, Nisan Stiennon et al, OpenAI, 2022
Learning from human preferences, Dario Amodei, OpenAI blog, 2017
Reinforcement Learning from Human Feedback, Wikipedia
AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning, B. Huang et al, CMU, 2022
AdaRL repo: https://github.com/Adaptive-RL/AdaRL-code
Deep Reinforcement Learning for Swarm Systems, Maximilian Hüttenrauch et al, U. of Lincoln, 2019
Maximum diffusion reinforcement learning, Thomas Berrueta et al, Northwestern U., 2023
Magnetic Control of Tokamak Plasmas through Deep Reinforcement Learning, Jonas Degrave, 2021
First Nuclear Plasma Control with Digital Twin, Sabine Hossenfelder, Feb 2024, youtube video
-
Deep Reinforcement Learning blog
-
DeepMind Reinforcement Learning articles:
https://www.deepmind.com/research?tag=Reinforcement+learning
-
Demystifying Deep Reinforcement Learning
https://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/
-
The Bitter Lesson, Richard Sutton, 2019
-
Deep Reinforcement learning: Pong From Pixels
-
The Fundamentals of Reinforcement Learning with Ruben Winastwan
Markov Decision Process, Policy, Value Function, Bellman Equation, Dynamic Programming Implementation
https://towardsdatascience.com/the-fundamentals-of-reinforcement-learning-177dd8626042
-
Reinforcement Learning: Markov Decision Process (Part 1) with Tan Alvin
https://pub.towardsai.net/reinforcement-learning-markov-decision-process-part-1-e376991cafbe
-
Reinforcement Learning: Dynamic Programming and Monte Carlo (Part 2) with Tan Alvin
-
Reinforcement Learning: SARSA and Q Learning (Part 3) with Tan Alvin
https://pub.towardsai.net/reinforcement-learning-sarsa-and-q-learning-part-3-871bedbeaec0
-
Multi-Armed Bandits with Steve Roberts (Part 1): Mathematical Framework and Terminology
https://towardsdatascience.com/multi-armed-bandits-part-1-b8d33ab80697
-
Multi-Armed Bandits with Steve Roberts (Part 2): The Bandit Framework
https://towardsdatascience.com/multi-armed-bandits-part-2-5834cb7aba4b
-
Multi-Armed Bandits with Steve Roberts (Part 3): Bandit Algorithms
https://towardsdatascience.com/bandit-algorithms-34fd7890cb18
-
Multi-Armed Bandits with Steve Roberts (Part 4): The Upper Confidence Bound Bandit Algorithm
https://towardsdatascience.com/the-upper-confidence-bound-ucb-bandit-algorithm-c05c2bf4c13f
-
Multi-Armed Bandits with Steve Roberts (Part 5): Thompson Sampling
https://towardsdatascience.com/thompson-sampling-fc28817eacb8
-
Multi-Armed Bandits with Steve Roberts (Part 6): A Comparison of Bandit Algorithms
https://towardsdatascience.com/a-comparison-of-bandit-algorithms-24b4adfcabb
-
An Introduction to Reinforcement Learning with Steve Roberts (Part 1): State Values and Policy Evaluation
https://towardsdatascience.com/state-values-and-policy-evaluation-ceefdd8c2369
-
An Introduction to Reinforcement Learning with Steve Roberts (Part 2): Markov Decision Processes and Bellman Equations
https://towardsdatascience.com/markov-decision-processes-and-bellman-equations-45234cce9d25
-
An Introduction to Reinforcement Learning with Steve Roberts (Part 3): Policy and Value Iteration
https://towardsdatascience.com/policy-and-value-iteration-78501afb41d2
-
Introduction to Reinforcement Learning with Markel Ausin (Part 1): Multi-armed bandit problem
-
Introduction to Reinforcement Learning with Markel Ausin (Part 2): Q-Learning
-
Introduction to Reinforcement Learning with Markel Ausin (Part 3): Q-Learning with Neural Networks, Algorithm DQN
-
Introduction to Reinforcement Learning with Markel Ausin (Part 4): Double DQN and Dueling DQN
-
Introduction to Reinforcement Learning with Markel Ausin (Part 5): Policy Gradient
-
Introduction to Reinforcement Learning with Sagi Shaier (Part 1): The four main subelements of a reinforcement learning system
-
Introduction to Reinforcement Learning with Sagi Shaier (Part 2): Multi-arm bandits
-
Introduction to Reinforcement Learning with Sagi Shaier (Part 3): Finite Markov Decision Processes
-
Introduction to Reinforcement Learning with Sagi Shaier (Part 4): Dynamic Programming
-
Introduction to Reinforcement Learning with Sagi Shaier (Part 5): Monte Carlo Methods
-
Introduction to Reinforcement Learning with Sagi Shaier (Part 6): Temporal Difference (TD) Learning
-
Introduction to Reinforcement Learning with Sagi Shaier (Part 7): N-step Bootstrapping
-
Reinforcement Learning with Dan Lee (Part 1): A Brief Introduction
-
Reinforcement Learning with Dan Lee (Part 2): Introducing Markov Process
-
Reinforcement Learning with Dan Lee (Part 3): The Markov Decision Process
-
Reinforcement Learning with Dan Lee (Part 4): Optimal Policy Search with MDP
-
Reinforcement Learning with Dan Lee (Part 5): Monte-Carlo and Temporal-Difference Learning
-
Reinforcement Learning with Dan Lee (Part 6): TD(\lambda) and Q-learning
-
Reinforcement Learning with Dan Lee (Part 7): A Brief Introduction to Deep Q Networks
-
Simple Reinforcement Learning with Tensorflow (Part 0), Arthur Juliani: Q-Learning with Tables and Neural Networks
-
Simple Reinforcement Learning in Tensorflow (Part 1), Arthur Juliani: Two-armed bandit
https://awjuliani.medium.com/super-simple-reinforcement-learning-tutorial-part-1-fd544fab149
-
Simple Reinforcement Learning with Tensorflow (Part 1.5), Arthur Juliani: Contextual bandits
-
Simple Reinforcement Learning with Tensorflow (Part 2), Arthur Juliani: Policy-based agents
https://awjuliani.medium.com/super-simple-reinforcement-learning-tutorial-part-2-ded33892c724
-
Simple Reinforcement Learning with Tensorflow (Part 3), Arthur Juliani: Model-based RL
-
Simple Reinforcement Learning with Tensorflow (Part 4), Arthur Juliani: Deep Q Networks and beyond
-
Simple Reinforcement Learning with Tensorflow (Part 5), Arthur Juliani: Visualizing an Agent's thoughts and actions
-
Simple Reinforcement Learning with Tensorflow (Part 6), Arthur Juliani: Partial Observability and Deep Recurrent Q-Networks
-
Simple Reinforcement Learning with Tensorflow (Part 7), Arthur Juliani: Action-Selection Strategies for exploration
-
Simple Reinforcement Learning with Tensorflow (Part 8), Arthur Juliani: Asynchronous Actor-Critic Agents (A3C)
-
Q Learning Algorithm: How to Successfully Teach an Intelligent Agent to Play A Game with Saul Dobilas
-
Reinforcement Learning with Sthanikam Santhosh (Part 1): Deep Q Learning using Tensorflow2
https://medium.com/@sthanikamsanthosh1994/deep-q-learning-using-tensorflow2-a5eabc1a8d82
-
Reinforcement Learning with Sthanikam Santhosh (Part 2): Policy Gradient (Reinforce) using Tensorflow2
-
Reinforcement Learning with Sthanikam Santhosh (Part 3): Dueling DQN using Tensorflow2
-
Reinforcement Learning with Sthanikam Santhosh (Part 4): Dueling Double Deep Q Learning
-
Reinforcement Learning with Sthanikam Santhosh (Part 5): Soft Actor-Critic (SAC) Network
-
Reinforcement Learning with Sthanikam Santhosh (Part 6): Deep Deterministic Policy Gradient (DDPG) using Tensorflow2
-
Reinforcement Learning with Sthanikam Santhosh (Part 7): Twin Delayed Deep Deterministic Policy Gradient (TD3) in Tensorflow2
-
Reinforcement Learning with Sthanikam Santhosh (Part 8): Proximal Policy Optimization (PPO) for trading environment (Tensorflow)
-
Foundational RL with Rahul Bhadani: Value Iteration and Policy Iteration
https://medium.com/mlearning-ai/foundational-rl-value-iteration-and-policy-iteration-76251e47581b
-
Foundational RL with Rahul Bhadani: Dynamic Programming
https://towardsdatascience.com/foundational-rl-dynamic-programming-28f96f6fb40e
-
Foundational RL with Rahul Bhadani: Solving Markov Decision Processes
https://towardsdatascience.com/foundational-rl-solving-markov-decision-process-d90b7e134c0b
-
Cross-entropy method for Reinforcement Learning with Avishree Khare
https://towardsdatascience.com/cross-entropy-method-for-reinforcement-learning-2b6de2a4f3a0
-
Temporal Difference Learning in Reinforcement Learning
https://medium.com/nerd-for-tech/temporal-difference-learning-in-reinforcement-learning-cf13ed159fcb
-
Dynamic Pricing with Reinforcement Learning from Scratch: Q-Learning with Nicolo Albanese
-
Reinforcement Learning: Q-Learning
A step-by-step guide to implementing the Q-Learning algorithm using OpenAI Gym for Taxi-V3
https://towardsdev.com/reinforcement-learning-q-learning-38146880ca49
-
Q-Learning: Utilizing Reinforcement learning algorithm to trace optimal path
-
A/B Optimization with Policy Gradient Reinforcement Learning
A step by step visual explanation of the Policy Gradient method
-
Making Sense of the Bias / Variance trade-off in (Deep) Reinforcement Learning
What goes into a stable accurate reinforcement signal?
-
Monte Carlo Methods for Reinforcement Learning with Shivam Mohan
Introduction
https://medium.com/nerd-for-tech/monte-carlo-methods-for-reinforcement-learning-d30d874dd817
-
n-step Bootstrapping in Reinforcement Learning with Shivam Mohan
Introduction
https://medium.com/@shivamohan07/n-step-bootstrapping-in-reinforcement-learning-fa87cbd0584a
-
A brief overview of Eligibility Traces in Reinforcement Learning with Shivam Mohan
-
Temporal Difference Learning in Reinforcement Learning with Shivam Mohan
Introduction
https://medium.com/nerd-for-tech/temporal-difference-learning-in-reinforcement-learning-cf13ed159fcb
-
Reinforcement Learning - Generalization In Continuous State Space
Function Approximation with Random Walk Example
-
Q vs V in Reinforcement Learning, the Easy Way
https://zsalloum.medium.com/q-vs-v-in-reinforcement-learning-the-easy-way-9350e1523031
-
6 Reinforcement Learning Algorithms Explained
https://towardsdatascience.com/6-reinforcement-learning-algorithms-explained-237a79dbd8e
-
Gambler's Problem - When inaction is in fact optimal
https://borundev.medium.com/gamblers-problem-when-inaction-is-infact-optimal-1d8348b69c4f
-
Win at Blackjack with Reinforcement Learning
https://medium.com/the-power-of-ai/blackjack-with-reinforcement-learning-95f588dd670c
-
Learn to Win Games with Monte Carlo Reinforcement Learning
https://medium.com/the-power-of-ai/monte-carlo-reinforcement-learning-for-simple-games-71dc8f4ffda4
-
Build your own unbeatable TicTacToe with Reinforcement Learning
-
The Actor-Critic Reinforcement Learning Algorithm with Dhanoop Karunakaran
related link: The idea behind Actor-Critic, Sergios Karagiannakos, blog, 2018
related link: Going Deeper Into Reinforcement Learning: Fundamentals of Policy Gradients, Seita's place (blog), 2017
related link: Actor-Critic Algorithms, Sergey Levine, CS 294-112, 2017
-
Implement Policy Iteration In Python - A Minimal Working Example with Wouter van Heeswijk
-
Implement Value Iteration In Python - A Minimal Working Example with Wouter van Heeswijk
-
A Minimal Working Example for Discrete Policy Gradients in TensorFlow 2.0 with Wouter van Heeswijk
related paper: link
-
The Five Building Blocks of Markov Decision Processes with Wouter van Heeswijk
https://towardsdatascience.com/the-five-building-blocks-of-markov-decision-processes-997dc1ab48a7
-
Walking Off The Cliff with Off-Policy Reinforcement Learning with Wouter van Heeswijk
-
Trust Region Policy Optimization (TRPO) Explained with Wouter van Heeswijk
https://towardsdatascience.com/trust-region-policy-optimization-trpo-explained-4b56bd206fc2
-
Policy Gradients in Reinforcement Learning Explained with Wouter van Heeswijk
https://towardsdatascience.com/policy-gradients-in-reinforcement-learning-explained-ecec7df94245
-
A Deep Dive into Problem States with Wouter van Heeswijk
https://towardsdatascience.com/a-deep-dive-into-problem-states-498ad0746c98
-
Deep Deterministic Policy Gradients Explained with Wouter van Heeswijk
https://towardsdatascience.com/deep-deterministic-policy-gradients-explained-4643c1f71b2e
-
Solving The Taxi Environment With Q-Learning - A Tutorial with Wouter van Heeswijk
https://towardsdatascience.com/solving-the-taxi-environment-with-q-learning-a-tutorial-c76c22fc5d8f
-
When Stochastic Policies Are Better Than Deterministic Ones with Wouter van Heeswijk
-
Common Reinforcement Learning Algorithms (And How To Fix Them) with Wouter van Heeswijk
-
Seven Exploration Strategies In Reinforcement Learning You Should Know with Wouter van Heeswijk
-
Why Reinforcement Learning Does Not Need Bellman's Equation with Wouter van Heeswijk
https://towardsdatascience.com/why-reinforcement-learning-doesnt-need-bellman-s-equation-c9c2e51a0b7
-
The Alberta Plan: Sutton's Research Vision for Artificial Intelligence with Wouter van Heeswijk
-
The Four Policy Classes of Reinforcement Learning with Wouter van Heeswijk
https://towardsdatascience.com/the-four-policy-classes-of-reinforcement-learning-38185daa6c8a
related papers:
Unified Framework for Stochastic Optimization, Warren Powell, Princeton, 2017
Tutorial on Stochastic Optimization in Energy II: An energy storage illustration, Warren Powell, 2015
-
Natural Policy Gradients In Reinforcement Learning Explained with Wouter van Heeswijk
related literature: Why Natural Gradient? S. Amari, S.C. Douglas, 1998
New Insights and Perspectives on the Natural Gradient Method, James Martens, DeepMind, 2020
-
Understand Policy Gradient by Building Cross Entropy from Scratch
-
A Deep Dive into Reinforcement Learning: Q-Learning and Deep Q-Learning on a 10x10 FrozenLake Environment with Nandan Grover
example code: https://github.com/nandangrover/reinforcement_frozenlake
-
Reinforcement Learning - TD(lambda) Introduction with Jeremy Zhang (Part 1)
Apply offline-lambda on Random Walk
code for RL, an Introduction : https://github.com/ShangtongZhang/reinforcement-learning-an-introduction
https://towardsdatascience.com/reinforcement-learning-td-%CE%BB-introduction-686a5e4f4e60
-
Reinforcement Learning - TD(lambda) Introduction with Jeremy Zhang (Part 2)
TD(lambda) with eligibility trace
code for RL, an Introduction : https://github.com/ShangtongZhang/reinforcement-learning-an-introduction
https://meatba11.medium.com/reinforcement-learning-td-%CE%BB-introduction-2-f0ea427cd395
-
Reinforcement Learning - TD(lambda) Introduction with Jeremy Zhang (Part 3)
Extend TD(lambda) on Q function with Sarsa(lambda)
code for RL, an Introduction : https://github.com/ShangtongZhang/reinforcement-learning-an-introduction
https://towardsdatascience.com/reinforcement-learning-td-%CE%BB-introduction-3-f329bdbf872a
-
Reinforcement Learning - Generalization of Continuing Tasks
Server Access Example implementation
-
The exploration-exploitation tradeoff: intuitions and strategies with Joseph Rocca
Understanding e-greedy, optimistic initialization, UCB and Thompson sampling strategies
https://towardsdatascience.com/the-exploration-exploitation-dilemma-f5622fbe1e82
-
RL - Exploration with Deep Learning with Jonathan Hui
-
RL — Tips on Reinforcement Learning with Jonathan Hui
https://jonathan-hui.medium.com/rl-tips-on-reinforcement-learning-fbd121111775
-
Deep Reinforcement Learning - Deep Deterministic Policy Gradient (DDPG) algorithm with Markus Buchholz
-
Breaking down DeepMind's AlphaTensor
https://pub.towardsai.net/breaking-down-deepminds-alphatensor-15534303cde2
-
Batched Bandit Problems with Sean Smith
Multi-armed Bandits with delayed rewards in successive trials
https://towardsdatascience.com/batched-bandit-problems-ea73dba5da7a
-
A Lesson on Applied Reinforcement Learning in Production with Bill Zhu
-
Deep Reinforcement Learning for Network Design in Marine Transportation with Timothe Boulet
-
The Emotional Lives of RL Agents
https://awjuliani.medium.com/the-emotional-lives-of-rl-agents-12e2c8ee36af
-
Curious Agents: An Introduction with Dries Smith
https://medium.com/@dries.epos/curious-agents-ebfee02ef024
(code of this series: https://github.com/DriesSmit/CuriousAgents)
-
Curious Agents II: Solving MountainCar without Rewards
https://medium.com/@dries.epos/curious-agents-ii-solving-mountaincar-without-rewards-c49ae2177819
(code of this series: https://github.com/DriesSmit/CuriousAgents)
-
Curious Agents III: BYOL-Explore
https://medium.com/@dries.epos/curious-agents-iii-byol-explore-93f34fa6146a
(code of this series: https://github.com/DriesSmit/CuriousAgents)
-
Curious Agents IV: BYOL-Hindsight
https://medium.com/@dries.epos/curious-agents-iv-byol-hindsight-318c559175f0
(code of this series: https://github.com/DriesSmit/CuriousAgents)
-
Understanding the World Through Action: RL as a Foundation for Scalable Self-Supervised Learning with Sergey Levine
-
How Robots Can Learn End-to-End from Data with Sergey Levine
https://medium.com/@sergey.levine/how-robots-can-learn-end-to-end-from-data-3d879b0a2ba1
-
Decisions from Data: How Offline Reinforcement Learning Will Change How We Use Machine Learning with Sergey Levine
-
An Ecological Perspective on Reinforcement Learning
https://medium.com/@sergey.levine/an-ecological-perspective-on-reinforcement-learning-de697f3d6516
-
Function Approximation in Reinforcement Learning
https://towardsdatascience.com/function-approximation-in-reinforcement-learning-85a4864d566
-
A Gentle Introduction to Deep Reinforcement Learning with Jordi Torres
(Github repo: here)
-
Formalization of a Reinforcement Learning Problem with Jordi Torres
https://towardsdatascience.com/drl-02-formalization-of-a-reinforcement-learning-problem-108b52ebfd9a
(Github repo: here)
-
Deep Learning Basics: Basic Concepts for Beginners with Jordi Torres
https://towardsdatascience.com/deep-learning-basics-1d26923cc24a
(Github repo: here)
-
Deep Learning with PyTorch: First contact with PyTorch for beginners with Jordi Torres
https://towardsdatascience.com/deep-learning-with-pytorch-a93b09bdae96
(Github repo: here)
-
PyTorch Performance Analysis with TensorBoard: How to run TensorBoard for PyTorch inside Colab
https://towardsdatascience.com/pytorch-performance-analysis-with-tensorboard-7c61f91071aa
(Github repo: here)
-
Solving a Reinforcement Learning Problem Using Cross-Entropy Method: Agent Creation Using Deep Neural Networks with Jordi Torres
(Github repo: here)
-
Cross-Entropy Method Performance Analysis: Implementation of Cross Entropy Training Loop with Jordi Torres
https://towardsdatascience.com/cross-entropy-method-performance-analysis-161a5faef5fc
(Github repo: here)
-
The Bellman Equation: V-function and Q-function explained with Jordi Torres
https://towardsdatascience.com/the-bellman-equation-59258a0d3fa7
(Github repo: here)
-
The Value Iteration Algorithm: Estimation of Transitions and Rewards from the Agent's Experience with Jordi Torres
https://torres.ai/deep-reinforcement-learning-explained-series/
(Github repo: here)
-
Value Iteration for V-function: V-function in Practice for Frozen-Lake Environment with Jordi Torres
https://towardsdatascience.com/value-iteration-for-v-function-d7bcccc1ec24
(Github repo: here)
-
Value Iteration for Q-function: Frozen Lake code for Q function with Jordi Torres
https://towardsdatascience.com/value-iteration-for-q-function-ac9e508d85bd
(Github repo: here)
-
Reviewing Essential Concepts: Mathematical Notation Updated with Jordi Torres
https://towardsdatascience.com/reviewing-essential-concepts-from-part-1-e28234ee7f4f
(Github repo: here)
-
Monte Carlo methods: Exploration-Exploitation Dilemma with Jordi Torres
https://towardsdatascience.com/monte-carlo-methods-9b289f030c2e
(Github repo: here)
-
MC Control and Temporal Difference Methods: Constant-α MC Control, Sarsa, Q-Learning with Jordi Torres
https://towardsdatascience.com/mc-control-methods-50c018271553
(Github repo: here)
-
Deep Q-Network (DQN)-I: OpenAI Gym Pong and Wrappers with Jordi Torres
https://towardsdatascience.com/deep-q-network-dqn-i-bce08bdf2af
(Github repo: here)
-
Deep Q-Network (DQN)-II: Experience Replay and Target Networks with Jordi Torres
https://towardsdatascience.com/deep-q-network-dqn-ii-b6bf911b6b2c
(Github repo: here)
-
Deep Q-Network (DQN)-III: Performance and Use with Jordi Torres
https://towardsdatascience.com/deep-q-network-dqn-iii-c5a83b0338d2
(Github repo: here)
-
Policy-Based Methods: Hill Climbing Algorithm with Jordi Torres
https://towardsdatascience.com/policy-based-methods-8ae60927a78d
(Github repo: here)
-
Policy-Gradient Methods: REINFORCE algorithm with Jordi Torres
https://towardsdatascience.com/policy-gradient-methods-104c783251e0
(Github repo: here)
-
Model Compression in Reinforcement Learning with Kartik G (Part 1)
https://medium.com/@kartikganapathi/model-compression-in-reinforcement-learning-part-1-91970a84a24a
-
Model Compression in Reinforcement Learning with Kartik G (Part 2)
https://medium.com/@kartikganapathi/model-compression-in-reinforcement-learning-part-2-8e57269c8386
-
Reinforcement Learning Frameworks: Solving CartPole Environment using RLlib on Ray framework with Jordi Torres
https://towardsdatascience.com/reinforcement-learning-frameworks-e349de4f645a
(Github repo: here)
-
Deep Laplacian-based Options for Temporally-Extended Exploration with Marlos Machado
-
RL - Exploration with Deep Learning with Jonathan Hui
-
Multi-Agent Reinforcement Learning (MARL) algorithms with Mehul Gupta
-
Multi-agent Reinforcement Learning Paper Reading - RODE: Learning Roles to Decompose Multi-agent Tasks with Christian Lin
link to the paper here
-
Multi-agent Reinforcement Learning Paper Reading - The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games with Christian Lin
-
Multi-agent Reinforcement Learning Paper Reading - Offline Reinforcement Learning with Knowledge Distillation with Christopher Lin
link to the paper here
-
Reinforcement Learning/RL with diffusion - I with Ayush Mangal
https://ayushtues.medium.com/rl-with-diffusion-i-d64c6e96d5ed
link to the paper here
link to web page here
link to github repo here
colab link here
-
Can Reinforcement Learning Generalize Beyond Its Training with John Morrow
link to the paper here
-
Dynamic Model Selection using Reinforcement Learning with Monimoy Purkayastha
related papers:
Adaptive Model Selection Network: Application to Airline Pricing, Shukla et al, 2019
Introduction to Multi-Armed Bandits, A. Slivkins, Microsoft Research, 2022
Model Selection for Contextual Bandits, D.J. Foster et al, MIT, Microsoft Research, 2019
Online and Scalable Model Selection with Contextual Bandits, Xie et al, 2021
-
Reinforcement Learning in the Warehousing Industry with Chris Mahoney
https://ai.plainenglish.io/reinforcement-learning-in-the-warehousing-industry-a5e7f1c28422
related papers:
Reinforcement Learning Approach to Product Allocation and Storage, M. Andra M.S. Thesis
-
How to apply reinforcement learning to order-pick routing in warehouses (including Python code) with SMLC
-
Evolving Reinforcement Learning Agents Using Genetic Algorithms with Mohamed Abdin
related github repo: https://github.com/mohdabdin/Evolving-RL-Agents
-
Understanding Zero-Shot Learning — Making ML More Human with Ekin Tiu
https://towardsdatascience.com/understanding-zero-shot-learning-making-ml-more-human-4653ac35ccab
related paper: Simulation to Scaled City: Zero-Shot Policy Transfer for Traffic Control via Autonomous Vehicles, Kathy Jang et al, UC Berkeley, 2019
related paper: Learning Transferable Visual Models From Natural Language Supervision, Alec Radford et al, OpenAI, 2021
related paper: Zero-Shot Learning and its Applications from Autonomous Vehicles to COVID-19 Diagnosis: A Review, M. Rezaei et al, U Leeds, 2020
-
Policy Based Reinforcement Learning — A Detailed Study, Part 1, with NandaKishore Joshi
link to book: here
-
Policy Based Reinforcement Learning — OpenAI’s Cartpole with REINFORCE algorithm, Part 2, with NandaKishore Joshi
-
RLAIF: Reinforcement Learning from AI Feedback with Cameron R. Wolfe, Jan, 2024
https://towardsdatascience.com/rlaif-reinforcement-learning-from-ai-feedback-d7dbdae8f093
related paper: RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback, Harrison Lee et al, 2023
related paper: Constitutional AI: Harmlessness from AI Feedback, Y. Bai, 2022
related paper: PaLM: Scaling Language Modeling with Pathways, A. Chowdhery et al, 2022
related paper: PaLM 2 Technical Report, Google, 2023
related paper: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Wei et al, Google Research, 2022
related paper: Self-Consistency Improves Chain of Thought Reasoning in Language Models, Wang et al, Google Research, ICLR 2023
related paper: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback, Bai et al, Anthropic, 2022
related paper: A General Language Assistant as a Laboratory for Alignment, A. Askell et al, Anthropic, 2021
related paper: Learning to summarize from human feedback, N. Stiennon et al, OpenAI, 2022
-
Learning with Stable-Baselines3: Reinforcement learning without the boilerplate code
https://towardsdatascience.com/convenient-reinforcement-learning-with-stable-baselines3-dccf466b7585
-
Quickly Generate Combinatorial State Spaces in Python with Wouter van Heeswijk
https://medium.com/codex/quickly-generate-combinatorial-state-spaces-in-python-c53decab2bdd
-
Quickly Generate Combinatorial Action Spaces in Python with Wouter van Heeswijk
https://medium.com/codex/quickly-generate-combinatorial-action-spaces-15962118e508
-
Speed Up Your Simulations By Deploying These Sampling Strategies with Wouter van Heeswijk
-
Reinforcement Learning for Combinatorial Optimization with Or Rivlin
https://towardsdatascience.com/reinforcement-learning-for-combinatorial-optimization-d1402e396e91
related repo: Minimum vertex cover with Deep Reinforcement Learning
related paper: Learning Heuristics over Large Graphs via Deep Reinforcement Learning, S. Manchanda et al, IIT-Delhi, 2020
-
Inverse Reinforcement Learning with Jonathan Hui
https://jonathan-hui.medium.com/rl-inverse-reinforcement-learning-56c739acfb5a
-
Reinforcement Learning Part 1: Q-Learning and exploration
https://studywolf.wordpress.com/2012/11/25/reinforcement-learning-q-learning-and-exploration/
-
Reinforcement Learning Part 2: Sarsa versus Q learning
https://studywolf.wordpress.com/2013/07/01/reinforcement-learning-sarsa-vs-q-learning/
example code: https://github.com/studywolf/blog/tree/master/RL/SARSA%20vs%20Qlearn%20cliff
ccm suite code: https://github.com/tcstewar/ccmsuite
-
Reinforcement Learning Part 3: Egocentric learning
https://studywolf.wordpress.com/2015/03/29/reinforcement-learning-part-3-egocentric-learning/
example code: https://github.com/studywolf/blog/tree/master/RL/Egocentric
-
Reinforcement Learning Part 4: Combining Egocentric and Allocentric
example code: https://github.com/studywolf/blog/tree/master/RL/combination%20allo%20and%20ego
-
Deep Learning for control using augmented hessian-free optimization
example code: https://github.com/studywolf/blog/blob/master/train_AHF/train_hf.py
-
Control as Inference and Soft Deep RL with Sergey Levine (NIPS 2018)
-
Tutorial: Introduction to Reinforcement Learning with Function Approximation (NIPS Tutorials 2015)
-
Deep Reinforcement Learning: John Schulman, OpenAI, Berkeley (MLSS Cadiz, 2016)
-
Deep Reinforcement Learning, David Silver, Dept. of Computer Science, University College London (July 2015) Note: lecture slides included under section "Articles and tutorials"
-
Deep Reinforcement Learning, Pieter Abbeel, Dept. of Electrical Engineering and Computer Sciences, UC Berkeley, August 2016 Note: lecture slides included under section "Articles and tutorials"
-
Reinforcement Learning, Emma Brunskill, Stanford CS234, Winter 2019
-
RLHF: Reinforcement Learning from Human Feedback with Ms Aerin
related paper: Training language models to follow instructions with human feedback, Ouyang et al, OpenAI, 2022
related python code: https://github.com/lucidrains/PaLM-rlhf-pytorch/tree/main
-
Theory of Games and Statistical Decisions, David Blackwell, 1954
-
N-Person Game Theory: Concepts and Applications by A. Rapoport, 1970
-
Mathematical Foundations of Game Theory, Rida Laraki, Jerome Renault, Sylvain Sorin, 2010
-
Game Theory: Decisions, Interactions and Evolution, James N. Webb, 2000
-
Notes on the N-Person Game - II: The Value of N-Person Game, L.S. Shapley, 1951
-
A Theory of Individual Choice Behavior, R. Duncan Luce, Columbia U., 1957
-
The Expected Outcome Model of Two-Player Games, Bruce Abramson, Columbia U, 1987
-
Paradoxical Behaviour of Mechanical and Electrical Networks, J. Cohen, P. Horowitz, Harvard U, 1991
-
Statistical Mechanics of Systems with Heterogeneous Agents: Minority Games, D. Challet et al, 1999
-
Cooperative Games: Core and Shapley Values, R. Serrano, Brown U., 2007
-
Algorithmic Game Theory, Tim Roughgarden, Stanford CS364A, Fall 2013
-
Game Theory Through the Computational Lens, Tim Roughgarden, LSE Events
-
NashPy: Strategic Interactions in Python
https://medium.com/@agbonorino/nashpy-strategic-interactions-in-python-aac937c916a5