
Reinforcement Learning Formation repository, as part of Automatants formations for the CentraleSupélec campus.




Formation-Reinforcement-Learning

This is the repository for the Reinforcement Learning course at Automatants, the AI student association of CentraleSupélec. The course was given to students of the CentraleSupélec campus as an introduction to Reinforcement Learning.

Concepts covered in the first part (slides 1-39):

  • RL framework (environments with examples, MDPs, policies, cumulative reward, state and action values)
  • Environment shaping (reward shaping, state shaping, action shaping)
  • Prediction and control problems
  • Model-based methods: Dynamic Programming (Bellman equations, Policy Iteration, Value Iteration)
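To make the model-based side concrete, here is a minimal Value Iteration sketch on a toy two-state MDP. The MDP, its rewards, and the discount factor are invented for illustration; this is a sketch of the algorithm covered in the slides, not the course's own code:

```python
import numpy as np

# Toy transition model (hypothetical): P[s][a] = list of (prob, next_state, reward).
# Action 1 always leads to state 1 with reward 1; action 0 leads to state 0 with reward 0.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
}
gamma = 0.9  # discount factor

V = np.zeros(len(P))
for _ in range(1000):
    # Bellman optimality backup: V(s) = max_a sum_{s'} p(s'|s,a) [r + gamma V(s')]
    V_new = np.array([
        max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s])
        for s in P
    ])
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop at convergence
        break
    V = V_new
```

Here both states converge to V = 1 / (1 - gamma) = 10, since the optimal policy collects a reward of 1 forever by always taking action 1.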

Concepts covered in the second part (slides 40-80):

  • Model-free methods: Monte Carlo, TD Learning (SARSA, Q-Learning, Expected SARSA), n-step TD Learning
  • Exploration-exploitation dilemma
  • Experience Replay
  • Deep RL introduction
  • Deep Q-Network (DQN)
  • Parallelization in RL
  • Libraries and resources in RL
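To illustrate the model-free side, here is a minimal tabular Q-Learning sketch with epsilon-greedy exploration on a toy chain environment. The environment and all hyperparameters are invented for illustration, not taken from the course:

```python
import random

# Hypothetical chain environment: states 0..4, goal at 4, actions move left/right.
N, GOAL = 5, 4
ACTIONS = (-1, +1)
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate

Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def step(s, a):
    """One environment transition: reward 1 only on reaching the goal."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

random.seed(0)
for _ in range(500):  # episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r, done = step(s, a)
        # Q-Learning update: bootstrap on the greedy value of the next state
        target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
```

After training, the greedy policy moves right in every non-goal state, and Q(3, +1) approaches 1 (the undiscounted one-step reward for reaching the goal).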

Policy-based RL methods and Importance Sampling are also covered in the slides (81 - 88), but not in the lectures.
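The core idea of Importance Sampling fits in a few lines: rewards collected under a behavior policy b are reweighted by the ratio pi(a)/b(a) to obtain an unbiased estimate of the target policy pi's value. The two-armed bandit and both policies below are hypothetical, chosen only so the true value is easy to check:

```python
import random
random.seed(1)

# Hypothetical two-armed bandit: arm 1 pays 1, arm 0 pays 0.
behavior = [0.5, 0.5]   # b(a): policy that collects the data
target   = [0.1, 0.9]   # pi(a): policy we want to evaluate

returns = []
for _ in range(100_000):
    a = random.choices([0, 1], weights=behavior)[0]  # act with b
    reward = float(a == 1)
    rho = target[a] / behavior[a]  # importance weight pi(a)/b(a)
    returns.append(rho * reward)

estimate = sum(returns) / len(returns)  # close to E_pi[reward] = 0.9
```

The estimate is unbiased but its variance grows with the mismatch between the two policies, which is one motivation for the weighted variants discussed in standard RL references.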

Videos

Videos of the lectures are available (in French only) on the Automatants YouTube channel.

Part 1: Introduction to Reinforcement Learning and Model-based methods (RL Framework, Bellman Equations, Dynamic Programming)

Part 2: Model-free methods and deeper concepts in RL: Monte Carlo, TD Learning (SARSA, Q-Learning, Expected SARSA), the exploration-exploitation dilemma, Off-Policy Learning, and an introduction to Deep RL

Slides

Slides of the lectures are available in this repository in French and English as the PowerPoint files "slides ENGLISH.pptx" and "slides FR.pptx".

Gridworld environment

Q values through training

The Gridworld environment is available here. It is a simple gridworld environment developed to implement the algorithms seen in the lectures, with the goal of visualizing Q values or action probabilities during the agent's training. Several environments/grids (with different rewards, obstacles, etc.) and several agents (including your own) are available. More information is on the GitHub repository.

Streamlit app

You can visualize the results of the algorithms seen in the lectures and the influence of many hyperparameters with the Streamlit app.

The app includes three environments: OceanEnv (reach the goal as fast as possible), Nim (take the last stick), and a Contextual Bandit environment (choose the best arm at each state).

The app is deployed with Streamlit and should be available here.

If that is not the case, you can install Streamlit with pip and run the app locally:

pip install streamlit
streamlit run streamlit_app.py
