
Reinforcement Learning Formation repository, as part of Automatants formations for the CentraleSupélec campus.




Formation-Reinforcement-Learning

This is the repository for the Reinforcement Learning course at Automatants, the AI student association of CentraleSupélec. The course was given to students of the CentraleSupélec campus as an introduction to Reinforcement Learning.

Concepts covered in the first part (slides 1-39):

  • RL framework (environments with examples, MDPs, policies, cumulative reward, state and action values)
  • Environment shaping (reward shaping, state shaping, action shaping)
  • Prediction and control problems
  • Model-based methods: Dynamic Programming (Bellman equations, Policy Iteration, Value Iteration)
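To make the model-based side concrete, here is a minimal Value Iteration sketch on a toy two-state MDP. The MDP, its rewards, and the discount factor are invented for illustration; this is a sketch of the algorithm covered in the slides, not the course's own code:

```python
import numpy as np

# Toy transition model (hypothetical): P[s][a] = list of (prob, next_state, reward).
# Action 1 always leads to state 1 with reward 1; action 0 leads to state 0 with reward 0.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
}
gamma = 0.9  # discount factor

V = np.zeros(len(P))
for _ in range(1000):
    # Bellman optimality backup: V(s) = max_a sum_{s'} p(s'|s,a) [r + gamma V(s')]
    V_new = np.array([
        max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s])
        for s in P
    ])
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop at convergence
        break
    V = V_new
```

Here both states converge to V = 1 / (1 - gamma) = 10, since the optimal policy collects a reward of 1 forever by always taking action 1.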

Concepts covered in the second part (slides 40-80):

  • Model-free methods: Monte Carlo, TD Learning (SARSA, Q-Learning, Expected SARSA), n-step TD Learning
  • Exploration-exploitation dilemma
  • Experience Replay
  • Deep RL introduction
  • Deep Q-Network (DQN)
  • Parallelization in RL
  • Libraries and resources in RL
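To illustrate the model-free side, here is a minimal tabular Q-Learning sketch with epsilon-greedy exploration on a toy chain environment. The environment and all hyperparameters are invented for illustration, not taken from the course:

```python
import random

# Hypothetical chain environment: states 0..4, goal at 4, actions move left/right.
N, GOAL = 5, 4
ACTIONS = (-1, +1)
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate

Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def step(s, a):
    """One environment transition: reward 1 only on reaching the goal."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

random.seed(0)
for _ in range(500):  # episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < eps \
            else max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r, done = step(s, a)
        # Q-Learning update: bootstrap on the greedy value of the next state
        target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
```

After training, the greedy policy moves right in every non-goal state, and Q(3, +1) approaches 1 (the undiscounted one-step reward for reaching the goal).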

Policy-based RL methods and Importance Sampling are also covered in the slides (81 - 88), but not in the lectures.
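The core idea of Importance Sampling fits in a few lines: rewards collected under a behavior policy b are reweighted by the ratio pi(a)/b(a) to obtain an unbiased estimate of the target policy pi's value. The two-armed bandit and both policies below are hypothetical, chosen only so the true value is easy to check:

```python
import random
random.seed(1)

# Hypothetical two-armed bandit: arm 1 pays 1, arm 0 pays 0.
behavior = [0.5, 0.5]   # b(a): policy that collects the data
target   = [0.1, 0.9]   # pi(a): policy we want to evaluate

returns = []
for _ in range(100_000):
    a = random.choices([0, 1], weights=behavior)[0]  # act with b
    reward = float(a == 1)
    rho = target[a] / behavior[a]  # importance weight pi(a)/b(a)
    returns.append(rho * reward)

estimate = sum(returns) / len(returns)  # close to E_pi[reward] = 0.9
```

The estimate is unbiased but its variance grows with the mismatch between the two policies, which is one motivation for the weighted variants discussed in standard RL references.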

Videos

Videos of the lectures are available (in French only) on the Automatants YouTube channel.

Part 1: Introduction to Reinforcement Learning and Model-based methods (RL Framework, Bellman Equations, Dynamic Programming)

Part 2: Model-free methods and deeper concepts in RL: Monte Carlo, TD Learning (SARSA, Q-Learning, Expected SARSA), the exploration-exploitation dilemma, Off-Policy Learning, and an introduction to Deep RL

Slides

Slides of the lectures are available in this repository in French and English as the PowerPoint files "slides ENGLISH.pptx" and "slides FR.pptx".

Gridworld environment

Q values through training

The Gridworld environment is available here. It is a simple gridworld environment developed to implement the algorithms seen in the lectures, with the goal of visualizing Q values or action probabilities during the agent's training. Several environments/grids (with different rewards, obstacles, etc.) and several agents (including your own) are available. More information is on the GitHub repository.

Streamlit app

You can visualize the results of the algorithms seen in the lectures and the influence of many hyperparameters with the Streamlit app.

The app includes three environments: OceanEnv (reach the goal as fast as possible), Nim (take the last stick), and a Contextual Bandit environment (choose the best arm at each state).

The app is deployed with Streamlit and should be available here.

If that is not the case, you can install Streamlit with pip and run the app locally:

pip install streamlit
streamlit run streamlit_app.py
