A Crash Course on Reinforcement Learning for Control Problems Using TensorFlow 2

This is a self-contained repository that explains two basic Reinforcement Learning (RL) algorithms, namely Policy Gradient (PG) and Q-learning, and shows how to apply them to control problems. Dynamical systems might have a discrete action space, like cartpole, where the two possible actions are +1 and -1, or a continuous action space, like linear Gaussian systems. Usually, you can find code for only one of these cases, and it might not be obvious how to extend it to the other.
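To make the distinction concrete, here is a minimal sketch of how an action could be sampled in each case. It uses only the Python standard library and is an illustration, not the code from the notebooks: the function names, the logits, and the Gaussian mean and standard deviation are all assumptions chosen for the example.

```python
import math
import random

CARTPOLE_ACTIONS = [-1, +1]  # the two discrete actions mentioned above


def softmax(logits):
    """Turn arbitrary scores into probabilities that sum to one."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_discrete_action(logits, rng):
    """Discrete case: draw one of finitely many actions from a categorical policy."""
    probs = softmax(logits)
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return CARTPOLE_ACTIONS[index]


def sample_continuous_action(mean, std, rng):
    """Continuous case: draw a real-valued action from a Gaussian policy."""
    return rng.gauss(mean, std)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
a = sample_discrete_action([0.2, -0.1], rng)              # hypothetical logits
u = sample_continuous_action(mean=0.5, std=0.1, rng=rng)  # hypothetical policy output
```

In the discrete case the policy outputs one score per action, while in the continuous case it outputs the parameters of a distribution over real numbers. That difference is what makes extending an implementation from one case to the other non-trivial.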

In this repository, we explain how to formulate PG and Q-learning for each of these cases. We provide implementations of these algorithms for both cases as Jupyter notebooks. You can also find the pure code for these algorithms (and a few more algorithms that I have implemented but not discussed). The code is easy to follow and read. We have written it in a modular way, so that, for example, someone interested in the implementation of an algorithm is not distracted by defining an environment in gym or plotting the results. The theoretical material in this repo is summarized in a handout, which is available on arXiv at https://arxiv.org/abs/2103.04910.

Citing this repo

Here is a BibTeX entry that you can use to cite the handout in a publication:

@misc{yaghmaie2021crash,
      title={A Crash Course on Reinforcement Learning}, 
      author={Farnaz Adib Yaghmaie and Lennart Ljung},
      year={2021},
      eprint={2103.04910},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

If you use this repo, please consider citing the following relevant papers:

F. Adib Yaghmaie, S. Gunnarsson and F. L. Lewis, "Output Regulation of Unknown Linear Systems using Average Cost Reinforcement Learning", Automatica, Vol. 110, 2019.
F. Adib Yaghmaie and F. Gustafsson, "Using Reinforcement Learning for Model-free Linear Quadratic Control with Process and Measurement Noises", in 2019 IEEE 58th Conference on Decision and Control (CDC), 2019, pp. 6510-6517.
F. Adib Yaghmaie and S. Gunnarsson, "A New Result on Robust Adaptive Dynamic Programming for Uncertain Partially Linear Systems", in 2019 IEEE 58th Conference on Decision and Control (CDC), 2019, pp. 7480-7485.

How to use this repo

This repository contains presentation files and code.

The presentation files are related to the LINK-SIC workshop on Reinforcement Learning. The first day will be Friday, March 12, 2021, 13.15 - 16.30, and the second day will be Tuesday, April 6, 2021, 13.15 - 16.30. You can find the presentation files in PDF format in the folder presentation.

The code is given as Jupyter notebooks and Python files. If you want to run the Jupyter notebooks, I suggest using Google Colab. If you want to extend the results and examine more systems, I suggest cloning this repository and running it on your computer.

Running on Google Colab

Go to https://colab.research.google.com/notebooks/intro.ipynb and sign in with a Google account.
Click File, and then Upload notebook. If you get the webpage in Swedish, click Arkiv and then Ladda upp anteckningsbok.
Select GitHub and paste the following link: https://github.com/FarnazAdib/Crash_course_on_RL.git.
A list of files with type .ipynb then appears. These are Jupyter notebooks, which can contain both text and code, and the code can be run. As an example, scroll down and open pg_on_cartpole_notebook.ipynb.
The file contains some cells with text and some cells with code. The cells that contain code have [ ] on the left. If you move your mouse over [ ], a play button appears. You can click on it to run the cell. Make sure not to skip a cell, as skipping one causes errors in the cells that depend on it.
You can continue like this and run all the code cells one by one up to the end.

Running on local computer

Go to https://github.com/FarnazAdib/Crash_course_on_RL.git and clone the project.
Open PyCharm, click File, then Open, and navigate to the project folder.
Follow the Preparation notebook to build a virtual environment and import the required libraries.

Where to start

The theoretical material in this repo is summarized in our handout, available in PDF format at https://arxiv.org/abs/2103.04910. If you wish to read the material in this repo, you can start by reading about Reinforcement Learning:

An introduction to Reinforcement Learning

Dynamical systems

You can read about the dynamical systems (or environments, in RL terminology) that we consider in this repo here.
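As a rough illustration of what such an environment looks like in code, the sketch below simulates a scalar linear Gaussian system x_next = a*x + b*u + w with a quadratic cost q*x^2 + r*u^2. The class name, the parameter values, and the gym-like reset/step interface are assumptions made for this example. The notebooks define their own environments.

```python
import random


class ScalarLinearGaussianSystem:
    """Hypothetical scalar system: x_next = a*x + b*u + w, with w ~ N(0, noise_std^2)."""

    def __init__(self, a=0.9, b=0.5, q=1.0, r=0.1, noise_std=0.1, seed=0):
        self.a, self.b = a, b          # dynamics parameters (illustrative values)
        self.q, self.r = q, r          # quadratic cost weights (illustrative values)
        self.noise_std = noise_std
        self.rng = random.Random(seed)
        self.x = 0.0

    def reset(self, x0=1.0):
        """Set the initial state and return it."""
        self.x = x0
        return self.x

    def step(self, u):
        """Apply action u; return the next state and the reward (negative cost)."""
        cost = self.q * self.x ** 2 + self.r * u ** 2
        w = self.rng.gauss(0.0, self.noise_std)   # process noise
        self.x = self.a * self.x + self.b * u + w
        return self.x, -cost


env = ScalarLinearGaussianSystem()
x = env.reset(x0=1.0)
x, reward = env.step(u=-0.5)   # one interaction with a hand-picked action
```

The reward is taken as the negative of the quadratic cost, so that maximizing reward corresponds to minimizing cost.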

Policy Gradient

Policy Gradient is one of the popular RL routines that relies on optimizing the policy directly. Below, you can find Jupyter notebooks on the Policy Gradient (PG) algorithm:

Explanation of Policy Gradient (PG)
- How to code PG for problems with discrete action space (cartpole)
- How to code PG for problems with continuous action space (linear quadratic)

You can also see the pure code for PG:

PG pure code
- PG for discrete action space (cartpole)
- PG for continuous action space (linear quadratic)
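For orientation, a common REINFORCE-style form of PG weights the log-probability of each action taken by the discounted reward that followed it, and minimizes the negative of that sum. The sketch below shows only this bookkeeping in plain Python. The function names and the discount factor are assumptions, and the notebooks may organize the computation differently (for example, with TensorFlow tensors and automatic differentiation).

```python
def discounted_returns(rewards, gamma):
    """Reward-to-go: G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def pg_surrogate_loss(log_probs, returns):
    """Negative return-weighted log-likelihood of the actions actually taken.

    Minimizing this with respect to the policy parameters increases the
    probability of actions that were followed by high returns.
    """
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# Hypothetical one-episode data: three rewards and the log-probabilities
# that the policy assigned to the three actions taken.
rewards = [1.0, 1.0, 1.0]
log_probs = [-0.7, -0.6, -0.9]
returns = discounted_returns(rewards, gamma=0.5)
loss = pg_surrogate_loss(log_probs, returns)
```

The same loss applies to discrete and continuous action spaces. Only the way the log-probability is computed changes (categorical versus Gaussian).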

Q-learning

Q-learning is another popular RL routine that relies on dynamic programming. Below, you can find Jupyter notebooks on the Q-learning algorithm:

Explanation of Q-learning
- How to code Q-learning for problems with discrete action space (cartpole)
- How to code Q-learning for problems with continuous action space (linear quadratic)
Explanation of experience replay Q-learning
- How to code experience replay Q-learning for systems with discrete action space (cartpole)
- We have not implemented experience replay Q-learning on the LQ problem because plain Q-learning already performs very well on it. Note that, as you can see from the explanation of experience replay Q-learning, this algorithm has only two simple functions in addition to plain Q-learning, and these do not depend on whether the action is discrete or continuous. So, the extension is quite straightforward.
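To illustrate why the two extra functions are independent of the action type, here is a minimal sketch of a replay memory with one function that stores a transition and one that samples a batch. The class and method names (`ReplayMemory`, `remember`, `sample_batch`) are hypothetical and are not taken from the notebooks.

```python
import random
from collections import deque


class ReplayMemory:
    """Hypothetical fixed-size store of past transitions."""

    def __init__(self, capacity, seed=0):
        # A deque with maxlen silently drops the oldest item when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def remember(self, state, action, reward, next_state, done):
        """Store one transition; the action may be discrete or continuous."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample_batch(self, batch_size):
        """Return up to batch_size transitions drawn uniformly without replacement."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)


memory = ReplayMemory(capacity=2)
for step in range(3):
    # Dummy transitions; the state here is just the step index.
    memory.remember(state=step, action=+1, reward=1.0, next_state=step + 1, done=False)
batch = memory.sample_batch(batch_size=5)
```

Neither function inspects the action, so the same memory could hold a cartpole action such as +1 or a real-valued control input for the LQ problem.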

You can also see the pure code for Q-learning and experience replay Q-learning:

Q-learning pure code
- Q-learning for discrete action space (cartpole)
- Q-learning for continuous action space (linear quadratic)
Experience replay Q-learning pure code
- Experience replay Q-learning for discrete action space (cartpole)
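In its textbook form, Q-learning moves the estimate Q(s, a) toward the target r + gamma * max_a' Q(s', a'). The sketch below shows that update on a small lookup table purely for illustration. The tabular representation, the function name, and the step-size and discount values are assumptions. The cartpole and linear quadratic code listed above is not necessarily tabular and may use function approximation instead.

```python
def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply one temporal-difference update in place and return the TD error."""
    # No bootstrapping from the next state once the episode has ended.
    bootstrap = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * bootstrap
    td_error = target - q_table[state][action]
    q_table[state][action] += alpha * td_error
    return td_error


# Hypothetical problem with two states and two actions per state.
q_table = {0: [0.0, 0.0], 1: [1.0, 2.0]}
error = q_learning_update(q_table, state=0, action=1, reward=1.0,
                          next_state=1, done=False, alpha=0.5, gamma=0.5)
```

With experience replay, the same update would be applied to transitions sampled from memory rather than only to the most recent one.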

Presentation files

The presentation files for the LINK-SIC workshop can be downloaded from the folder called presentation. There, you can find the presentation files for day1 and day2.

