Building a reinforcement learning environment for training epidemic mitigation policies.
We model the epidemic dynamics on a graph representing cities (see the Switzerland example above) with the following per-node dynamics (for city $i$):
$$
\begin{cases}
\dot{s}_i = \gamma_i r_i - \alpha_i s_i \big( i_i + \sum_{j \neq i} \tau_j i_j \big) - \mu \\
\dot{e}_i = \alpha_i s_i \big( i_i + \sum_{j \neq i} \tau_j i_j \big) - \eta_i e_i \\
\dot{i}_i = \eta_i e_i - (\beta_i + \zeta_i \cdot i_i)\, i_i \\
\dot{r}_i = \beta_i i_i - \gamma_i r_i + \mu \\
\dot{d}_i = \zeta_i i_i^2
\end{cases}
$$
where the state variables are:

- $s_i$ is the proportion of susceptible population
- $e_i$ is the proportion of exposed (but not yet infected) population
- $i_i$ is the proportion of infected population
- $r_i$ is the proportion of recovered population
- $d_i$ is the proportion of dead population
and the parameters are the following:

- $\alpha_i$ is the transmission rate
- $\eta_i$ is the infection rate
- $\beta_i$ is the recovery rate
- $\zeta_i$ is the death rate
- $\mu$ is the vaccination rate
- $\gamma_i$ is the immunity-loss rate

The coefficients $\tau_j$ weight how strongly infections in city $j$ contribute to the infection pressure in other cities (the inter-city coupling); a code sketch of these dynamics follows the list.
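To make the dynamics concrete, here is a minimal sketch of the right-hand side for one city. Function and variable names are our own, not the environment's actual API:

```python
import numpy as np

def seird_derivatives(state, params, coupling):
    """Right-hand side of the per-city dynamics above.

    state    -- array (s, e, i, r, d) of proportions for one city
    params   -- dict with keys alpha, eta, beta, zeta, mu, gamma
    coupling -- sum over other cities j of tau_j * i_j
    """
    s, e, i, r, d = state
    pressure = i + coupling  # i_i + sum_{j != i} tau_j * i_j
    ds = params["gamma"] * r - params["alpha"] * s * pressure - params["mu"]
    de = params["alpha"] * s * pressure - params["eta"] * e
    di = params["eta"] * e - (params["beta"] + params["zeta"] * i) * i
    dr = params["beta"] * i - params["gamma"] * r + params["mu"]
    dd = params["zeta"] * i ** 2
    return np.array([ds, de, di, dr, dd])
```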
All of these parameters are sampled from normal distributions at each time step, so the dynamics are stochastic.

We choose parameters such that 3 integration steps of the dynamical model amount to one day; thus 21 steps amount to one week. We work with a partially observable MDP (POMDP) where the actor can only measure the following (a step sketch follows the list):
- the number of infected people per city (once a day)
- the number of dead people per city (once a day)
- the initial number of people in the country
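One plausible way to wire this up is sketched below. This is not the environment's actual code: Euler integration and the parameter names are assumptions, and `seird_derivatives` is the helper from the sketch above.

```python
import numpy as np

STEPS_PER_DAY = 3  # from the text: 3 integration steps amount to one day

def sample_params(rng, means, stds):
    # Every parameter is redrawn from its normal distribution at each
    # integration step, which is what makes the dynamics stochastic.
    return {k: rng.normal(means[k], stds[k]) for k in means}

def step_one_day(state, rng, means, stds, coupling, dt=1.0):
    # Advance the dynamics three times (= one day), then emit the daily
    # observation: only infected and dead proportions are visible.
    for _ in range(STEPS_PER_DAY):
        params = sample_params(rng, means, stds)
        state = state + dt * seird_derivatives(state, params, coupling)
    s, e, i, r, d = state
    return state, {"infected": i, "dead": d}
```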
Alternatively, the action space can be any combination of the following (a sketch follows the list):

- confinement (country-wide)
- isolation (country-wide)
- pay for extra hospital beds (country-wide)
- subsidize vaccination (country-wide)
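In a Gymnasium-style implementation this could be four independent binary toggles; the ordering below is our assumption:

```python
from gymnasium import spaces

# One bit per country-wide measure; any subset may be active at once.
# Assumed order: [confinement, isolation, hospital funding, vaccination subsidy]
action_space = spaces.MultiBinary(4)
```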
The reward is computed as follows:

$$
\begin{aligned}
R(s^{(t)}, a^{(t)}) = R_\text{const} &- A_\text{cost} - V_\text{cost} - H_\text{cost} \\
&- \frac{1}{\text{pop}_\text{total}} \sum_{\text{city} \in \text{map}} \left( D_\text{cost} \cdot \Delta d_\text{city}^{(t)} + C_\text{cost} \cdot c_\text{city}^{(t)} + I_\text{cost} \cdot i_\text{city}^{(t)} \right).
\end{aligned}
$$
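As a sanity check, here is a minimal sketch of this reward. The interface is a guess, and we take $c_\text{city}$ to be the confined population proportion, which is an assumption:

```python
def compute_reward(delta_dead, confined, infected, pop_total,
                   a_cost, v_cost, h_cost,
                   R_const, D_cost, C_cost, I_cost):
    """Minimal sketch of the reward formula above; names mirror the symbols.

    delta_dead, confined, infected -- per-city values for this step
    a_cost, v_cost, h_cost -- costs incurred by the chosen actions
    """
    per_city = sum(
        D_cost * dd + C_cost * c + I_cost * inf
        for dd, c, inf in zip(delta_dead, confined, infected)
    )
    return R_const - a_cost - v_cost - h_cost - per_city / pop_total
```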