Simulation docs #32

Open
wants to merge 11 commits into master
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -111,3 +111,6 @@ venv.bak/

# Git credentials
.git-credentials

# Visual Studio Code
.vscode
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2019 Rüdiger Busche
Copyright (c) 2020 Rüdiger Busche

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
2 changes: 2 additions & 0 deletions docs/2_outbreak_detection.rst
@@ -1,3 +1,5 @@
.. _outbreak-detection-formalization:

Outbreak Detection
==================
Surveillance algorithms usually work on regularly spaced, aggregated time series of case counts.
15 changes: 15 additions & 0 deletions docs/3_data_simulation.rst
@@ -0,0 +1,15 @@
Data Simulations
==================
Usually, public-health-related data cannot be disclosed publicly, which makes the development and systematic comparison of outbreak detection algorithms hard. Fortunately, simulations of hypothetical disease spread are a valid alternative to real data. epysurv therefore includes a module that lets you simulate simple, univariate endemic and epidemic timeseries, i.e., timeseries without and with outbreaks.
Owner

Is there maybe a review paper that backs some of the claims made here that we could reference, as in the outbreak detection section?


Endemic Timeseries
------------------
An endemic timeseries is a timeseries of case counts (as defined in :ref:`outbreak-detection-formalization`) that reflects the natural occurrence of a disease and varies by disease, time, space, sex, age, and other dimensions. The important distinction is that the observed case counts are not influenced by an outbreak event.

A simulation is usually a time-dependent, linear model that produces realistic case counts and can be set to mimic different types of disease dynamics. To achieve realism in epidemiological simulations, you would usually incorporate different effects into that model, such as seasonality and trend, e.g., a pattern that repeats every year with case numbers increasing over time. Finally, a probability distribution is used to make the outcome of the model non-deterministic. Since case counts are whole numbers, a Poisson or negative binomial distribution is used on top of the linear model to introduce some randomness in the observed case counts. These kinds of algorithms are referred to as ``seasonal_noise``.
Owner

Suggested change
A simulation is usually a time-dependent, linear model that produces realistic case counts and can be set to mimic different types of disease dynamics. To achieve realism in epidemiological simulations, you would usually incorporate different effects into that model such as seasonality and trend e.g., a repetition of a certain pattern after one year with increasing case numbers over time. Finally, some distribution is used to make the outcome of the model non-deterministic. Since case counts are whole numbers, a Poisson or negative binomial distribution is used on top of the linear model to introduce some randomness in the observed case counts. These kinds of algorithms are referred to as ``seasonal_noise``.
A simulation is usually a time-dependent, linear model that produces realistic case counts and can be set to mimic different types of disease dynamics. To achieve realism in epidemiological simulations, you would usually incorporate different effects into that model such as seasonality and trend e.g., a repetition of a certain pattern after one year with increasing case numbers over time. Finally, some distribution is used to make the outcome of the model non-deterministic. Since case counts are positive integers, a Poisson or negative binomial distribution is used on top of the linear model to introduce some randomness in the observed case counts. These kinds of algorithms are referred to as ``seasonal_noise``.
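
The seasonal model described in the Endemic Timeseries section could be sketched roughly as follows. This is only an illustration, not epysurv's implementation: the log-linear form of the mean, the single yearly harmonic, and all names (``seasonal_mean``, ``sample_poisson``, ``simulate_endemic``, ``alpha``, ``beta``, ``amplitude``, ``period``) are assumptions made for this example.

```python
import math
import random


def seasonal_mean(t, alpha=1.0, beta=0.0, amplitude=1.0, period=52):
    """Expected case count at time t (assumed log-linear form).

    alpha: baseline level, beta: linear trend per time step,
    amplitude/period: strength and length of the seasonal cycle.
    """
    log_mu = alpha + beta * t + amplitude * math.cos(2 * math.pi * t / period)
    return math.exp(log_mu)


def sample_poisson(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method.

    Adequate for the small means used here; it becomes slow for large mu.
    """
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_endemic(length, seed=None, **params):
    """Simulate `length` endemic case counts (no outbreaks)."""
    rng = random.Random(seed)
    return [sample_poisson(seasonal_mean(t, **params), rng) for t in range(length)]


if __name__ == "__main__":
    # Two years of weekly counts with a slight upward trend.
    print(simulate_endemic(104, seed=42, beta=0.005))
```

Passing the same ``seed`` reproduces the same timeseries, which helps when comparing detection algorithms on identical data. A negative binomial draw could replace the Poisson draw to model overdispersed counts.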


Epidemic Timeseries
-------------------
Once we have created a model to simulate an endemic timeseries, we can introduce outbreak events by randomly increasing the case count at certain timepoints :math:`t`. A common approach is to have a chain of switching states (usually produced by a Markov chain) that uses senseful transition probabilities to move into or leave the state of an outbreak. Alternatively, one can assign certain timepoints to be in an outbreak to make such timeseries more comparable or to test edge cases of outbreak detection algorithms.
Owner

Suggested change
Once we have created a model to simulate an endemic timeseries, we can introduce outbreak events by randomly increasing the case count at certain timepoints :math:`t`. A common approach is to have a chain of switching states (usually produces by a Markov chain) that use senseful transition probabilities to move into or leave the state of an outbreak. Alternatively, one can assign certain timepoints to be in an outbreak to make such timeseries more comparable or tests edge cases of outbreak detection algorithms.
Once we have created a model to simulate an endemic timeseries, we can introduce outbreak events by randomly increasing the case count at certain timepoints :math:`t`. A common approach is to have a chain of switching states (usually produces by a Markov chain) that use sensible transition probabilities to move into or leave the state of an outbreak. Alternatively, one can assign certain timepoints to be in an outbreak to make such timeseries more comparable or tests edge cases of outbreak detection algorithms.


In practice, the model for the simulation of endemic timeseries is extended by a term that depends on the current outbreak state. If there is no outbreak, the term is ignored; otherwise, a fixed term is added to the endemic case counts.
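
A minimal sketch of the two ideas from the Epidemic Timeseries section, a two-state Markov chain that switches between "no outbreak" and "outbreak" and a fixed term added during outbreaks, might look like this. The function names, the parameters ``p_enter``, ``p_leave`` and ``extra_cases``, and their default values are hypothetical and are not taken from epysurv.

```python
import random


def simulate_states(length, p_enter=0.05, p_leave=0.5, seed=None):
    """Generate outbreak states with a two-state Markov chain.

    0 means no outbreak, 1 means outbreak. The chain starts without an
    outbreak; p_enter and p_leave are the transition probabilities for
    entering and leaving the outbreak state.
    """
    rng = random.Random(seed)
    states, current = [], 0
    for _ in range(length):
        states.append(current)
        if current == 0:
            current = 1 if rng.random() < p_enter else 0
        else:
            current = 0 if rng.random() < p_leave else 1
    return states


def add_outbreaks(endemic_counts, states, extra_cases=5):
    """Add a fixed number of cases wherever the state marks an outbreak."""
    if len(endemic_counts) != len(states):
        raise ValueError("endemic_counts and states must have equal length")
    return [count + extra_cases * state for count, state in zip(endemic_counts, states)]


if __name__ == "__main__":
    endemic = [3, 4, 2, 5, 3, 4]
    # Hand-assigned states make runs comparable and allow testing edge cases.
    print(add_outbreaks(endemic, [0, 0, 1, 1, 0, 0]))
    # Random states drawn from the Markov chain.
    print(add_outbreaks(endemic, simulate_states(len(endemic), seed=7)))
```

Separating state generation from the outbreak term means that hand-assigned and randomly generated states go through the same code path.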
17 changes: 17 additions & 0 deletions docs/3_user_guide.rst → docs/4_user_guide.rst
@@ -80,3 +80,20 @@ for a more detailed discussion. Therefore, both ``fit`` and
``Iterable[Tuple[DataFrame, bool]]``. The label indicates whether
the last time point of the time series is to be considered an outbreak.
The ``predict`` method in this case only returns a time series of alarms.

Simulating Epidemiological Data
-------------------------------
epysurv provides classes to simulate endemic timeseries
(``SeasonalNoisePoisson`` and ``SeasonalNoiseNegativeBinomial``) and
epidemic timeseries (``PointSource``). All simulations can be tuned
during instantiation to produce different seasonality, trends, and other
characteristics. Every simulation needs to implement the ``simulate``
method, which takes at least a ``length`` parameter that determines how
many observations are simulated. Additionally, if the timeseries is
supposed to be epidemic, we can define the ``state``, i.e., a sequence
with one entry per simulated timepoint that encodes outbreaks: a ``1``
indicates an outbreak and a ``0`` indicates none. This is also shown in
the `quick tour <demo.ipynb>`_. Alternatively, we can run a Markov chain
with adjustable transition probabilities to generate the states randomly.
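
The ``simulate`` contract described above can be illustrated with a toy stand-in. The classes below are hypothetical and deliberately do not use epysurv; ``Simulation``, ``ConstantRateSimulation``, the constructor arguments, the uniform noise, and the row layout are all assumptions made to show how ``length`` and ``state`` interact.

```python
import random
from abc import ABC, abstractmethod


class Simulation(ABC):
    """Hypothetical base class; not epysurv's actual implementation."""

    @abstractmethod
    def simulate(self, length, state=None):
        """Return `length` simulated observations."""


class ConstantRateSimulation(Simulation):
    """Toy simulation: a constant baseline plus small uniform noise."""

    def __init__(self, baseline=3, outbreak_cases=5, seed=None):
        # Characteristics of the simulation are fixed at instantiation.
        self.baseline = baseline
        self.outbreak_cases = outbreak_cases
        self._rng = random.Random(seed)

    def simulate(self, length, state=None):
        if state is None:
            state = [0] * length  # purely endemic timeseries
        if len(state) != length:
            raise ValueError("state must contain one entry per timepoint")
        rows = []
        for t, in_outbreak in enumerate(state):
            noise = self._rng.randint(0, 2)  # 0, 1, or 2 extra cases
            n_cases = self.baseline + noise + self.outbreak_cases * in_outbreak
            rows.append({"t": t, "n_cases": n_cases, "outbreak": bool(in_outbreak)})
        return rows


if __name__ == "__main__":
    simulation = ConstantRateSimulation(seed=0)
    for row in simulation.simulate(6, state=[0, 0, 1, 1, 0, 0]):
        print(row)
```

For the real class names, constructor parameters, and return types, refer to the API documentation and the quick tour notebook.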


2 changes: 1 addition & 1 deletion docs/conf.py
@@ -7,7 +7,7 @@
# -- Project information -----------------------------------------------------

project = "epysurv"
copyright = "2019, Rüdiger Busche"
copyright = "2020, Rüdiger Busche"
Owner

I think you should add yourself to the authors ;)

author = "Rüdiger Busche and Justin Shenk"


422 changes: 335 additions & 87 deletions docs/demo.ipynb

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion docs/index.rst
@@ -20,7 +20,8 @@ This package was originally developed at the Robert Koch Institute in the `Signa

1_quickstart.rst
2_outbreak_detection.rst
3_user_guide.rst
3_data_simulation.rst
Owner

I think there could be a better name for this, as we are not simulating the data, but the "epidemic", "outbreak", ... . Maybe "Simulating epidemiological dynamics" or just "Simulation"? Is there maybe an accepted term for this in the community?

4_user_guide.rst
api_doc/modules.rst

