Dynamic Survival Analysis

This repository provides a Python implementation of the dynamic survival analysis method. A brief description of the methodology can be found in this white paper. Prof. Greg Rempała gave a couple of public talks on this model. You can watch his MBI seminar talks here: link to his first talk and link to his second talk.

This is primarily based on a package (available here) developed by Caleb Deen Bastian, Princeton University. I also acknowledge Saket Gurukar, who helped with the parallelization of some of the routines.

If you have questions, comments, criticisms, or corrections, please email me at khudabukhsh.2@osu.edu.

Installation

1. Please make sure you have Python (version 3.6.x or above). If you do not have Python, we recommend installing it from Anaconda (link here).
2. You can download our package either by clicking the download button or by cloning our repository. Cloning can be done by running the following command

git clone https://github.com/wasiur/dynamic_survival_analysis.git

from your terminal.
3. Our implementation depends on a number of packages. In order for the parallelization to run smoothly, we recommend installing the Python environment "dynamic_survival_analysis", which is defined in the file environment.yml. If you are using Anaconda (recommended), the environment can be installed by running

conda env create -f environment.yml

In order to check if the environment is now available, run

conda env list

Activate the environment "dynamic_survival_analysis" by running

conda activate dynamic_survival_analysis

or

source activate dynamic_survival_analysis

from your terminal.
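
As a quick sanity check after activation, the short Python sketch below reports whether the interpreter meets the 3.6 requirement and whether the expected conda environment appears to be active. It is not part of the repository; the function name check_setup is hypothetical, and the check relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated.

```python
import os
import sys


def check_setup(env_name="dynamic_survival_analysis", environ=None):
    """Return a list of setup problems (an empty list means all checks passed)."""
    environ = os.environ if environ is None else environ
    problems = []
    # The README asks for Python 3.6.x or above.
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or above is required")
    # conda exports the name of the active environment in CONDA_DEFAULT_ENV.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != env_name:
        problems.append(
            "expected conda environment %r, found %r" % (env_name, active)
        )
    return problems


for problem in check_setup():
    print("Setup problem:", problem)
```

An empty result only means these two checks passed; it does not verify that every package listed in environment.yml was installed.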

Data preparation

A typical input data file for the model should have the following seven columns:

time	daily_confirm	recovery	deaths	cum_confirm	cum_heal	cum_dead
2020-03-01	2	0	0	2	0	0
2020-03-02	8	1	0	10	1	0
.	.	.	.	.	.	.
.	.	.	.	.	.	.
.	.	.	.	.	.	.
2020-06-05	46	13	21	63291	1200	1037

At least one of daily_confirm and cum_confirm must be present. If the parameters of the recovery distribution need to be estimated, at least one of the four columns recovery, deaths, cum_heal, and cum_dead must also be present in the dataset.

If no recovery information is available, the model can still be run by explicitly providing the -r option.
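
The column requirements above can be checked before launching a fit. The following is a minimal sketch using only the standard library. The helper names check_columns and read_header are hypothetical, and the tab delimiter is an assumption based on the sample table; the scripts in this repository may read the file differently.

```python
import csv
import io

CASE_COLUMNS = ("daily_confirm", "cum_confirm")
RECOVERY_COLUMNS = ("recovery", "deaths", "cum_heal", "cum_dead")


def check_columns(header, estimate_recovery=True):
    """Return a list of problems with a data-file header (empty if none)."""
    present = set(header)
    problems = []
    # At least one of the confirmed-case columns must be present.
    if not present.intersection(CASE_COLUMNS):
        problems.append("need at least one of: " + ", ".join(CASE_COLUMNS))
    # Recovery columns matter only when the recovery parameters are estimated.
    if estimate_recovery and not present.intersection(RECOVERY_COLUMNS):
        problems.append("need at least one of: " + ", ".join(RECOVERY_COLUMNS))
    return problems


def read_header(text, delimiter="\t"):
    """Return the first row of a delimited text file as a list of names."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


# Toy input: only three of the seven columns are present.
sample = "time\tdaily_confirm\trecovery\n2020-03-01\t2\t0\n2020-03-02\t8\t1\n"
print(check_columns(read_header(sample)))  # prints []
```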

We used COVID-19 data published by the New York Times to inform our model. The repository can be accessed here.
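
The New York Times state-level file (us-states.csv) reports cumulative cases and deaths in the columns date, state, fips, cases, and deaths. A rough sketch of how such rows could be mapped onto the column names used above is given below. The function nyt_to_dsa_rows is hypothetical and is not the conversion used in this repository. The two sample rows are made up for illustration, and negative daily differences (which can arise when cumulative counts are revised downward) are clipped to zero as a simplifying assumption.

```python
import csv
import io


def nyt_to_dsa_rows(nyt_csv_text, state):
    """Convert cumulative NYT-style rows for one state into daily/cumulative rows."""
    reader = csv.DictReader(io.StringIO(nyt_csv_text))
    # ISO dates sort correctly as plain strings.
    rows = sorted((r for r in reader if r["state"] == state),
                  key=lambda r: r["date"])
    out = []
    prev_cases = prev_deaths = 0
    for r in rows:
        cases, deaths = int(r["cases"]), int(r["deaths"])
        out.append({
            "time": r["date"],
            # Daily counts are first differences of the cumulative counts.
            "daily_confirm": max(cases - prev_cases, 0),
            "deaths": max(deaths - prev_deaths, 0),
            "cum_confirm": cases,
            "cum_dead": deaths,
        })
        prev_cases, prev_deaths = cases, deaths
    return out


# Made-up rows in the NYT column layout, for illustration only.
sample = (
    "date,state,fips,cases,deaths\n"
    "2020-03-01,Ohio,39,2,0\n"
    "2020-03-02,Ohio,39,10,0\n"
)
for row in nyt_to_dsa_rows(sample, "Ohio"):
    print(row)
```

The NYT files carry no recovery counts, so the recovery and cum_heal columns are left out of this sketch.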

Running the dynamic survival analysis model

The Python scripts accept a number of options. The most important option is -d, which passes the name of the data file to the script. If no dataset is available, the model can be run on dummy data by providing the -v or --verbose option, which makes the script run with default choices. If neither -d nor -v is provided, the script will throw an error.

For more details on the options provided, run python DSA.py -h or python DSA.py --help. For instance, a run of python DSA_Bayesian.py -h yields

Usage: python DSA_Bayesian.py -d <datafile>

Options:
  -h, --help            show this help message and exit
  -d DATAFILE, --data-file=DATAFILE
                        Name of the data file.
  -l LOCATION, --location=LOCATION
                        Name of the location.
  -m, --mpi             Indicates whether to use MPI for parallelization.
  -o OUTPUT_FOLDER, --output-folder=OUTPUT_FOLDER
                        Name of the output folder
  -s, --smooth          Indicates whether the daily counts should be smoothed.
  -f LAST_DATE, --final-date=LAST_DATE
                        Last day of data to be used
  -r, --estimate-recovery-parameters
                        Indicates the parameters of the recovery distribution
                        will be estimated
  -N N                  Size of the random sample
  -T T, --T=T           End of observation time
  --day-zero=DAY0       Date of onset of the epidemic
  --niter=NITER         Number of iterations of the MCMC
  --threads=THREADS     Number of threads for MPI
  -v, --verbose         Runs with default choices
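
The layout of the help text above resembles what Python's standard optparse module produces. The sketch below shows how a subset of these options, together with the "either -d or -v" rule, could be expressed with optparse. It is an illustration only, not the parser used in DSA_Bayesian.py; the function names and the dest names are assumptions.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser for a subset of the options listed in the help text."""
    parser = OptionParser(usage="python DSA_Bayesian.py -d <datafile>")
    parser.add_option("-d", "--data-file", dest="datafile",
                      help="Name of the data file.")
    parser.add_option("-l", "--location", dest="location",
                      help="Name of the location.")
    parser.add_option("-r", "--estimate-recovery-parameters",
                      action="store_true", dest="estimate_recovery",
                      default=False,
                      help="Indicates the parameters of the recovery "
                           "distribution will be estimated")
    parser.add_option("--niter", type="int", dest="niter",
                      help="Number of iterations of the MCMC")
    parser.add_option("-v", "--verbose", action="store_true",
                      dest="verbose", default=False,
                      help="Runs with default choices")
    return parser


def parse_args(argv):
    """Parse argv and enforce that a data file or -v is given."""
    parser = build_parser()
    options, _ = parser.parse_args(argv)
    if options.datafile is None and not options.verbose:
        # parser.error prints the usage line and exits with status 2.
        parser.error("provide a data file with -d, or use -v for dummy data")
    return options


options = parse_args(["-d", "mydata.csv", "--niter", "500"])
print(options.datafile, options.niter)
```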

The easiest way to run our model is to open one of the Jupyter notebooks and run the cells. Please modify the commands as needed.

Alternatively, perform the following:

(Recommended) The Bayesian model can be run by invoking

python DSA_Bayesian.py -d <datafile>

from the terminal.

The maximum likelihood based DSA model can be run by invoking

python DSA.py -d <datafile>

from the terminal.

The semi-Bayesian Laplace approximation to the posterior distribution of the parameters can be carried out by running the following command

python DSA_Laplace.py -d <datafile>

from the terminal.
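
When fits are launched from another Python program rather than typed by hand, the three commands above can be assembled programmatically. The sketch below is a hypothetical helper: the labels "bayesian", "mle", and "laplace" are invented for this illustration, and only the -d option is passed, since the help output shown earlier is for DSA_Bayesian.py and the other scripts may accept different options.

```python
import sys

# Hypothetical labels mapped to the script names given in this README.
SCRIPTS = {
    "bayesian": "DSA_Bayesian.py",
    "mle": "DSA.py",
    "laplace": "DSA_Laplace.py",
}


def build_command(method, datafile):
    """Return the argument list that runs the chosen script on a data file."""
    if method not in SCRIPTS:
        raise ValueError("unknown method %r; choose from %s"
                         % (method, sorted(SCRIPTS)))
    # sys.executable keeps the run inside the currently active environment.
    return [sys.executable, SCRIPTS[method], "-d", datafile]


print(build_command("bayesian", "mydata.csv"))
# To actually launch the fit from inside the repository folder:
# import subprocess
# subprocess.run(build_command("bayesian", "mydata.csv"), check=True)
```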

Examples

We provide two examples.

The first example extracts count data from a repository maintained by the New York Times. This example fits the Bayesian DSA model.
The second example works on a dummy data set and runs the basic DSA model.
