Skip to content

wasiur/dynamic_survival_analysis

Repository files navigation

Dynamic Survival Analysis

This repository provides a Python implementation of the dynamic survival analysis method. A brief description of the methodology can be found in this white paper. Prof. Greg Rempała gave a couple of public talks on this model. You can watch his MBI seminar talks here: link to his first talk and link to his second talk.

This is primarily based on a package (available here) developed by Caleb Deen Bastian, Princeton University. I also acknowledge Saket Gurukar, who helped with the parallelization of some of the routines.

If you have questions, comments, criticisms, or corrections, please email me at khudabukhsh.2@osu.edu.

Installation

  1. Please make sure you have Python (version 3.6.x and above). If you do not have Python, we recommend installing it from Anaconda (link here).
  2. You can download our package either by hitting download or by cloning our repository. Cloning can be done by running the following command
git clone https://github.com/wasiur/dynamic_survival_analysis.git

from your terminal. 3. Our implementation depends on a number of packages. In order for the parallelization to run smoothly, we recommend installing the following python environment "dynamic_survival_analysis". This is included in the file environment.yml. If you are using Anaconda (recommended), the environment can be installed by running

conda env create -f environment.yml

In order to check if the environment is now available, run

conda env list
  1. Activate the environment "dynamic_survival_analysis" by running
conda activate dynamic_survival_analysis

or

source activate dynamic_survival_analysis

from your terminal.

Data preparation

A typical input data to the model should have following seven columns:

time daily_confirm recovery deaths cum_confirm cum_heal cum_dead
2020-03-01 2 0 0 2 0 0
2020-03-02 8 1 0 10 1 0
. . . . . . .
. . . . . . .
. . . . . . .
2020-06-05 46 13 21 63291 1200 1037

At least one of daily_confirm and cum_confirm must be present. If the parameters corresponding to the recovery distribution need to be estimated, at least of the four recovery, deaths, cum_heal, and cum_dead must be present in the dataset.

If no recovery information is available, the model can be still run by explicitly providing the -r option.

We used COVID-19 data published by the New York Times to inform our model. The repository can be accessed here.

Running the dynamic survival analysis model

The python scripts allow a number of options. The most important option is -d , which is used to pass the name of the data file to the python script. If no dataset is present, the model can be run on dummy data by providing the -v or --verbose option, which makes the script enter a verbose mode. If neither -d nor -v is provided, the script will throw an error.

Fore more details on the options provided, run python DSA.py -h or python DSA.py --help. For instance, a run of python DSA_Bayesian.py -h yields

Usage: python DSA_Bayesian.py -d <datafile>

Options:
  -h, --help            show this help message and exit
  -d DATAFILE, --data-file=DATAFILE
                        Name of the data file.
  -l LOCATION, --location=LOCATION
                        Name of the location.
  -m, --mpi             Indicates whether to use MPI for parallelization.
  -o OUTPUT_FOLDER, --output-folder=OUTPUT_FOLDER
                        Name of the output folder
  -s, --smooth          Indicates whether the daily counts should be smoothed.
  -f LAST_DATE, --final-date=LAST_DATE
                        Last day of data to be used
  -r, --estimate-recovery-parameters
                        Indicates the parameters of the recovery distribution
                        will be estimated
  -N N                  Size of the random sample
  -T T, --T=T           End of observation time
  --day-zero=DAY0       Date of onset of the epidemic
  --niter=NITER         Number of iterations of the MCMC
  --threads=THREADS     Number of threads for MPI
  -v, --verbose         Runs with default choices

The easiest way to run our model is to open one of the Jupyter notebooks and run the cells. Please modify the commands as needed.

Alternatively, perform the following:

  1. (Recommended) The Bayesian model can be run by invoking
python DSA_Bayesian.py -d <datafile>

from the terminal.

  1. The maximum likelihood based DSA model can be run by invoking
python DSA.py -d <datafile>

from the terminal.

  1. The semi-Bayesian Laplace approximation to the posterior distribution of the parameters can be carried out by running the following command
python DSA_Laplace.py -d <datafile>

from the terminal.

Examples

We provide two examples.

  1. The first example extracts count data from a repository maintained by the New York Times. This example fits the Bayesian DSA model.

  2. The second example works on a dummy data set and runs the basic DSA model.

About

This repository provides a Python implementation of the dynamical survival analysis method

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published