
Estimating SARS-CoV-2 seroprevalence and epidemiological parameters with uncertainty from serological surveys

This repository stores code and data associated with the manuscript Estimating SARS-CoV-2 seroprevalence and epidemiological parameters with uncertainty from serological surveys by Daniel B. Larremore, Bailey K. Fosdick, Kate M. Bubar, Sam Zhang, Stephen M. Kissler, C. Jessica E. Metcalf, Caroline O. Buckee, and Yonatan H. Grad.

This repository is maintained by Daniel Larremore. Questions can be directed to daniel.larremore@colorado.edu.

Code is written primarily in Python, but the MCMC engine has been rewritten in R because it is, for some reason, faster.

Inference of prevalence (one population)

When sensitivity and specificity are known, one can use those values, along with the number of positive and negative test results, to produce posterior estimates of prevalence like this one:

Image of Seroprevalence Posterior

Interactive web tool option

This figure can be created and downloaded by using the web-based calculator available at https://larremorelab.github.io/covid-serology.

Python option

To perform the same inference in Python, a notebook is available at codebase/prevalence_onepopulation_workbook.ipynb.
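
For readers who want to see the core calculation outside the notebook, here is a minimal sketch of one way to compute such a posterior on a uniform grid with a flat prior. The sensitivity, specificity, and counts below are placeholders rather than values from the paper, and the notebook remains the reference implementation.

```python
import numpy as np
from scipy import stats

def prevalence_posterior(n_pos, n_neg, se, sp, grid=np.linspace(0, 1, 2001)):
    """Posterior over seroprevalence on a grid, assuming a flat prior.
    A test returns positive with probability se*theta + (1-sp)*(1-theta),
    where theta is the true seroprevalence."""
    p_pos = se * grid + (1 - sp) * (1 - grid)          # apparent positivity rate
    post = stats.binom.pmf(n_pos, n_pos + n_neg, p_pos)
    post /= post.sum()                                  # normalize over the grid
    return grid, post

# Hypothetical survey: 50 positives out of 1000 tests; se/sp are placeholder values.
theta, post = prevalence_posterior(n_pos=50, n_neg=950, se=0.93, sp=0.975)
print("posterior mean prevalence:", float(np.sum(theta * post)))
```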

Inference of prevalence (multiple subpopulations)

When there are multiple subpopulations, it is tempting to simply use the estimation approach above, independently for each subpopulation. However, especially when the number of samples per bin is small, this is not a good idea, as the estimates may vary wildly. What can be done?

What we do in the paper, and provide code for here, is to use a Bayesian hierarchical model in which the subpopulation seroprevalences share a prior distribution. The mean of that prior is informed entirely by the data, while the variance of that prior is weakly specified by a hyperprior. In plain English, the Bayesian approach allows the subpopulations to learn from each other without entraining to the same estimates.

The notebook codebase/prevalence_subpopulations_workbook.ipynb gives example code. Given data in the form of sensitivity, specificity, and the number of positive and negative samples in each subpopulation, the inference is performed to generate posterior estimates. Note that we are able to produce estimates, albeit with high uncertainty, for the subpopulation that was dramatically undersampled.

Image of Subpopulation Seroprevalence Posteriors
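
As a rough illustration of the structure described above (not the paper's exact model, and using PyMC purely for illustration rather than the repository's own sampler), here is a minimal sketch in which the subpopulation seroprevalences share a Beta prior whose mean is learned from the data and whose concentration gets a weak hyperprior. All numbers are made up.

```python
import numpy as np
import pymc as pm

# Hypothetical data: positive counts and sample sizes per subpopulation.
pos = np.array([12, 4, 30, 1])
n = np.array([200, 50, 400, 10])
se, sp = 0.93, 0.975  # placeholder test sensitivity and specificity

with pm.Model():
    mu = pm.Uniform("mu", 0, 1)               # shared prior mean, learned from the data
    kappa = pm.HalfNormal("kappa", sigma=50)  # weak hyperprior on the prior's concentration
    theta = pm.Beta("theta", alpha=mu * kappa, beta=(1 - mu) * kappa, shape=len(n))

    # Observed positives are Binomial in the apparent positivity rate,
    # which folds in sensitivity and specificity.
    p_obs = se * theta + (1 - sp) * (1 - theta)
    pm.Binomial("k", n=n, p=p_obs, observed=pos)

    trace = pm.sample(2000, tune=1000)
```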

Sample size calculator (one population)

For the moment, the sample size calculation has not been wrapped in its own function to allow one to "solve for n". However, if one wanted to compute the number of samples required to produce estimates to within a certain uncertainty, that number would depend on:

  1. Test sensitivity and specificity.
  2. The range of prevalence values that are plausibly expected.
  3. The posterior uncertainty one is trying to achieve.

As a general rule, one could take the plausible prevalence values, choose the one closest to 0.5, and assume that the true seroprevalence is that "worst case" value. Then, use the notebook codebase/prevalence_onepopulation_workbook.ipynb to explore how sample counts affect posterior error (measured by credible interval width).
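
One way to do that exploration in a script rather than by hand is sketched below, assuming a flat prior and placeholder sensitivity/specificity values; the expected positive count at the assumed "worst case" prevalence stands in for a real survey result.

```python
import numpy as np
from scipy import stats

def ci_width(n, prev, se, sp, level=0.90, grid=np.linspace(0, 1, 2001)):
    """Width of the central credible interval for seroprevalence when the observed
    positive count equals its expectation at the assumed worst-case prevalence."""
    k = round(n * (se * prev + (1 - sp) * (1 - prev)))          # expected positives
    post = stats.binom.pmf(k, n, se * grid + (1 - sp) * (1 - grid))
    cdf = np.cumsum(post / post.sum())
    lo = grid[np.searchsorted(cdf, (1 - level) / 2)]
    hi = grid[np.searchsorted(cdf, 1 - (1 - level) / 2)]
    return hi - lo

# How the 90% credible interval narrows as the sample count grows.
for n in (100, 500, 1000, 5000):
    print(n, round(ci_width(n, prev=0.5, se=0.93, sp=0.975), 3))
```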

Model & Demographic Informed Sample Allocation

MDI stands for Model & Demographic Informed. It is a variance reduction strategy. In plain language, its goal is to help allocate more samples to those subpopulations whose accurate measurement will be most beneficial. The method is explained in the manuscript, and it produces recommendations for sample allocation. Here is an example of a hypothetical sample allocation strategy with the goal of parameterizing a model of disease dynamics in Argentina (using estimates of Argentina's demographic and contact structure). This figure and others like it can be recreated and customized using the Jupyter notebook codebase/sample_allocation_MDI_workbook.ipynb.

Image of MDI Argentina
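
The MDI weights themselves come from the model and demographic calculation in the manuscript and the notebook above; the toy sketch below only shows the final, mechanical step of turning a weight vector into per-subpopulation sample counts, with made-up weights and hypothetical age bins.

```python
import numpy as np

weights = np.array([0.35, 0.20, 0.30, 0.15])  # hypothetical MDI importance weights
n_total = 1000                                # total samples available

alloc = np.floor(weights / weights.sum() * n_total).astype(int)
alloc[np.argmax(weights)] += n_total - alloc.sum()  # give rounding leftovers to the largest bin
print(dict(zip(["0-19", "20-39", "40-59", "60+"], alloc)))  # hypothetical age bins
```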

SEIR Simulations

In the paper, we have a figure like this one:

Image of SEIR Simulation

This figure can be recreated from scratch using the Jupyter notebook codebase/SEIR_workbook.ipynb. In order, this notebook:

  1. Simulates hypothetical data. (Real data could be inserted here.)
  2. Infers the posterior distribution over seroprevalence using the sensitivity and specificity values chosen.
  3. Runs an SEIR simulation forward; a minimal sketch of this step appears after this list. (Initial conditions, parameters, and assumptions about immunity could be adjusted here.)
  4. Calculates both peak timing and peak height distributions. (Only height is plotted, but this can be adjusted in the plotting code.)
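
As a rough sketch of step 3 only (not the notebook's implementation), here is a minimal forward-Euler SEIR integration. All parameters and the initial immune fraction below are placeholders; in the notebook, the immune fraction would be drawn from the seroprevalence posterior to propagate uncertainty into peak timing and height.

```python
import numpy as np

def seir(beta, sigma, gamma, s0, e0, i0, immune0, days, dt=0.1):
    """Forward-Euler integration of a basic SEIR model (fractions of the population).
    immune0 is the initially immune fraction, e.g. an estimated seroprevalence."""
    s, e, i, r = s0, e0, i0, immune0
    infected = []
    for _ in range(int(days / dt)):
        new_exp = beta * s * i * dt   # S -> E
        new_inf = sigma * e * dt      # E -> I
        new_rec = gamma * i * dt      # I -> R
        s, e, i, r = s - new_exp, e + new_exp - new_inf, i + new_inf - new_rec, r + new_rec
        infected.append(i)
    return np.array(infected)

# Placeholder parameters and initial conditions (fractions sum to 1).
i_t = seir(beta=0.4, sigma=1/3, gamma=1/6, s0=0.69, e0=0.005, i0=0.005, immune0=0.30, days=300)
print("peak height:", round(i_t.max(), 3), "peak day:", round(i_t.argmax() * 0.1, 1))
```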
