Fine-scale spatial and social patterns of SARS-CoV-2 transmission from identical pathogen sequences

Cécile Tran-Kiem¹, Miguel Paredes¹,², Amanda Perofsky³,⁴, Lauren Frisbie⁵, Hong Xie⁶, Kevin Kong⁶, Amanda Weixler⁶, Alexander Greninger¹,⁶, Pavitra Roychoudhury¹,⁶, JohnAric Peterson⁵, Andrew Delgado⁵, Holly Halstead⁵, Drew MacKellar⁵, Philip Dykema⁵, Luis Gamboa³, Chris Frazar⁷, Erica Ryke⁷, Jeremy Stone³, David Reinhart³, Lea Starita³,⁷, Allison Thibodeau⁵, Cory Yun⁵, Frank Aragona⁵, Allison Black⁵, Cécile Viboud⁴, Trevor Bedford¹,⁸

¹ Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
² Department of Epidemiology, University of Washington, Seattle, WA, USA
³ Brotman Baty Institute, University of Washington, Seattle, WA, USA
⁴ Fogarty International Center, National Institutes of Health, Bethesda, MD, USA
⁵ Washington State Department of Health, Shoreline, WA, USA
⁶ Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
⁷ Department of Genome Sciences, University of Washington, Seattle, WA, USA
⁸ Howard Hughes Medical Institute, Seattle, WA, USA

Abstract

Install

The code is written in R and depends on several packages, which can be installed with:

```bash
Rscript ./scripts/install_requirements.R "scripts/requirements.txt"
```
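If installation stops part-way, it can help to check which dependencies are still missing. The sketch below is a hypothetical helper (`missing_packages` is not part of the repository) and assumes that `requirements.txt` lists one package name per line, which is an assumption about the file format rather than something documented here.

```r
# Hypothetical helper, not part of the repository.
# ASSUMPTION: the requirements file lists one package name per line.
missing_packages <- function(req_file = "scripts/requirements.txt") {
  pkgs <- trimws(readLines(req_file, warn = FALSE))
  pkgs <- pkgs[nzchar(pkgs) & !startsWith(pkgs, "#")]  # drop blank and comment lines
  # requireNamespace() returns FALSE, without attaching anything, when a package is unavailable
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}
```

Run from the repository root, `missing_packages()` returns the names of packages that still need to be installed (an empty character vector if none).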

Computing relative risks of observing sequences at a defined genetic distance between two subgroups from user data

To facilitate applying this method to other datasets, we provide the code developed to compute the relative risk (RR) of observing sequences at a defined genetic distance between different subgroups. We illustrate how this can be done starting from an arbitrary FASTA alignment and a CSV metadata file.
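To make the quantity concrete, here is a minimal base-R sketch. It is not the code in `get_RR_from_fasta.R`, and it rests on stated assumptions: genetic distance is taken as the number of differing sites between aligned sequences (skipping sites where either sequence has an N or a gap), and the RR between groups i and j is assumed to be RR_ij = (N_ij / N_i·) / (N_·j / N_··), where N_ij counts ordered pairs of sequences at the chosen distance with the first in group i and the second in group j. The helpers `hamming_matrix` and `rr_at_distance` are hypothetical names.

```r
# Minimal sketch, NOT the repository's implementation. Assumptions:
# - sequences are aligned (equal length);
# - distance = number of differing sites, skipping sites where either
#   sequence has an N or a gap ("-");
# - RR_ij = (N_ij / N_i.) / (N_.j / N_..), N_ij = ordered pairs at distance d
#   whose first sequence is in group i and second in group j.

# Hypothetical helper: pairwise distance matrix between aligned sequences.
hamming_matrix <- function(seqs) {
  chars <- do.call(rbind, strsplit(toupper(seqs), ""))
  n <- nrow(chars)
  dist_mat <- matrix(0L, n, n, dimnames = list(names(seqs), names(seqs)))
  for (a in seq_len(n - 1)) {
    for (b in seq(a + 1, n)) {
      informative <- !(chars[a, ] %in% c("N", "-")) & !(chars[b, ] %in% c("N", "-"))
      dist_mat[a, b] <- dist_mat[b, a] <-
        sum(chars[a, informative] != chars[b, informative])
    }
  }
  dist_mat
}

# Hypothetical helper: RR of observing sequence pairs at distance d between groups.
rr_at_distance <- function(dist_mat, groups, d) {
  g <- factor(groups)
  is_pair <- dist_mat == d
  diag(is_pair) <- FALSE                 # a sequence is not paired with itself
  idx <- which(is_pair, arr.ind = TRUE)  # both (x, y) and (y, x) are listed
  n_pairs <- unclass(table(g[idx[, 1]], g[idx[, 2]]))
  # RR_ij = N_ij * N_.. / (N_i. * N_.j); NaN when a group has no pairs at distance d
  n_pairs * sum(n_pairs) / outer(rowSums(n_pairs), colSums(n_pairs))
}

# Toy usage: s1, s2 and s4 are identical, s3 differs by one mutation
seqs <- c(s1 = "ACGT", s2 = "ACGT", s3 = "ACGA", s4 = "ACGT")
groups <- c("A", "A", "B", "B")
rr <- rr_at_distance(hamming_matrix(seqs), groups, d = 0)
rr["A", "B"]  # 1.5
```

In this toy example, identical pairs link groups A and B more often than expected under the assumed normalization (RR = 1.5), while within-group A pairs are less frequent than expected (RR = 0.75).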

```bash
cd scripts/

## Compute the relative risk (RR) of observing sequences at a specified genetic distance. The script takes the following arguments:
# --input-fasta: file path to the user-defined FASTA alignment.
# --input-metadata: file path to the user-defined metadata file. The metadata should be a CSV file with a column "sequence_name" containing the sequence names (matching those in the alignment) and associated metadata columns.
# --name-group: name of the metadata column defining the groups between which the RR of observing sequences at the given genetic distance is computed.
# --n-mut-away: genetic distance between sequences, in number of mutations (0 for identical sequences).
# --output-file-csv: file path where the data frame of RR estimates is saved.

# --compute-subsample-CI: boolean (1 or 0) indicating whether to compute subsampled confidence intervals around the RR estimates. Default: 0.
# --n-subsamples: number of subsamples drawn to compute the confidence intervals. Default: 1000.
# --prop-subsample: proportion of sequences retained in each subsample. Default: 0.8.

Rscript ./get_RR_from_fasta.R \
    --input-fasta="../data/synthetic_data/synthetic-fasta.fasta" \
    --input-metadata="../data/synthetic_data/synthetic-metadata.csv" \
    --name-group="group" \
    --n-mut-away=0 \
    --output-file-csv="../results/df_RR.csv" \
    --compute-subsample-CI=1 \
    --n-subsamples=100 \
    --prop-subsample=0.8
```
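The subsampling options can be illustrated with the following sketch. It assumes that each draw retains a fraction `prop_subsample` of the sequences, sampled without replacement, that the statistic (for example the RR matrix) is recomputed on the retained sequences, and that the interval is given by the empirical 2.5% and 97.5% quantiles across draws; the script's exact procedure may differ. `subsample_ci`, `stat_fn` and `level` are hypothetical names. To keep the example self-contained, the usage lines use a trivial statistic (the share of sequences from group A) in place of the RR.

```r
# Hypothetical sketch of subsampled intervals, NOT the repository's implementation.
# stat_fn receives the indices of the retained sequences and returns a numeric
# vector (e.g. a flattened RR matrix).
subsample_ci <- function(n_seq, stat_fn, n_subsamples = 1000,
                         prop_subsample = 0.8, level = 0.95) {
  size <- floor(prop_subsample * n_seq)
  # One column per draw; sample.int() samples without replacement by default
  draws <- matrix(replicate(n_subsamples, stat_fn(sample.int(n_seq, size))),
                  ncol = n_subsamples)
  alpha <- (1 - level) / 2
  # One row per statistic, with the lower and upper empirical quantiles
  t(apply(draws, 1, quantile, probs = c(alpha, 1 - alpha), na.rm = TRUE))
}

# Toy usage: share of sequences from group "A" (true value 0.3)
groups <- rep(c("A", "B"), times = c(30, 70))
set.seed(1)
ci <- subsample_ci(length(groups), function(idx) mean(groups[idx] == "A"),
                   n_subsamples = 100)
ci
```

Because each draw keeps only part of the data, the spread across draws reflects sensitivity to which sequences were sampled; a larger `prop_subsample` yields narrower intervals.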

Overview

This repository contains code and data associated with the above manuscript.

The repository is organized into four folders: data, figures, results and scripts.

Repository: blab/phylo-kernel-public

Folders and files

Latest commit

History

Repository files navigation

Fine-scale spatial and social patterns of SARS-CoV-2 transmission from identical pathogen sequences

Abstract

Install

Computing relative risks of observing sequence at a defined genetic distance in two subgroups from user data

Overview

About

Resources

Stars

Watchers

Forks

Languages