EVEscape

This is the official code repository for the paper "Learning from pre-pandemic data to forecast viral antibody escape". This paper is a collaboration between the Marks Lab and the OATML group.

Overview

EVEscape is a model that predicts how likely a given viral protein variant is to induce immune escape from antibodies. For each protein, EVEscape makes its predictions using only data sources available pre-pandemic: sequence likelihood predictions from broader viral evolution, antibody accessibility information from protein structures, and changes in binding interaction propensity from residue chemical properties.

Usage

Computing EVEscape scores consists of three components:

Fitness: use scores from EVE, an unsupervised generative model of mutation effect from broader evolutionary sequences
Accessibility: calculate the weighted contact number (WCN) from PDB structures of relevant conformations of the viral protein of interest
Dissimilarity: calculate difference in charge and hydrophobicity between the mutant residue and the wildtype
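As an illustration of the accessibility component, here is a minimal sketch of a weighted contact number (WCN) calculation. It assumes the commonly used inverse-square definition, in which each residue sums 1 / r^2 over its distances to all other residues. The function name, the toy coordinates, and the choice of one representative point per residue are assumptions for illustration; the scripts in this repository may choose atoms or normalize differently.

```python
import math

def weighted_contact_number(coords):
    """Return WCN_i = sum over j != i of 1 / r_ij**2 for each residue.

    coords: list of (x, y, z) tuples, one representative point per residue
    (hypothetical input format, not the repository's).
    """
    wcn = []
    for i, a in enumerate(coords):
        total = 0.0
        for j, b in enumerate(coords):
            if i == j:
                continue
            # Inverse squared Euclidean distance between residues i and j
            total += 1.0 / math.dist(a, b) ** 2
        wcn.append(total)
    return wcn

# Three made-up residue positions (in angstroms) on a line
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
wcn = weighted_contact_number(coords)
# The middle residue has the most neighbors nearby, so its WCN is highest
```

Under this definition, a low WCN indicates a residue with few close neighbors, which is typically a more exposed position.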

The components are then standardized and fed into a temperature-scaled logistic function, and we take the log transform of the product of the three terms to obtain the final EVEscape scores.
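The combination step can be sketched roughly as follows. This is not the repository's implementation: the function names are hypothetical, the temperature values are placeholders, and each component is assumed to be already oriented so that larger values favor escape.

```python
import math
from statistics import mean, pstdev

def standardize(values):
    """Z-score a list of values (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def logistic(z, temperature):
    """Temperature-scaled logistic function."""
    return 1.0 / (1.0 + math.exp(-z / temperature))

def evescape_like_scores(fitness, accessibility, dissimilarity,
                         temperatures=(1.0, 1.0, 1.0)):
    """Log of the product of three temperature-scaled logistic terms.

    The temperatures are placeholder values, not the ones used in the paper.
    """
    components = [standardize(c) for c in (fitness, accessibility, dissimilarity)]
    scores = []
    for zs in zip(*components):
        product = 1.0
        for z, t in zip(zs, temperatures):
            product *= logistic(z, t)
        scores.append(math.log(product))
    return scores

# Made-up component values for three mutations
scores = evescape_like_scores(
    fitness=[0.0, 1.0, 2.0],
    accessibility=[1.0, 2.0, 3.0],
    dissimilarity=[2.0, 4.0, 6.0],
)
```

Because each logistic term lies between 0 and 1, the resulting scores are always negative, and a mutation needs to score well on all three components to rank highly.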

We also provide EVEscape scores for all single mutation variants of SARS-CoV-2 Spike and aggregate strain-level predictions for all PANGO lineages in our paper. EVEscape rankings of newly occurring GISAID strains and visualizations of likely future mutations will be available at evescape.org.

Scripts

The scripts folder contains Python scripts that calculate EVEscape scores for all single mutations and aggregate deep mutational scanning (DMS) data for SARS-CoV-2 RBD, Flu HA, and HIV Env from the inputs in data. Specifically, this includes the following two scripts:

process_protein_data.py calculates the three EVEscape components
evescape_scores.py creates the final EVEscape scores and outputs the scores and processed DMS data in summaries_with_scores

The workflow of the scripts to create the data tables in results needed for the main figures of the EVEscape paper is available in evescape_summary.pdf. Additional data tables are available in the paper supplement.

Data requirements

The data required to obtain EVEscape scores are one or more PDB files, EVE scores (see Generating EVE scores below), and a FASTA file of the wildtype sequence for the viral protein of interest.

To download the RBD escape data used in this project (~120MB unzipped):

curl -o escape_dms_data_20220109.zip https://marks.hms.harvard.edu/evescape/escape_dms_data_20220109.zip
unzip escape_dms_data_20220109.zip
rm escape_dms_data_20220109.zip

(originally downloaded from SARS2_RBD_Ab_escape_maps)

Generating EVE scores

We leverage the original EVE codebase to compute the evolutionary indices used in EVEscape.

Model training

The MSAs used to train the EVE models used in this project can be found in the supplemental material of the paper (Data S1).

We modify the Bayesian VAE training script to support the following hyperparameter choices in the MSA_processing call:

sequence re-weighting in MSA (theta): we choose a value of 0.01 that is better suited to viruses (Hopf et al., Riesselman et al.)
fragment filtering (threshold_sequence_frac_gaps): we keep sequences in the MSA that align to at least 50% of the target sequence.
position filtering (threshold_focus_cols_frac_gaps): we keep columns with at least 70% coverage, except for SARS-CoV-2 Spike for which we lower the required value to 30% in order to maximally cover experimental positions and significant pandemic sites.
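To make the two filtering steps concrete, here is a small self-contained sketch. It is not the EVE MSA_processing code: the function filter_msa and its input format are hypothetical, and it assumes that both thresholds are maximum allowed gap fractions, so that "at least 70% coverage" corresponds to threshold_focus_cols_frac_gaps=0.3 and the 30% setting for Spike corresponds to 0.7. It also assumes sequences are filtered before column coverage is computed.

```python
def filter_msa(sequences, focus_index=0,
               threshold_sequence_frac_gaps=0.5,
               threshold_focus_cols_frac_gaps=0.3):
    """Filter an alignment given as equal-length strings ('-' marks a gap).

    Returns the kept sequences and the indices of the kept focus columns.
    """
    focus = sequences[focus_index]
    # Focus columns: positions where the target sequence has a residue
    focus_cols = [i for i, ch in enumerate(focus) if ch != "-"]

    # Fragment filtering: drop sequences with too many gaps over focus columns
    kept = [
        s for s in sequences
        if sum(s[i] == "-" for i in focus_cols) / len(focus_cols)
        <= threshold_sequence_frac_gaps
    ]

    # Position filtering: drop focus columns with too many gaps among kept sequences
    kept_cols = [
        i for i in focus_cols
        if sum(s[i] == "-" for s in kept) / len(kept)
        <= threshold_focus_cols_frac_gaps
    ]
    return kept, kept_cols

# Toy alignment; the first sequence is the target
msa = ["ACDE", "AC-E", "A---", "ACD-"]
kept, kept_cols = filter_msa(msa)
# Looser column threshold, analogous to the setting described for Spike
_, spike_like_cols = filter_msa(msa, threshold_focus_cols_frac_gaps=0.7)
```

In this toy alignment the fragment "A---" is dropped, the default column threshold keeps only the first two positions, and the looser threshold keeps all four.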

We train 5 independent models with different random seeds.

Model scoring

For each of the 5 independently trained models, we compute evolutionary indices by sampling 20k times from the approximate posterior distribution (i.e., num_samples_compute_evol_indices=20000). We then average the resulting scores across the 5 models to obtain the final EVE scores used in EVEscape.
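The averaging step could look like the sketch below. The dictionary input format, the function name, and the numeric values are made up for illustration; the actual EVE outputs are files whose layout is not described here.

```python
from statistics import mean

def average_evol_indices(per_model_scores):
    """Average evolutionary indices across models.

    per_model_scores: list of dicts mapping a mutation label to that
    model's evolutionary index (hypothetical format). Only mutations
    scored by every model are kept.
    """
    shared = set(per_model_scores[0])
    for scores in per_model_scores[1:]:
        shared &= set(scores)
    return {m: mean(s[m] for s in per_model_scores) for m in sorted(shared)}

# Five hypothetical models with made-up scores for two mutations
per_model = [
    {"E484K": 1.0, "N501Y": 2.0},
    {"E484K": 2.0, "N501Y": 2.0},
    {"E484K": 3.0, "N501Y": 2.0},
    {"E484K": 4.0, "N501Y": 2.0},
    {"E484K": 5.0, "N501Y": 2.0},
]
eve_scores = average_evol_indices(per_model)
```

Restricting to mutations present in every model avoids averaging over different numbers of models for different mutations.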

License

This project is available under the MIT license.

Reference

If you use this code, please cite the following paper:

Nicole N. Thadani*, Sarah Gurev*, Pascal Notin*, Noor Youssef, Nathan J. Rollins, Chris Sander, Yarin Gal, Debora S. Marks. Learning from pre-pandemic data to forecast viral antibody escape. bioRxiv. 2022.

(* equal contribution)

Links:

Pre-print: https://www.biorxiv.org/content/10.1101/2022.07.21.501023v1
Website: https://www.evescape.org/
