bartonlab/paper-binary-trait-inference

Overview

This repository contains data and scripts for reproducing the results accompanying the manuscript

Inferring selection for HIV-1 escape from T cell responses using a binary trait model

Yirui Gao1 and John P. Barton2,3,#

1 Department of Physics and Astronomy, University of California, Riverside
2 Department of Physics and Astronomy, University of Pittsburgh
3 Department of Computational and Systems Biology, University of Pittsburgh School of Medicine
# correspondence to jpbarton@pitt.edu

Contents

Scripts for generating and analyzing the simulation data can be found in the Simulation_analyze.ipynb notebook and the references therein. Scripts for processing and analyzing the HIV-1 data are contained in the HIV_analyze.ipynb notebook. Finally, scripts that produce the analyses and figures in the manuscript are located in the figures.ipynb notebook.

Because some files generated by the simulations and by the interim analysis of HIV-1 data are large and numerous, part of the data is stored as compressed archives on Zenodo. To access the full set of data, navigate to the Zenodo record, then download the archives HIV-output.tar.gz, HIV-input-seq.tar.gz, and sim-jobs.tar.gz and extract their contents into the folders data/HIV/output, data/HIV/input/sequence, and data/simulation/jobs, respectively.
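The extraction step can be scripted. The following is a minimal Python sketch, not part of the repository: it assumes the three archives have already been downloaded by hand into a single folder, and the function name `extract_archives` and its parameters are hypothetical. It also assumes each archive unpacks directly into its target folder; if an archive carries its own top-level directory, the extracted files may need to be moved up one level.

```python
import tarfile
from pathlib import Path

# Archive name -> destination folder, as listed in the paragraph above.
ARCHIVES = {
    "HIV-output.tar.gz": "data/HIV/output",
    "HIV-input-seq.tar.gz": "data/HIV/input/sequence",
    "sim-jobs.tar.gz": "data/simulation/jobs",
}


def extract_archives(download_dir=".", repo_root="."):
    """Extract each archive found in download_dir into its folder under repo_root.

    Returns the names of the archives that were extracted. Archives that are
    not present in download_dir are reported and skipped.
    """
    extracted = []
    for name, dest in ARCHIVES.items():
        archive = Path(download_dir) / name
        if not archive.is_file():
            print(f"missing: {archive}")
            continue
        target = Path(repo_root) / dest
        target.mkdir(parents=True, exist_ok=True)  # create the folder if needed
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)
        extracted.append(name)
    return extracted
```

Calling `extract_archives("path/to/downloads", ".")` from the repository root would place the contents in the folders named above.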

Running MPL

This repository includes code for inferring selection coefficients and trait coefficients using the marginal path likelihood (MPL) method. Code implementing MPL in C++ is located in the src/MPL directory.

HIV data

Here we combine HIV sequence data from the Los Alamos National Laboratory HIV Sequence Database and immunological data to investigate HIV evolution across 13 individuals. This data is contained in the data/HIV/ directory.

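We also use some processed data from here. Each entry below gives the source file, followed by its location within data/HIV/ and a short description: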

  • epitopes.csv → epitopes.csv: information about epitopes
  • src-MPL-HIV.tar.gz/*-poly-seq2state.dat → input/sequence/*-poly-seq2state.dat: processed sequences readable by MPL
  • src-MPL-HIV.tar.gz/Zanini-extended → input/Zanini-extended: mutation matrix for HIV data
  • processed/*-index.csv → notrait/processed/*-index.csv: information about all sites
  • interim/*-poly.csv → notrait/interim/*-poly.csv: information about polymorphic sites; these files are modified and saved as interim/*-poly.csv when escape sites are identified
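Before running the notebooks, it can help to confirm that these files are in place. The sketch below is an illustration only, not part of the repository: it assumes the destination paths in the list above are relative to data/HIV/, and the helper name `count_matches` is hypothetical.

```python
from pathlib import Path

# Destination patterns from the list above, assumed relative to data/HIV/.
EXPECTED = [
    "epitopes.csv",
    "input/sequence/*-poly-seq2state.dat",
    "input/Zanini-extended",
    "notrait/processed/*-index.csv",
    "notrait/interim/*-poly.csv",
]


def count_matches(hiv_dir="data/HIV"):
    """Return a dict mapping each expected pattern to the number of matching paths."""
    root = Path(hiv_dir)
    return {pattern: len(list(root.glob(pattern))) for pattern in EXPECTED}
```

A count of zero for any pattern suggests that the corresponding files still need to be copied or extracted.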

Software dependencies

Parts of the analysis are implemented in C++11 and depend on the GNU Scientific Library.

License

This repository is dual-licensed under GPL-3.0 (source code) and CC0 1.0 (figures, documentation, and our presentation of the data).
