
We apply combinatorial mutagenesis and next-generation sequencing to characterize the local fitness landscape of an antigenic region of NA in six human H3N2 strains that were isolated about 10 years apart. This pipeline is used to analyze the data and generate the plots.


Wangyiquan95/NA_EPI


This README describes the analyses in:
Wang et al. Antigenic evolution of human influenza H3N2 neuraminidase is constrained by charge balancing. eLife 10:e72516 (2021)

ANALYSIS FOR H3N2 NA ANTIGENIC REGION OF INTEREST BY DEEP MUTATIONAL SCANNING

This study aims to understand how epistasis influences NA antigenic evolution and to characterize the underlying biophysical constraints. This repository contains the analysis for the deep mutational scanning experiment, which focuses on NA residues 328, 329, 344, 367, 368, 369, and 370 in six different genetic backgrounds: A/Hong Kong/1/1968 (HK68), A/Bangkok/1/1979 (Bk79), A/Beijing/353/1989 (Bei89), A/Moscow/10/1999 (Mos99), A/Victoria/361/2011 (Vic11), and A/Hong Kong/2671/2019 (HK19).

REQUIREMENTS

INPUT FILE

ANALYSIS PIPELINE

Fitness landscape libraries analysis

  1. ./script/fastq_to_fitness.py: Converts raw reads to variant counts and fitness measures.
  2. ./script/complie_fit_result.py: Compiles variant information (amino acid, charge, fitness) across the six genetic backgrounds
  3. ./script/NAEpi_PrefEvol.py: Extracts the amino acid sequences of the NA antigenic region of interest from naturally occurring strains
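
As a rough illustration of the counts-to-fitness step, the sketch below computes a log10 enrichment ratio normalized to wild type. This definition, the pseudocount, the function name `enrichment_fitness`, and the counts are assumptions made for illustration; see ./script/fastq_to_fitness.py for the calculation actually used.

```python
import math


def enrichment_fitness(input_count, output_count, wt_input, wt_output, pseudocount=1):
    """Log10 enrichment of a variant relative to wild type (assumed definition)."""
    # The pseudocount keeps the ratio finite when a variant drops out after selection.
    variant_ratio = (output_count + pseudocount) / (input_count + pseudocount)
    wt_ratio = (wt_output + pseudocount) / (wt_input + pseudocount)
    return math.log10(variant_ratio / wt_ratio)


# Hypothetical read counts: (input library, post-selection library).
wt_counts = (99, 999)
variant_counts = {"variant_a": (99, 999), "variant_b": (99, 99)}

fitness = {
    name: enrichment_fitness(inp, out, *wt_counts)
    for name, (inp, out) in variant_counts.items()
}
```

Under this definition a variant that enriches as much as wild type scores 0, and negative values indicate a variant that enriches less than wild type.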

Inference of additive fitness and pairwise epistasis

  1. ./script/GE_regression.ipynb: Model training and robustness validation
  2. ./script/GE_regression_v2.ipynb: Cross-validation and regularization
  3. ./script/Distance_CA.py: Calculates Cα-Cα distances within the NA antigenic region
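
One common way to infer additive fitness and pairwise epistasis is to regress fitness on one-hot site features plus pairwise interaction features with a ridge penalty. The sketch below shows that general idea on a toy two-site landscape. It is not the model in the notebooks: the two-letter alphabet, the fitness values, the penalty, and the helper names `encode` and `fit_ridge` are all assumptions, and the notebooks may use a different model form.

```python
import itertools

import numpy as np


def encode(variant, alphabet):
    """Return one-hot additive features followed by pairwise interaction features."""
    additive = np.zeros((len(variant), len(alphabet)))
    for site, aa in enumerate(variant):
        additive[site, alphabet.index(aa)] = 1.0
    # One block per site pair: the outer product marks which amino acid pair is present.
    pairwise = [
        np.outer(additive[i], additive[j]).ravel()
        for i, j in itertools.combinations(range(len(variant)), 2)
    ]
    return np.concatenate([additive.ravel(), *pairwise])


def fit_ridge(features, fitness, penalty):
    """Closed-form ridge regression: solve (X^T X + penalty * I) w = X^T y."""
    gram = features.T @ features + penalty * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ fitness)


# Hypothetical two-site landscape over a toy alphabet (D is negative, K is positive).
alphabet = "DK"
variants = ["DD", "DK", "KD", "KK"]
fitness = np.array([-1.0, 0.0, -0.1, -0.9])

features = np.array([encode(v, alphabet) for v in variants])
weights = fit_ridge(features, fitness, penalty=1e-3)
predicted = features @ weights
```

In `weights`, the first `len(variant) * len(alphabet)` entries are the additive terms and the remaining entries are the pairwise terms. The penalty would normally be chosen by cross-validation rather than fixed as it is here.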

Natural strains evolution analysis

  1. ./script/ExtractSweep.py: Calculates the frequency of every variant at each residue of H3N2 NA by year
  2. ./script/analyze_charge_natural_strain.py: Analyzes the local net charge of the antigenic region in naturally circulating strains
  3. ./script/mut_freq_ByYear.py: Analyzes the mutations accumulated in HA and NA since 1968
  4. ./script/Coevolution_analysis_NA.ipynb: Analyzes coevolution of charge states in the NA antigenic region
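
The local net charge analysis can be pictured with the minimal sketch below, which sums residue charges over a motif and averages them by isolation year. The charge assignment (K and R as +1, D and E as -1, everything else neutral), the function names, and the example records are assumptions for illustration rather than the contents of ./script/analyze_charge_natural_strain.py.

```python
from collections import defaultdict

# Assumed charge assignment: K/R positive, D/E negative, all other residues neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(motif):
    """Sum the assumed side-chain charges over the residues of a motif."""
    return sum(CHARGE.get(aa, 0) for aa in motif)


def mean_charge_by_year(records):
    """Average the net charge of all motifs observed in each year."""
    charges = defaultdict(list)
    for year, motif in records:
        charges[year].append(net_charge(motif))
    return {year: sum(vals) / len(vals) for year, vals in sorted(charges.items())}


# Hypothetical (year, motif) records; real input would come from the strain alignment.
records = [(1968, "KDE"), (1968, "KRG"), (2019, "DEE")]
charge_by_year = mean_charge_by_year(records)
```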

Construction of evolutionary trajectories based on the fitness data

  1. ./script/Evolution_model.py: Analyzes evolutionary trajectories based on the fitness data
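
To make the idea of a trajectory on a fitness landscape concrete, the sketch below runs a greedy adaptive walk: from a starting variant, it repeatedly moves to the fittest single-substitution neighbor until no neighbor is fitter. The greedy walk is a stand-in chosen for illustration and is not necessarily the model in ./script/Evolution_model.py. The landscape values and function names are hypothetical.

```python
def single_mutant_neighbors(variant, landscape):
    """Variants in the landscape that differ from `variant` at exactly one position."""
    return [
        other
        for other in landscape
        if len(other) == len(variant)
        and sum(a != b for a, b in zip(other, variant)) == 1
    ]


def greedy_walk(start, landscape, max_steps=10):
    """Follow the fittest single-substitution neighbor until reaching a local peak."""
    path = [start]
    current = start
    for _ in range(max_steps):
        neighbors = single_mutant_neighbors(current, landscape)
        if not neighbors:
            break
        best = max(neighbors, key=landscape.get)
        if landscape[best] <= landscape[current]:
            break  # no neighbor improves fitness, so this is a local peak
        current = best
        path.append(current)
    return path


# Hypothetical two-site landscape mapping variant to fitness.
landscape = {"DD": -1.0, "DK": 0.0, "KD": -0.2, "KK": -0.8}
trajectory = greedy_walk("DD", landscape)
```

A greedy walk ignores drift and mutation supply, so it only shows which paths are accessible by strictly increasing fitness.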

PLOT

Natural evolution of an antigenic region in human H3N2 NA (Figure 1)

  1. ./script/Plot_Mutation_year.R: plot the mutations accumulated in HA and NA since 1968 (Supplementary Fig. 1)
  2. ./script/NA_epi.pml: plot the NA head domain (Fig. 1a)
  3. ./script/NA_epi_zoom.pml: plot the antigenic region of interest (Fig. 1b)
  4. ./script/TrackAAFreq.R: plot the natural occurrence frequencies of the amino acid variants and charge states (Fig. 1c and Supplementary Fig. 13)

Comparing the local fitness landscapes of the NA antigenic region (Figure 2)

  1. ./script/Plot_CompareLib.R: plot the fitness distributions and the correlations among different genetic backgrounds (Fig. 2a and b)
  2. ./script/Plot_Nat_motif_Freq.R: plot the frequencies of naturally occurring variants by year (Supplementary Fig. 2)
  3. ./script/Plot_CompareRep.R: plot the correlation between biological replicates (Supplementary Fig. 3)
  4. ./script/Plot_TrackPref.R: plot the fitness of naturally occurring variants by year (Fig. 2c)
  5. ./script/Plot_NA_titer.R: plot the virus rescue experiment for WT strains (Supplementary Fig. 4)

Inference of additive fitness and pairwise epistasis (Figure 3)

  1. ./script/Plot_hyperpar_R2.R: plot the evaluation of model hyperparameters using repeated k-fold cross-validation (Supplementary Fig. 5; ./graph/hyperpar_r.png)
  2. ./script/Plot_add_heatmap.R: plot parameters for additive fitness in different genetic backgrounds (Fig. 3a)
  3. ./script/Plot_epi_heatmap.R: plot pairwise epistasis heatmap and epistasis classified by charge states (Fig. 3b and Supplementary Fig. 8; Fig. 4a and Supplementary Fig. 9)
  4. ./script/Plot_CorEPI.R: plot correlation matrices of additive fitness and pairwise epistasis among six genetic backgrounds (Fig. 3c and Supplementary Fig. 6-7)

The importance of local net charge in the NA antigenic region (Figure 4)

  1. ./script/NA_epi_bind.pml: plot the NA antigenic region interaction (Fig. 4b and Supplementary Fig. 10)
  2. ./script/Plot_charge_vs_fit.R: plot variant fitness against local net charge (Fig. 4c and Supplementary Fig. 12)
  3. ./script/Plot_Distance_vs_EPI: plot Cα-Cα distances and epistasis

Predicting coevolution of charge states in the NA antigenic region using epistasis (Figure 5)

  1. ./script/plot_charge_natural_strain.R: plot the evolution of local net charge at the NA antigenic region (Fig. 5a)
  2. ./script/Plot_epi_heatmap_bycharge.R: plot pairwise epistasis of charge states (Fig. 5b and Supplementary Fig. 14)
  3. ./script/Plot_epi_vs_Coevol.R: plot the relationship between coevolution score and pairwise epistasis (Fig. 5c and Supplementary Fig. 16)
