Scripts and data associated with "Genome-wide signatures of synergistic epistasis during parallel adaptation in a Baltic Sea copepod"

TheDBStern/Baltic_Lab_Wild

Introduction

This repository contains the analysis scripts and data associated with our manuscript:

Stern DB, Anderson NW, Diaz JA, and CE Lee. Genome-wide signatures of synergistic epistasis during parallel adaptation in a Baltic Sea copepod.

Usage

Scripts are organized into three directories: snp_calling, selection_analyses, and simulations.
Command-line options for the Python scripts can be listed with the -h flag, e.g., baypass2freqs_cov.py -h

snp_calling -- reference assembly, SNP calling, and SNP data processing

  • assemble_poolseq.commands.txt Commands used to generate the 'pseudoreference' genome by 'tiling' Pool-seq data onto the transcriptome in an iterative mapping and assembly approach
  • baypass2freqs_cov.py Converts a file from multipopulation BayPass format (refcount1 altcount1 etc.) to frequencies of the alt allele and a coverage matrix
  • bams2SNPs.commands.sh Commands used to call SNPs and generate allele count files
  • calculate_coverage_distribution_sync.py Calculates the top X percentage of coverage across all pools from a sync file
  • filter_fasta_by_blast.py Filters a multifasta file based on whether sequences had a significant blast hit to some sequence database or genome
  • filter_sync_by_snplist.py Filters a sync file (Popoolation2) by a list of SNPs to keep (e.g. a snpdet file produced by poolfstat)
  • get_mates.py For a set of left/R1 reads, fetch corresponding right/R2 read pairs
  • get_SNP_position_in_genome.py Convert SNP positions called in one reference genome to approximate position in another genome based on blast results
  • vcf2genobaypass.R R commands to generate the read count file from the VarScan VCF using poolfstat
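As an illustration of the BayPass conversion described above, the following is a minimal sketch, not the code in baypass2freqs_cov.py. It assumes whitespace-delimited rows with one SNP per row and one (ref count, alt count) column pair per pool, as the description "refcount1 altcount1 etc." suggests. The function name `baypass_to_freqs_cov` and the choice to report NaN for pools with zero coverage are assumptions made for this example.

```python
def baypass_to_freqs_cov(lines):
    """Convert BayPass-style count rows to alt-allele frequencies and coverage.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    freqs, covs = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        counts = [int(x) for x in fields]
        if len(counts) % 2 != 0:
            raise ValueError("expected an even number of columns (ref/alt pairs)")
        row_freqs, row_covs = [], []
        # Walk the row two columns at a time: (ref count, alt count) per pool.
        for ref, alt in zip(counts[0::2], counts[1::2]):
            cov = ref + alt
            row_covs.append(cov)
            # Assumption: frequency is reported as NaN when a pool has no reads.
            row_freqs.append(alt / cov if cov > 0 else float("nan"))
        freqs.append(row_freqs)
        covs.append(row_covs)
    return freqs, covs


# Two SNPs in two pools (made-up counts).
example = ["30 10 25 25", "0 40 12 0"]
freqs, covs = baypass_to_freqs_cov(example)
print(freqs)
print(covs)
```

Each output row corresponds to one SNP and each column to one pool, so the two matrices have the same shape and can be used together for coverage-aware downstream filtering.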

selection_analyses -- CMH, chi-square, and LMM tests; calculation of the Jaccard index

  • ACER_code.R R commands used to run the Chi-square and CMH tests on SNPs
  • determine_AFC_cutoff.R R commands to simulate neutral allele frequency change (AFC) to determine a cutoff for calling an allele as under selection in a given line
  • parallelism_functions.R R functions to calculate the Jaccard index and RFS for the empirical data
  • prep_lmm.R R code specific to this study for generating the input file for the LMM analysis of SNP frequency trajectories. Uses the files in the data directory
    • 'prep_lmm.rawAFC.R' - same as above but does not transform the allele frequencies
    • 'prep_lmm.rawFreqs.R' - same as above but uses raw allele frequencies rather than divergence from the ancestor
  • run_lmm.R R script to run the linear mixed model with lme4 on every called SNP. Uses the output from prep_lmm.R
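For readers unfamiliar with the Jaccard index mentioned above, here is a minimal sketch of the standard definition: the size of the intersection of two sets divided by the size of their union. It is written in Python for illustration and is not a port of parallelism_functions.R. The function name `jaccard_index`, the SNP identifiers, and the convention of returning 0.0 for two empty sets are assumptions.

```python
def jaccard_index(set_a, set_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of identifiers."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention when both sets are empty
    return len(a & b) / len(union)


# Hypothetical sets of selected alleles detected in two replicate lines.
line1 = {"snp1", "snp2", "snp3", "snp4"}
line2 = {"snp3", "snp4", "snp5"}
print(jaccard_index(line1, line2))
```

A value of 1 means that the two lines share exactly the same selected alleles, and a value of 0 means that they share none.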

simulations -- SLiM script and commands for running epistasis simulations using our empirical parameters

  • epistasis_simulations.slim -- SLiM script to run the simulations. Contains the fitness functions used in the study.
  • run_slim.sh -- Command to execute the SLiM script. Parameter values are set in the command line.
  • sortedhbdata.csv -- Data from the 121 selected alleles (haplotype blocks) used in the simulations.
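Because parameter values are passed on the command line, a small wrapper can help when launching many runs. The sketch below builds a SLiM invocation using SLiM's `-d` (define constant) and `-s` (seed) options. The parameter names `N` and `GENS` are hypothetical placeholders; the names the script actually expects are those used in run_slim.sh. The helper `build_slim_command` is likewise an assumption for this example and is not part of the repository.

```python
import shlex


def build_slim_command(script, params, seed=None):
    """Assemble a SLiM command line with constants defined via -d."""
    cmd = ["slim"]
    if seed is not None:
        cmd += ["-s", str(seed)]  # fix the random seed for reproducibility
    for name, value in params.items():
        cmd += ["-d", f"{name}={value}"]  # one -d option per constant
    cmd.append(script)
    return cmd


# Hypothetical parameter names; see run_slim.sh for the real ones.
cmd = build_slim_command(
    "epistasis_simulations.slim", {"N": 1000, "GENS": 20}, seed=42
)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` on a machine where SLiM is installed; building the command as a list avoids shell-quoting problems.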

Additional information and simulation scenarios can be found here

Software required to run these scripts

Python packages

Python version 3.8.2

R packages

R version 4.0.4

Other software used in the manuscript

Data

SNPs and allele counts derived from the Pool-seq data are available in the data directory. Please see the README file within for information.

Please contact the authors for questions or issues.
