SNPCall_Benchmarking

Benchmarks variant callers on simulated shotgun metagenomic data. The repository implements a bioinformatic pipeline that runs from read synthesis through alignment and variant calling.

Methods & Workflow



Figure 1. Workflow diagram of the variant caller benchmarking process. First, selected RefSeq genomes were used to simulate a metagenome, and random mutations were added to these genomes to create a "gold standard" dataset. The number of genomes and the number of reads were then varied to evaluate the variant callers under a range of sample conditions.



Results





Important Directories

projects/SNP_Call_Benchmarking/Benchmarking_Run:

  • Directory containing synthetic reads, variant caller output, and benchmarks
  • All output of Benchmarking Workflow goes here

SNPCall_Benchmarking/Workflow:

Production-stage scripts that form the core Benchmarking workflow.

Genesis.sh

  • Script performing directory setup, SNP generation, read synthesis, and alignment

MultiCaller.sh

  • Script running variant callers and benchmarking on data generated by Genesis.sh
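
As a rough illustration of what benchmarking against the "gold standard" can look like, the sketch below compares a set of called SNPs with a set of injected SNPs and reports true positives, false positives, false negatives, precision, recall, and F1. This README does not say which metrics MultiCaller.sh reports, so the metric choice, the function name `score_calls`, and the `(genome, position, ref, alt)` tuple layout are all assumptions made for illustration.

```python
def score_calls(truth, calls):
    """Compare called SNPs against injected ("gold standard") SNPs.

    Both arguments are iterables of (genome, position, ref, alt) tuples.
    This layout is hypothetical, not the repository's actual format.
    """
    truth, calls = set(truth), set(calls)
    tp = len(truth & calls)   # injected SNPs that were called
    fp = len(calls - truth)   # calls that match no injected SNP
    fn = len(truth - calls)   # injected SNPs that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy example with made-up genome names and positions.
truth = [("genomeA", 5, "A", "G"),
         ("genomeA", 12, "C", "T"),
         ("genomeB", 7, "G", "A")]
calls = [("genomeA", 5, "A", "G"),
         ("genomeB", 7, "G", "A"),
         ("genomeB", 20, "T", "C")]

result = score_calls(truth, calls)
print(result)
```

Matching on the full tuple means a call at the right position with the wrong alternate base counts as both a false positive and a false negative; a more lenient comparison could match on genome and position only.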

Single_scripts/

  • Core workflow broken up into individual scripts (read generation, alignment, individual variant callers, etc.)

SNP_Injector_Fasta.py

  • Python script used by Genesis.sh to "inject" SNPs into FASTA files
  • Creates a log of input SNPs and genome locations
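
The idea of SNP injection with a log can be sketched as follows. This is a minimal illustration, not the code in SNP_Injector_Fasta.py: the function name `inject_snps`, its parameters, the fixed seed, and the tab-separated log layout are all assumptions, and it works on a single in-memory sequence instead of parsing a FASTA file.

```python
import random


def inject_snps(sequence, n_snps, seed=0):
    """Return (mutated_sequence, log) after n_snps random substitutions.

    Hypothetical helper: each log entry is (1-based position, ref, alt).
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    bases = "ACGT"
    seq = list(sequence.upper())
    # Only positions holding an unambiguous base are eligible for mutation.
    candidates = [i for i, base in enumerate(seq) if base in bases]
    # Sampling without replacement guarantees distinct positions.
    positions = sorted(rng.sample(candidates, n_snps))
    log = []
    for pos in positions:
        ref = seq[pos]
        # Pick a substitute base that differs from the reference base.
        alt = rng.choice([b for b in bases if b != ref])
        seq[pos] = alt
        log.append((pos + 1, ref, alt))
    return "".join(seq), log


original = "ACGTACGTACGTACGT"
mutated, snp_log = inject_snps(original, n_snps=3, seed=42)

# Print the log as tab-separated position, reference base, alternate base.
for pos, ref, alt in snp_log:
    print(f"{pos}\t{ref}\t{alt}")
```

Recording the reference and alternate base alongside each position is what lets the log serve as the truth set when variant caller output is scored later.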