ISSRseq_AnalyzeBAMs

Brandon Sinn edited this page Jul 8, 2021 · 8 revisions

Overview

ISSRseq_AnalyzeBAMs.sh calls variants for each sample using the reference confidence model of GATK4's HaplotypeCaller, with linked de Bruijn graph mode enabled. HaplotypeCaller identifies regions in the BAM files that likely contain variation, and realigns reads in these regions before calculating genotype likelihoods and emitting SNP and indel calls. HaplotypeCaller generates one GVCF file per sample, and all GVCF files are then merged into a single file, which GenotypeGVCFs uses to conduct joint genotyping on all input samples. The VCF file of jointly genotyped variants is then passed to SelectVariants, which hard filters variants following the GATK best practices hard-filter recommendations.
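
The sequence of steps described above can be sketched as a dry run that only prints the GATK4 commands involved. This is an illustration under stated assumptions, not the contents of ISSRseq_AnalyzeBAMs.sh: the sample names, file names, and ploidy value are hypothetical, the use of CombineGVCFs for the merge step is assumed, and the filter thresholds are GATK's generic SNP starting points rather than values confirmed for this script.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints GATK4 commands rather than executing them.
# Sample names, file names, and thresholds are hypothetical placeholders.
set -euo pipefail

print_pipeline() {
  local ref="reference.fa" ploidy=2
  local samples=(sampleA sampleB)
  local gvcf_args=() s

  for s in "${samples[@]}"; do
    # One GVCF per sample: reference confidence model, linked de Bruijn graphs.
    echo "gatk HaplotypeCaller -R $ref -I ${s}.bam -O gvcfs/${s}.g.vcf" \
         "-ERC GVCF --linked-de-bruijn-graph -ploidy $ploidy" \
         "-bamout haplotypecallerBAMs/${s}.bam"
    gvcf_args+=(-V "gvcfs/${s}.g.vcf")
  done

  # Merge the per-sample GVCFs (CombineGVCFs is an assumption here).
  echo "gatk CombineGVCFs -R $ref ${gvcf_args[*]} -O gvcfs/combined.g.vcf"

  # Joint genotyping across all samples.
  echo "gatk GenotypeGVCFs -R $ref -V gvcfs/combined.g.vcf -O variants/raw_variants.vcf"

  # Keep only variants passing illustrative hard-filter thresholds.
  echo "gatk SelectVariants -R $ref -V variants/raw_variants.vcf" \
       "-select 'QD >= 2.0 && FS <= 60.0 && MQ >= 40.0'" \
       "-O variants/filtered_variants.vcf"
}

print_pipeline
```

Printing the commands keeps the sketch runnable without GATK installed and makes the order of the stages easy to inspect.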

Filtered variants to be analyzed using ISSRseq_CreateMatrices, or user-specific downstream applications, are output to a file named filtered_variants.vcf in the variants directory.
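
As a quick sanity check before downstream analysis, the number of variant records in filtered_variants.vcf can be counted by skipping the VCF header lines, which begin with `#`. The helper name `count_variants` and the path below are hypothetical; substitute the output directory from your own run.

```shell
# Count variant records (non-header lines) in a VCF file.
count_variants() {
  grep -vc '^#' "$1" || true   # grep exits 1 when the count is 0
}

# Hypothetical path; substitute your own output directory.
VCF="OUTPUT_DIR/variants/filtered_variants.vcf"
if [ -f "$VCF" ]; then
  echo "records in $VCF: $(count_variants "$VCF")"
fi
```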

Usage

DO NOT include a slash at the end of any file path.

-O [desired prefix of output directory]

-T [number of parallel processing threads -- I recommend not exceeding the number of virtualized cores]

-P [ploidy of organism]
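
A minimal invocation sketch using the three flags above. The prefix, thread count, and ploidy values are hypothetical, and the command is printed rather than run; remove the `echo` to execute it. The parameter expansion drops a trailing slash from the prefix, in line with the warning above.

```shell
# Hypothetical values; adjust to your own data.
OUT_PREFIX="my_issrseq_run/"   # trailing slash included by mistake
OUT_PREFIX="${OUT_PREFIX%/}"   # drop one trailing slash
THREADS=8
PLOIDY=2

# Print the command; remove "echo" to actually run the script.
echo ISSRseq_AnalyzeBAMs.sh -O "$OUT_PREFIX" -T "$THREADS" -P "$PLOIDY"
```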

Output Files and Directories

OUTPUT_DIR

gvcfs -- contains GVCF outputs
variants -- contains VCF outputs
haplotypecallerBAMs -- contains the BAMs locally realigned by HaplotypeCaller and used for genotyping