MerrimanLab/variant_calling_pipeline

A Snakemake workflow for calling variants from NGS data using GATK

sequences_metadata.tsv must be a tab-separated (tsv) file containing the following named columns: unique_id, Sample, fq1, fq2, rg

  • unique_id: character string made up only of characters matched by \w (letters, digits and underscore)
  • Sample: character string made up only of characters matched by \w (letters, digits and underscore)
  • fq1/fq2: file path of each fastq file
  • rg: character string of the read group information, e.g. @RG\tID:flowcell_lane\tSM:sample\tPL:illumina\tLB:library\tPU:flowcell_lane
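The column rules above can be checked before a run. The following is a minimal sketch and not part of the pipeline: the function name `validate_metadata` and the example rows are hypothetical, and it assumes the rg column stores `\t` as two literal characters (a real tab would split the column).

```python
import csv
import io
import re

REQUIRED_COLUMNS = ["unique_id", "Sample", "fq1", "fq2", "rg"]
WORD_ONLY = re.compile(r"\w+")


def validate_metadata(handle):
    """Return a list of problems found in a sequences_metadata.tsv-style file."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for column in ("unique_id", "Sample"):
            value = row[column] or ""
            if not WORD_ONLY.fullmatch(value):
                problems.append(
                    f"line {line_no}: {column} {value!r} is not made up of word characters only"
                )
        for column in ("fq1", "fq2", "rg"):
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: {column} is empty")
    return problems


# Hypothetical example: the second sample uses a hyphen in unique_id.
example = (
    "unique_id\tSample\tfq1\tfq2\trg\n"
    "FC1_1\tsampleA\ta_R1.fastq.gz\ta_R2.fastq.gz\t"
    "@RG\\tID:FC1_1\\tSM:sampleA\\tPL:illumina\\tLB:lib1\\tPU:FC1_1\n"
    "FC1-2\tsampleB\tb_R1.fastq.gz\tb_R2.fastq.gz\t"
    "@RG\\tID:FC1_2\\tSM:sampleB\\tPL:illumina\\tLB:lib1\\tPU:FC1_2\n"
)

for problem in validate_metadata(io.StringIO(example)):
    print(problem)
```

To check a real file, pass an open handle instead, for example `validate_metadata(open("sequences_metadata.tsv", newline=""))`.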

A good method for creating unique_id is to use "{flowcell}_{lane}"
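As an illustration of the "{flowcell}_{lane}" convention, the sketch below builds a unique_id and a matching rg string in the layout of the example above. The helper names, the default platform value, and the flowcell, sample and library values are hypothetical, and the `\t` separators are written as two literal characters on the assumption that this is how the rg column stores them.

```python
def make_unique_id(flowcell, lane):
    """Combine flowcell and lane into a unique_id such as 'FC1_1'."""
    return f"{flowcell}_{lane}"


def make_read_group(flowcell, lane, sample, library, platform="illumina"):
    """Build an @RG string whose ID and PU are both '{flowcell}_{lane}'."""
    unit = make_unique_id(flowcell, lane)
    fields = [
        "@RG",
        f"ID:{unit}",
        f"SM:{sample}",
        f"PL:{platform}",
        f"LB:{library}",
        f"PU:{unit}",
    ]
    # Join with a literal backslash followed by 't', not a real tab character.
    return "\\t".join(fields)


print(make_unique_id("FC1", 1))
print(make_read_group("FC1", 1, "sampleA", "lib1"))
```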

See this article from the GATK about read groups for more information.

Setting up the pipeline

# On the Biochem servers:

# Load conda
$ module load miniconda/Miniconda3_4.8.3
# Create the environment
$ conda env create -f environment.yaml -p env

# create links to the needed reference files (GATK resource bundles)

Running the pipeline

Make sure that conda is installed and available, then activate the environment.

# On the Biochem servers:

# Load conda
$ module load miniconda/Miniconda3_4.8.3

# activate the environment
$ conda activate ./env

Once the environment is activated, make sure the sequences_metadata.tsv file is present, then try a dry run using:

$ snakemake -nr

If the dry run succeeds, the pipeline can be started using:

$ snakemake
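An optional pre-flight check, separate from the pipeline itself, is to confirm that every fq1 and fq2 path listed in sequences_metadata.tsv points to an existing file before starting a run. The sketch below is one way this might be done. The function name `missing_fastqs` is hypothetical, and relative paths are resolved against the current working directory, which is assumed to be the directory the pipeline is run from.

```python
import csv
import os


def missing_fastqs(metadata_path):
    """Return (unique_id, column, path) for each fq1/fq2 entry that is not a file."""
    missing = []
    with open(metadata_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            for column in ("fq1", "fq2"):
                path = row[column]
                # An empty cell or a path that is not a regular file is reported.
                if not path or not os.path.isfile(path):
                    missing.append((row["unique_id"], column, path))
    return missing
```

Calling `missing_fastqs("sequences_metadata.tsv")` from the pipeline directory returns an empty list when every listed fastq file is present.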
