dDocent

This script serves as an interactive bash wrapper to QC, assemble, map, and call SNPs from double digest RAD data. It is designed to run on Linux-based machines with large memory capacity and multiple processing cores.

There are now two different versions of dDocent: dDocent.FB and dDocent.GATK. dDocent.FB uses minimal BAM file preparation steps before calling SNPs and INDELS simultaneously using FreeBayes (Garrison & Marth 2012). dDocent.GATK uses GATK (McKenna et al. 2010) for INDEL realignment, SNP and INDEL genotyping (using HaplotypeCaller), and variant quality score recalibration, largely following GATK Best Practices recommendations (DePristo et al. 2011; Auwera & Carneiro 2013). The modules represent two different strategies for SNP/INDEL calling, and are completely independent of one another.

For now, I will be focusing on dDocent.FB because it is substantially faster and has fewer dependencies. See http://bcbio.wordpress.com/2013/10/21/updated-comparison-of-variant-detection-methods-ensemble-freebayes-and-minimal-bam-preparation-pipelines/ for a great comparison of FreeBayes and GATK.

#Requirements

Instead of reinventing the wheel, dDocent relies almost entirely on third party software to complete every step of the analysis pipeline, and users are encouraged to familiarize themselves with several of these programs, especially Rainbow, BWA, FreeBayes, GATK, and VCFtools. Below is a list of all the dependencies of dDocent and websites to reference the software:

| Software | Link |
|---|---|
| FreeBayes | https://github.com/ekg/freebayes |
| GATK* | http://www.broadinstitute.org |
| STACKS | http://creskolab.uoregon.edu/stacks |
| cutadapt | http://code.google.com/p/cutadapt/ |
| FastQC | http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ |
| TrimGalore! | http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/ |
| Mawk | http://invisible-island.net/mawk/ |
| BWA | http://bio-bwa.sourceforge.net |
| SAMtools | http://samtools.sourceforge.net |
| Picard* | http://picard.sourceforge.net |
| VCFtools | http://vcftools.sourceforge.net/index.html |
| rainbow | http://sourceforge.net/projects/bio-rainbow/files/ |
| seqtk | https://github.com/lh3/seqtk |
| CD-HIT | http://weizhong-lab.ucsd.edu/cd-hit/ |
| Seq_filter.pl | https://code.google.com/p/seq-filter/downloads/list |
| cutseq_fasta.pl | http://code.google.com/p/nash-bioinformatics-codelets/ |
| bedtools | https://code.google.com/p/bedtools/ |
| vcflib | https://github.com/ekg/vcflib |
| gnuplot | http://www.gnuplot.info |

Programs with * are only required for dDocent.GATK

Also, FreeBayes requires cmake for compiling. Make sure it is installed on your system. http://www.cmake.org/cmake/resources/software.html
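Before running the pipeline, it can help to confirm which dependencies are already reachable from your $PATH. The following is a minimal sketch, not part of dDocent itself: `check_tools` is a hypothetical helper, and the executable names passed to it are assumptions about how each package installs its main binary, so adjust them to match your system.

```shell
# Hypothetical helper: print the name of every listed tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names below are assumptions; adjust them to match your installs.
check_tools freebayes bwa samtools vcftools mawk cutadapt fastqc trim_galore \
    rainbow seqtk cd-hit-est bedtools gnuplot \
    || echo "Install the tools listed above before running dDocent.FB."
```

Any name printed by the function is a program that the shell could not find, either because it is not installed or because its directory is not in $PATH.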

#Installation

dDocent is designed to run on a multicore, high-memory Linux-based computer. As stated above, dDocent depends on several other software packages and assumes that they are all installed in a single directory that is listed in your $PATH. The easiest way to do this for all users of your machine is to install everything into the /usr/local/bin directory. You will need administrator or "root" privileges to do this.

If you don't have access to the /usr/local/bin directory, don't worry. dDocent can be installed locally in your user account. To do this, follow these simple commands:

cd ~

mkdir dDocent

nano .bash_profile

If this file is blank, type:

PATH="$HOME/dDocent:${PATH}"
export PATH

Otherwise, simply add $HOME/dDocent to the end of the existing string, separated from the previous entry by a colon.
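If the profile is sourced more than once, the same directory can end up in $PATH several times. A small sketch of one way to avoid that is shown here; `add_to_path` is a hypothetical helper name, not something dDocent provides, and the dDocent directory in the home folder is the location assumed by the commands above.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already listed.
add_to_path() {
    case ":${PATH}:" in
        *":$1:"*) ;;                  # already present, leave PATH unchanged
        *) PATH="$1:${PATH}" ;;       # otherwise put the directory first
    esac
    export PATH
}

add_to_path "$HOME/dDocent"
```

Placing these lines in .bash_profile has the same effect as the manual edit, and calling the function again leaves $PATH unchanged.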

If you are using a Mac, things get a little trickier. You need to make sure you have Xcode installed, as well as the command line tools. After this is complete, download the gcc compiler from http://hpc.sourceforge.net and install it according to the website's instructions. You will also have to install git from http://git-scm.com/download.

If you want more information on setting your $PATH and this setup process, check out the Palumbi Lab’s Simple Fool’s Guide for a good explanation and tutorial on what $PATH is and how to set it (http://sfg.stanford.edu/computer.html).

Once $PATH is set up, there is a VERY simplistic installation script located in the GitHub repository called install_dDocent.FB_requirements. To run it, simply type:

sh install_dDocent.FB_requirements <your path directory>

The script will check whether each required package is installed and, if it isn't, download and install it. If you are installing computer-wide, you will probably need to run the script with sudo.

If all went well, typing “dDocent.FB” and hitting return should start the pipeline.

dDocent requires that your raw data are split up by tagged individual and follow this naming convention:

Pop1_Sample1.F.fq Pop1_Sample1.R.fq

dDocent uses raw reads for reference assembly and trimmed reads for read mapping and SNP/variant calling. If the user is not using dDocent for trimming, trimmed reads must already be in the directory and must follow the naming convention below:

Pop1_001.R1.fq  Pop1_001.R2.fq

Pop1_002.R1.fq  Pop1_002.R2.fq

Here, R1 files contain the trimmed forward reads and R2 files contain the trimmed reverse (paired-end) reads.

These files must all be in the same directory.
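Because the pipeline expects every forward-read file to have a matching mate, a quick check of the data directory can catch naming mistakes early. The sketch below is illustrative only: `find_unpaired` is a hypothetical helper, and it assumes the file suffixes described above (F.fq and R.fq for raw reads, R1.fq and R2.fq for trimmed reads).

```shell
# Hypothetical helper: list forward-read files in a directory that lack a mate.
# Usage: find_unpaired [dir] [forward_suffix] [reverse_suffix]
# Defaults follow the raw-read convention Pop_Sample.F.fq / Pop_Sample.R.fq.
find_unpaired() {
    dir=${1:-.}
    fwd_suffix=${2:-F.fq}
    rev_suffix=${3:-R.fq}
    for fwd in "$dir"/*."$fwd_suffix"; do
        [ -e "$fwd" ] || continue                    # glob matched nothing
        rev="${fwd%."$fwd_suffix"}.$rev_suffix"      # swap the suffix
        [ -e "$rev" ] || printf '%s\n' "$fwd"        # report a missing mate
    done
}

# Check raw reads, then trimmed reads, in the current directory.
find_unpaired .
find_unpaired . R1.fq R2.fq
```

No output means every forward file found has a mate with the expected name.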

#Running

If dDocent is installed to your $PATH, change to the data directory and type:

dDocent.FB

Otherwise it can be run like any other BASH script:

sh /PATH_TO_dDOCENT/dDocent.FB

#User Guide

For a detailed user guide please see: http://ddocent.wordpress.com
