DESCRIPTION

This set of scripts is designed to align multiple samples of the same species to a reference genome, preprocess the alignments, and then call variants. The scripts use the following software, for the given purposes and in the given order:

trimgalore to trim adapters, clip the ends of the reads, and generate fastqc reports
bwa mem for aligning
samtools sort -n for sorting by name
samtools fixmate for fixing mate information
samtools sort for sorting by coordinates
samtools markdup for marking duplicates
picard-tools AddOrReplaceReadGroups for adding or replacing RG tags
picard-tools CleanSam for setting the mapping quality (MAPQ) to 0 for reads that are not aligned
samtools index for indexing
samtools coverage for coverage reports
bamtools stats for alignment reports
bcftools for variant calling
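
As a rough illustration of the order above, the following dry-run sketch prints one plausible command per step for a single paired-end sample instead of running anything. The function name `print_pipeline`, the file names, the read-group values, and the specific options are assumptions made for illustration; the actual scripts may invoke the tools differently, and the executable names (`trim_galore`, `picard-tools`) may differ on your system.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the commands for one paired-end sample, in pipeline order.
# All file names and options below are illustrative assumptions, not the
# settings used by the actual scripts. The trimmed file names assume
# Trim Galore's default naming for paired-end input.
print_pipeline() {
    local sample="$1" ref="$2" threads="${3:-1}"
    cat <<EOF
trim_galore --paired --fastqc -o trimmed ${sample}_R1.fastq.gz ${sample}_R2.fastq.gz
bwa mem -t ${threads} ${ref} trimmed/${sample}_R1_val_1.fq.gz trimmed/${sample}_R2_val_2.fq.gz > ${sample}.sam
samtools sort -n -@ ${threads} -o ${sample}.namesorted.bam ${sample}.sam
samtools fixmate -m ${sample}.namesorted.bam ${sample}.fixmate.bam
samtools sort -@ ${threads} -o ${sample}.sorted.bam ${sample}.fixmate.bam
samtools markdup ${sample}.sorted.bam ${sample}.markdup.bam
picard-tools AddOrReplaceReadGroups I=${sample}.markdup.bam O=${sample}.rg.bam RGID=${sample} RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=${sample}
picard-tools CleanSam I=${sample}.rg.bam O=${sample}.clean.bam
samtools index ${sample}.clean.bam
samtools coverage -o ${sample}.coverage.txt ${sample}.clean.bam
bamtools stats -in ${sample}.clean.bam > ${sample}.stats.txt
bcftools mpileup -f ${ref} ${sample}.clean.bam | bcftools call -mv -Oz -o ${sample}.vcf.gz
EOF
}
```

For example, `print_pipeline sampleA reference.fa 4` lists the twelve steps for a hypothetical sample named sampleA. Note that `samtools fixmate` needs name-sorted input and `samtools markdup` needs coordinate-sorted input, which is why the two sorts appear where they do.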

USAGE

Download

git clone https://github.com/evolozzy/NGS-Pipeline.git

Before using

Make a subdirectory named Data in the folder containing your scripts and copy your files there. Alternatively, change the line containing DATASOURCE in your PARAMETERS file and set it to the folder that contains your data.
If you have two or more sets of reads to merge, keep them in separate directories inside the Data directory.
Make sure you have your reference file.
Edit the RGTAGS file carefully: files belonging to the same sample should have the same SM (sample name).

Using

Setting the parameters

Carefully edit the PARAMETERS file.
- Set REFERENCEFILE to the path of the reference file.
- If you are running on multiple threads, set THREADS to the number of cores you want to use.
Set the directories to be used in the DIRECTORIES file.
- If you are not running the scripts from the directory that contains them, change the line containing WD to the path that contains your scripts.
Install the required software, and set the paths in the PROGRAMPATHS file.
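
Before starting a long run, it can help to sanity-check the settings. The sketch below is one way to do that, under the assumption that PARAMETERS consists of plain `NAME=value` shell assignments; that format is not confirmed here, and the function name `check_parameters` is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the settings before starting a run.
# Assumption: the PARAMETERS file consists of NAME=value shell assignments.
check_parameters() {
    local params_file="$1"
    local REFERENCEFILE="" THREADS="" DATASOURCE=""
    if [ ! -r "$params_file" ]; then
        echo "cannot read $params_file" >&2
        return 1
    fi
    # Source the file; its assignments fill the local variables above.
    # Pass a path containing a slash so that "." does not search PATH.
    . "$params_file" || return 1
    if [ ! -f "$REFERENCEFILE" ]; then
        echo "REFERENCEFILE not found: $REFERENCEFILE" >&2
        return 1
    fi
    # Reject empty values and anything containing a non-digit.
    case "$THREADS" in
        ''|*[!0-9]*)
            echo "THREADS must be a positive integer: $THREADS" >&2
            return 1
            ;;
    esac
    if [ "$THREADS" -lt 1 ]; then
        echo "THREADS must be at least 1" >&2
        return 1
    fi
    if [ ! -d "$DATASOURCE" ]; then
        echo "DATASOURCE is not a directory: $DATASOURCE" >&2
        return 1
    fi
    return 0
}
```

A call such as `check_parameters ./PARAMETERS && ./runall.sh` would then start the pipeline only when the three settings look usable.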

Running

Inside the folder:

./runall.sh

Or outside the folder:

/path/to/scripts/runall.sh

If you encounter any errors during the process and want to remove all the files created by the scripts:

./resetanalysis.sh

Best Practices

Before running runall.sh, use trimall.sh to quality-check the trimming process. Check the fastqc reports after trimming and set PARAMETERS accordingly.
Make sure that the core numbers are set properly. Try to use parallel more, but it depends on the number of files. For low numbers of files

How does this set of scripts work?

The script first
- checks that the files are in place
- checks that the software is installed
- calculates a good way to use the available cores
- builds the reference index files from the reference file
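
Two of the preliminary steps above, detecting missing software and dividing the cores, can be sketched as follows. The function names and the allocation rule (at most one parallel job per file, with an equal share of threads per job) are assumptions for illustration, not necessarily what the scripts do.

```shell
#!/usr/bin/env bash
# Sketch of two preflight steps: tool detection and core allocation.
# Function names and the allocation rule are illustrative assumptions.

# Print every tool from the argument list that is not on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Split TOTAL cores across FILES inputs: run at most one job per file and
# at most one job per core, then give each job an equal share of threads.
# Prints "<jobs> <threads_per_job>".
plan_cores() {
    local total="$1" files="$2" jobs threads
    if [ "$total" -lt 1 ] || [ "$files" -lt 1 ]; then
        echo "plan_cores: both arguments must be >= 1" >&2
        return 1
    fi
    jobs=$(( files < total ? files : total ))
    threads=$(( total / jobs ))
    echo "$jobs $threads"
}
```

With 8 cores and 3 files, `plan_cores 8 3` prints `3 2` (three jobs with two threads each); with 4 cores and 10 files, `plan_cores 4 10` prints `4 1`. A call such as `missing_tools bwa samtools bcftools` lists whichever of those tools cannot be found on PATH.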
Trimming is done with trimgalore.
Aligning is done with bwa mem.
Preprocessing is done with samtools and picard-tools.
1. First, the files are sorted by name and mate info is fixed.
2. Second, the files are sorted by coordinate and duplicates are marked.
3. Third, the files are cleaned: reads that were not aligned have their mapping quality set to 0.
4. Last, RG tags are added.
Variants are called with bcftools.

The intermediate files can be kept, deleted, or archived to another location.
The code also generates reports of trimming (fastqc reports), alignment, and coverage.
