Whole Genome Sequencing Pipeline using GATK Best Practice

Introduction

This repo provides easy-to-start pipeline scripts for germline Whole Genome Sequencing (WGS) analysis following the GATK Best Practice Workflow. In place of the Java version of GATK we use Sentieon, an accelerated tool that improves efficiency, and bcftools to help with VCF normalization. Furthermore, we use ANNOVAR to annotate variants with global databases.

Scripts

Run_vc.pl : Run single-sample variant calling (Variant_calling.sh) for a large cohort with one command.
Variant_calling.sh : Read raw FASTQ data and output a VCF file.
VQSR.sh : Run Variant Quality Score Recalibration (VQSR) on a VCF file.
Normalization.sh : Decompose and normalize a VCF file.
Joint_calling.sh : Run joint calling on the gVCFs of a large cohort.
Annovar.sh : Run variant annotation on a single VCF file.

Usage

To run on a PBS server:

# run a single shell script through the queue
qsub -N <jobname> -o <qsub_logfile_name> job.sh
# run single-sample WGS for a large cohort
perl run_job.pl -i <id_list> -s <start_line> -e <end_line>
## -s and -e are line numbers in the -i file, counted from 1
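
The meaning of `-s` and `-e` can be illustrated with a small sketch. This is not the Perl script's actual implementation; `select_ids` and the sample names are hypothetical and only show what "lines start to end of the ID list, counted from 1, inclusive" selects.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo):
# print lines $2..$3 (1-based, inclusive) of the ID list file $1.
select_ids() {
    sed -n "${2},${3}p" "$1"
}

# Build a throwaway ID list with made-up sample names.
id_list=$(mktemp)
printf '%s\n' SampleA SampleB SampleC SampleD > "$id_list"

# Start line 2 and end line 3 select the second and third samples.
select_ids "$id_list" 2 3
# prints:
# SampleB
# SampleC

rm -f "$id_list"
```

Splitting a large ID list into several line ranges like this makes it possible to submit a cohort in batches.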

To run in a regular Linux environment:

screen -S <your_screen_name>  # create a screen session to run the job in the background
cd <your_pipeline_dir>
./<yourjob.sh>
# add the sample name when running Variant_calling.sh:
./Variant_calling.sh <SampleName>
# press Ctrl+a, then d, to detach from the screen
# `screen -ls` lists all screen sessions
# `screen -r <your_screen_name>` returns to the session

Common script explanation

PBS header

This header is only used by the PBS system. If you run the scripts on a regular Linux system, you can ignore it.

#PBS -q <QueueName>		### queue name
#PBS -P <groupID>		### group ID shown on your NCHC website
#PBS -W group_list=<groupID>	### same as above
#PBS -l select=1:ncpus=40	### number of CPU threads (run `qstat -Qf <queue>` and use the `resources_default.ncpus` value)
#PBS -l walltime=8:00:00	### wall-clock time limit once the job has started
#PBS -M <email>	### email address for job status notifications
#PBS -m be	### send mail when the job begins (b) and ends (e)
#PBS -j oe	### merge stderr into the stdout log
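
Because every `#PBS` line starts with `#`, a plain shell treats the header as comments, which is why the same script runs unchanged outside PBS. The sketch below shows one way a script could cope with both environments. It assumes the standard PBS variables `PBS_O_WORKDIR` and `PBS_JOBID`; the `run_mode` function and the fallback to the current directory are illustrative and not taken from the scripts in this repo.

```shell
#!/usr/bin/env bash
#PBS -q <QueueName>
#PBS -l select=1:ncpus=40
#PBS -l walltime=8:00:00
#PBS -j oe

# Hypothetical helper: report whether the script runs inside a PBS job.
# PBS sets PBS_JOBID for running jobs; it is unset in a plain shell.
run_mode() {
    if [ -n "${PBS_JOBID:-}" ]; then
        echo "pbs"
    else
        echo "plain"
    fi
}

# PBS sets PBS_O_WORKDIR to the directory where qsub was invoked.
# Assumption: outside PBS we simply stay in the current directory.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

echo "mode: $(run_mode), workdir: $workdir"
```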

Logfile

By default, PBS writes the job log only after the whole job has exited, so to follow progress while the job is running you need to redirect the output to a runtime log yourself.
The same applies on a regular Linux system: with a runtime log you can check progress whenever you want, without scrolling back through a background screen session.

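# redirect log output to the runtime log (e.g. run.log)
logfile="${workdir}/<logfile name>"
set -x
exec 3>&1 4>&2         # save the original stdout and stderr
exec >"$logfile" 2>&1  # send stdout and stderr to the log file

######################
### your code here ###
######################

# restore output to the queue log
set +x
exec >&3 2>&4
exec 3>&- 4>&-         # close the saved descriptors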
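
To try the redirection without submitting a job, the same pattern can be wrapped in a function and run locally. This is a minimal sketch: the name `with_runlog` and the temporary log file are hypothetical and do not appear in the repo's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with stdout and stderr sent to a
# log file ($1), then restore the original streams.
with_runlog() {
    logfile=$1
    shift
    exec 3>&1 4>&2          # save the current stdout and stderr
    exec >"$logfile" 2>&1   # redirect both streams to the log file
    "$@"                    # run the command; its output lands in the log
    status=$?
    exec >&3 2>&4           # restore the original stdout and stderr
    exec 3>&- 4>&-          # close the saved descriptors
    return "$status"
}

log=$(mktemp)
with_runlog "$log" echo "step 1 done"
echo "back on the original stdout"
cat "$log"
rm -f "$log"
```

While a long job is running, `tail -f` on the log file from another terminal follows the output as it is written.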

B05611003/Whole-Genome-Sequencing-Pipeline
