Structural Variation Engine (SVE)

(c) 2017 Timothy Becker & Wan-Ping Lee

SVE is a Python-based execution engine for structural variation (SV) detection. It accepts input at any level (raw FASTQs, aligned BAMs, or variant call format (VCF) files) and generates a unified VCF as its output. By default, SVE covers alignment, realignment, and an ensemble of state-of-the-art SV-calling algorithms: BreakDancer, BreakSeq, cnMOPS, CNVnator, DELLY, Hydra and LUMPY. FusorSV is also embedded; it is a data-mining approach that assesses performance and merges callsets from an ensemble of SV-calling algorithms.

Requirements

  • python 2.7, HTSeq, numpy, scipy, subprocess32, bx-python, CrossMap and mygene
  • gcc 4.8 or greater
  • cmake 3.0 or greater
  • ROOT
  • R 3.3 or greater. You may type "make R-install" to install R-3.3.3.

Please set the ROOT environment variables.

export ROOTSYS=/ROOT_Build_Path
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ROOTSYS/lib
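
One way to confirm that both variables are in place before running make is a small check like the one below. This is only a sketch: the function name check_root_env is hypothetical and not part of SVE, and it assumes the ROOT shared libraries live in $ROOTSYS/lib, as in the exports above.

```python
import os

def check_root_env(env=None):
    """Return a list of problems with the ROOT-related environment variables."""
    env = os.environ if env is None else env
    problems = []
    rootsys = env.get("ROOTSYS", "")
    if not rootsys:
        problems.append("ROOTSYS is not set")
        return problems
    lib_dir = os.path.join(rootsys, "lib")
    # LD_LIBRARY_PATH is a colon-separated list; compare normalized entries.
    entries = [os.path.normpath(p)
               for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if os.path.normpath(lib_dir) not in entries:
        problems.append("LD_LIBRARY_PATH does not contain " + lib_dir)
    return problems
```

Calling check_root_env() with no arguments inspects the current shell environment; an empty list means both settings look consistent.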

Installation

From Git

For SVE

git clone --recursive https://github.com/TheJacksonLaboratory/SVE.git
cd SVE
make

For FusorSV

Please locate your Python 2.7 header files and modify "CFLAGS_FUSOR_SV" in the Makefile accordingly. For example, if the header files are in "/usr/include/python2.7", use "CFLAGS_FUSOR_SV=-I /usr/include/python2.7" instead.
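
If you are unsure where the headers are, the standard-library sysconfig module can report the include directory of the interpreter that runs it. The sketch below prints a candidate CFLAGS_FUSOR_SV line; the helper name fusorsv_cflags is hypothetical, and the reported directory may not exist if the Python development headers are not installed.

```python
import sysconfig

def fusorsv_cflags():
    """Build a CFLAGS_FUSOR_SV value for this interpreter's header directory."""
    # sysconfig reports where Python.h is expected for the running interpreter.
    include_dir = sysconfig.get_paths()["include"]
    return "CFLAGS_FUSOR_SV=-I " + include_dir

print(fusorsv_cflags())
```

Run it with the same python2.7 interpreter that will be used to build FusorSV, then copy the printed value into the Makefile.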

make FusorSV

Alternatively, you can install FusorSV with setup.py:

cd SVE/scripts/FusorSV/
python setup.py build_ext --inplace
tar -zxvf data.tar.gz

From Docker

Alternatively, a Dockerfile and a Docker image are provided. Note that sudo may be required for Docker commands, depending on your machine settings.

cd SVE
docker build .

Or pull the Docker image from the repository.

docker pull wanpinglee/sve

SVE is built in /tools/SVE. To see the help message, run:

/tools/SVE/bin/sve

Usage

Align

Short reads in FASTQ format are mapped against the given FASTA, and a sorted BAM is generated.

bin/sve align [options] -r <FASTA> <FASTQ1 [FASTQ2]>

Realign

If the reads are provided in BAM format, realign remaps them against the FASTA and generates a sorted BAM. Realignment is performed with SpeedSeq.

bin/sve realign -r <FASTA> <BAM>

Call SVs

Seven SV-calling algorithms are available; select one with -a. A VCF is generated.

bin/sve call -r <FASTA> -g <hg19|hg38|others> -a <breakdancer|breakseq|cnvnator|hydra|delly|lumpy|cnmops> <BAM [BAM ...]>
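
To run several callers on the same BAMs, the call command can be repeated once per algorithm. The sketch below builds those command lines using only the -r, -g and -a options shown above. The helper names build_call_commands and run_all are hypothetical, and running the callers one after another is an assumption about how you want to schedule them, not something SVE requires.

```python
import subprocess

# Caller names as accepted by -a in the usage line above.
CALLERS = ["breakdancer", "breakseq", "cnvnator", "hydra",
           "delly", "lumpy", "cnmops"]

def build_call_commands(fasta, genome, bams, callers=CALLERS, sve="bin/sve"):
    """Return one `sve call` argument list per requested caller."""
    commands = []
    for caller in callers:
        if caller not in CALLERS:
            raise ValueError("unknown caller: " + caller)
        commands.append([sve, "call", "-r", fasta, "-g", genome,
                         "-a", caller] + list(bams))
    return commands

def run_all(commands):
    """Run the commands sequentially, stopping at the first failure."""
    for argv in commands:
        subprocess.check_call(argv)
```

For example, run_all(build_call_commands("ref.fa", "hg38", ["sample1.bam"])) would invoke all seven callers in turn; the file names here are placeholders.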

Merge VCFs

After calling, each sample may have multiple VCFs, depending on how many callers were used. Please collect the VCFs of each sample in one folder.

The VCFs should use the SVE IDs below to indicate the callers.

SVE ID   Caller
4        BreakDancer (v1.4.5)
9        cn.MOPS (v1.20)
10       CNVnator (v0.3.3)
11       DELLY (v2)
14*      GenomeSTRiP
17       Hydra
18       LUMPY
35       BreakSeq (v2.2)
0        Truth (optional)

Note*: Because of licensing restrictions, GenomeSTRiP is not embedded in SVE. However, the FusorSV default model is able to handle GenomeSTRiP VCFs.
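
The ID table can be kept as a small lookup when checking which caller produced a given VCF. The sketch below assumes files are named <sample>_S<ID>.vcf, following the example file names used in this README; the names SVE_IDS and parse_vcf_name are hypothetical and not part of SVE or FusorSV.

```python
import os
import re

# SVE ID -> caller, copied from the table above.
SVE_IDS = {
    0: "Truth",
    4: "BreakDancer",
    9: "cn.MOPS",
    10: "CNVnator",
    11: "DELLY",
    14: "GenomeSTRiP",
    17: "Hydra",
    18: "LUMPY",
    35: "BreakSeq",
}

# Assumed naming pattern: <sample>_S<ID>.vcf
_VCF_NAME = re.compile(r"^(?P<sample>.+)_S(?P<sve_id>\d+)\.vcf$")

def parse_vcf_name(path):
    """Return (sample, SVE ID, caller name) for a VCF path."""
    match = _VCF_NAME.match(os.path.basename(path))
    if match is None:
        raise ValueError("not of the form <sample>_S<ID>.vcf: " + path)
    sve_id = int(match.group("sve_id"))
    if sve_id not in SVE_IDS:
        raise ValueError("unknown SVE ID %d in %s" % (sve_id, path))
    return match.group("sample"), sve_id, SVE_IDS[sve_id]
```

A check like this can catch a misnamed file before FusorSV is run, for instance a VCF whose ID does not appear in the table.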

Using the default model (if no S0 VCF is provided)

Example input VCF files can be organized as follows. Note that vcfFiles is the argument passed to FusorSV with -i.

  • vcfFiles/sample1/sample1_S11.vcf
  • vcfFiles/sample1/sample1_S10.vcf
  • vcfFiles/sample1/sample1_S4.vcf
  • vcfFiles/sample2/sample2_S11.vcf
  • vcfFiles/sample2/sample2_S10.vcf
  • vcfFiles/sample2/sample2_S4.vcf

python scripts/FusorSV/FusorSV.py -f scripts/FusorSV/data/models/default.pickle -L DEFAULT -r <FASTA> -i <vcfFiles>/ -p <THREADS> -o <OUT_DIR>

Using a self-trained model (if S0.vcf is provided)

A new model is trained from S0.vcf, and the VCFs are merged using that new model.

python scripts/FusorSV/FusorSV.py -L DEFAULT -r <FASTA> -i <vcfFiles>/ -p <THREADS> -o <OUT_DIR>
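
The two FusorSV invocations differ only in the -f option, so a wrapper can choose between them. The sketch below is a minimal illustration using only the options shown above. The helper names are hypothetical, and treating "at least one *_S0.vcf file exists under the input folder" as the signal for self-training is an assumption; this README does not state whether every sample needs a truth VCF.

```python
import glob
import os

DEFAULT_MODEL = "scripts/FusorSV/data/models/default.pickle"

def has_truth_vcfs(vcf_dir):
    """True if any sample folder under vcf_dir holds a *_S0.vcf file (assumed rule)."""
    return len(glob.glob(os.path.join(vcf_dir, "*", "*_S0.vcf"))) > 0

def build_fusorsv_command(fasta, vcf_dir, out_dir, threads, truth_available):
    """Return the FusorSV argument list for the default or self-trained model."""
    argv = ["python", "scripts/FusorSV/FusorSV.py"]
    if not truth_available:
        # Without truth VCFs, fall back to the bundled default model.
        argv += ["-f", DEFAULT_MODEL]
    argv += ["-L", "DEFAULT", "-r", fasta,
             "-i", vcf_dir.rstrip("/") + "/",
             "-p", str(threads), "-o", out_dir]
    return argv
```

The returned list can be passed to subprocess.check_call from the SVE directory, for example build_fusorsv_command("ref.fa", "vcfFiles", "out", 4, has_truth_vcfs("vcfFiles")).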

License

The project is licensed under the GPL-3.0 License. Please see LICENSE for details.