chipseq-smk-pipeline

A Snakemake-based pipeline for processing ChIP-seq and ATAC-seq datasets, from raw data QC and alignment to visualization and peak calling.

During the peak calling steps, chipseq-smk-pipeline automatically matches each signal file with a control file by name proximity.
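
As a rough illustration of matching by name proximity, the sketch below pairs each signal file with the control whose name is most similar. It is not the pipeline's actual implementation: the `find_control` helper, the `input`/`control` naming convention, the sample file names, and the use of `difflib` similarity are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical convention: control files carry one of these markers in their name.
CONTROL_MARKERS = ("input", "control")


def is_control(name):
    """Return True if the file name looks like a control sample."""
    lowered = name.lower()
    return any(marker in lowered for marker in CONTROL_MARKERS)


def find_control(signal, names):
    """Return the control name most similar to `signal`, or None if there are no controls."""
    controls = [n for n in names if is_control(n)]
    if not controls:
        return None
    return max(
        controls,
        key=lambda c: SequenceMatcher(None, signal.lower(), c.lower()).ratio(),
    )


# Made-up file names: two donors, each with a signal and a control track.
names = ["OD1_k4me3.bam", "OD1_input.bam", "YD2_k4me3.bam", "YD2_input.bam"]
for name in names:
    if not is_control(name):
        print(name, "->", find_control(name, names))
```

With consistent naming, each signal file is paired with the control from the same donor. If no control file is present, the helper returns `None`.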

Input

Input FASTQ files

The pipeline aligns FASTQ or gzipped FASTQ reads, as configured in config.yaml.
The reads folder is a path relative to the pipeline working directory and is set by the fastq_dir property.
The FASTQ file extension is set by the fastq_ext property, e.g. fq, fq.gz, fastq, or fastq.gz.
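
To make the two properties concrete, here is a minimal sketch of how reads could be located from fastq_dir and fastq_ext. The `find_fastq` helper and the sample file names are hypothetical and only illustrate the idea; the pipeline's own lookup may differ.

```python
import tempfile
from pathlib import Path


def find_fastq(work_dir, fastq_dir, fastq_ext):
    """List reads in <work_dir>/<fastq_dir> whose names end with .<fastq_ext>."""
    reads_dir = Path(work_dir) / fastq_dir
    return sorted(p.name for p in reads_dir.glob(f"*.{fastq_ext}"))


# Demo in a temporary working directory with made-up file names.
with tempfile.TemporaryDirectory() as work_dir:
    reads = Path(work_dir) / "fastq"
    reads.mkdir()
    for name in ("sample_1.fastq.gz", "sample_2.fastq.gz", "notes.txt"):
        (reads / name).touch()
    found = find_fastq(work_dir, "fastq", "fastq.gz")
    print(found)
```

Only files whose names end with the configured extension are picked up, so fastq_ext has to match the files on disk exactly (for example, `fastq.gz` rather than `fastq` for gzipped reads).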

Input BAM files

Use the start_with_bams=True config option to start from existing BAM files.
In this case the pipeline starts with the BAM files in the work_dir/bams folder.

Files

Path	Description
`config.yaml`	Default pipeline options
`trimmed`	Trimmed FASTQ file, if `trim_reads` option is True.
`bams`	BAMs with aligned reads, `MAPQ >= 30`
`bw`	BAM coverage visualization using DeepTools
`macs2`	MACS2 peaks
`sicer`	SICER peaks
`span`	SPAN peaks
`qc`	QC Reports
`multiqc`	MultiQC reports for different steps
`logs`	Shell commands logs

Requirements

The pipeline requires conda.

If conda is not installed, follow the instructions on the Conda website.
Navigate to the repository directory.

Create a Conda environment for snakemake:

$ conda env create --file environment.yaml --name snakemake

Activate the newly created environment:

$ source activate snakemake

On Ubuntu please ensure that gawk is installed:

$ sudo apt-get install gawk

Launch

Run the pipeline starting from FASTQ reads:

$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
    all [--cores <cores>] --use-conda --directory <work_dir> \
    --config fastq_dir=<fastq_dir> genome=<genome> --rerun-incomplete

By default, the pipeline does not perform coverage visualization or launch peak callers. Add bw=True, macs2=True, sicer=True, span=True to create coverage bw files and call peaks.

To launch MACS2 in --broad mode, use the following config:

$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
    all [--cores <cores>] --use-conda --directory <work_dir> \
    --config fastq_dir=<fastq_dir> genome=<genome> \
    macs2=True macs2_mode=broad macs2_params="--broad --broad-cutoff 0.1" macs2_suffix=broad0.1 \
    --rerun-incomplete

See config.yaml for a complete list of parameters. Use --config to override default options from the config.yaml file.
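
The override mechanism can also be illustrated by assembling the command line programmatically. This is a sketch only: the `build_command` helper and the example paths are hypothetical, while the flags and config keys are the ones used in the commands above.

```python
import shlex


def build_command(snakefile, work_dir, config, cores=None):
    """Build a snakemake argument list; `config` entries override config.yaml defaults."""
    cmd = ["snakemake", "-p", "-s", snakefile, "all"]
    if cores is not None:
        cmd += ["--cores", str(cores)]
    cmd += ["--use-conda", "--directory", work_dir, "--config"]
    # Each override is passed as a key=value pair after --config.
    cmd += [f"{key}={value}" for key, value in config.items()]
    cmd.append("--rerun-incomplete")
    return cmd


cmd = build_command(
    "chipseq-smk-pipeline/Snakefile",  # example path, adjust to your checkout
    "work",                            # example working directory
    {
        "fastq_dir": "fastq",
        "genome": "hg19",
        "macs2": True,
        "macs2_params": "--broad --broad-cutoff 0.1",
    },
    cores=4,
)
# Shell-quoted form; values containing spaces are quoted automatically.
print(shlex.join(cmd))
```

Keeping the overrides in a dictionary makes it easy to see which defaults from config.yaml are being changed for a given run.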

QSUB

Configure a Snakemake profile named generic_qsub for qsub with the Torque scheduler:

$ mkdir -p ~/.config/snakemake
$ cd ~/.config/snakemake
$ cookiecutter https://github.com/iromeo/generic.git

Example of ATAC-seq processing on qsub:

$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
    all --use-conda --directory <work_dir> \
    --profile generic_qsub --cluster-config qsub_config.yaml --jobs 150 \
    --config fastq_dir=<fastq_dir> genome=<genome> \
    bowtie2_params="-X 2000 --dovetail" \
    macs2=True macs2_params="-q 0.05 -f BAMPE --nomodel --nolambda -B --call-summits" \
    span=True span_fragment=0 span_bg_sensitivity=1.0 span_clip=0.4 --rerun-incomplete

Note: use --config to override default options from the config.yaml file.

Try with test data

Please download the example fastq.gz files from the CD14_chr15_fastq folder.
These files are restricted to reads from chr15 of the human hg19 genome to reduce their size and speed up computation.

Launch chipseq-smk-pipeline:

$ snakemake -p -s <chipseq-smk-pipeline>/Snakefile \
    all --use-conda --cores all --directory <work_dir> \
    --config fastq_ext=fastq.gz fastq_dir=<work_dir> genome=hg19 macs2=True sicer=True span=True \
    --rerun-incomplete

Useful links

Learn more about the Snakemake workflow management system
Developed with the SnakeCharm plugin for the PyCharm IDE by JetBrains Research BioLabs
