
JetBrains-Research/washu


Disclaimer

These pipelines rely on qsub and pure Bash CPU-level parallelism.
For a newer implementation, please have a look at the updated Snakemake pipeline chipseq-smk-pipeline.

Pipelines

Scalable and reproducible pipelines for ChIP-Seq and RNA-Seq processing.
Parallel execution works without extra configuration, both on a Portable Batch System (qsub) cluster and on local machines.
Reproducibility is backed by automated testing of all steps in Docker as part of Continuous Integration.

The ChIP-Seq pipeline was used for the ChIP-Seq data analysis in "Epigenetic changes in aging human monocytes".

  • pipeline_chipseq.py - Pipeline for batch ChIP-Seq processing, including QC, alignment and peak calling
  • pipeline_tf.py - Pipeline for batch Transcription Factor ChIP-Seq processing
  • pipeline_rnaseq.py - Pipeline for batch RNA-Seq processing, including QC, alignment and quantification

How do I launch the ChIP-Seq pipeline?

Follow these steps to launch the ChIP-Seq pipeline:

  • Configure the environment (see the Requirements section)
  • Place all the .fastq files into a single <FASTQ_FOLDER>
  • Create an <INDEXES> folder to store all the required indexes
  • Launch the pipeline with the desired <genome>, e.g. mm9 or hg19:
python3 pipeline_chipseq.py <FASTQ_FOLDER> <INDEXES> <genome>
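
The launch step can also be scripted. The sketch below is a hypothetical Python wrapper, not part of the repository: chipseq_command checks that <FASTQ_FOLDER> holds at least one .fastq file and that <INDEXES> exists, then builds the same command line as above. Only the .fastq extension is matched, as in the instructions, and the script is assumed to be run from the repository root.

```python
import subprocess
import sys
from pathlib import Path


def chipseq_command(fastq_folder, indexes, genome, script="pipeline_chipseq.py"):
    """Build the pipeline command after basic input checks (hypothetical helper)."""
    fastq_dir = Path(fastq_folder)
    # Fail early instead of submitting a batch that has nothing to process
    if not sorted(fastq_dir.glob("*.fastq")):
        raise ValueError(f"No .fastq files found in {fastq_dir}")
    if not Path(indexes).is_dir():
        raise ValueError(f"Indexes folder does not exist: {indexes}")
    # Same positional arguments, in the same order, as the README command
    return ["python3", script, str(fastq_dir), str(indexes), genome]


if __name__ == "__main__":
    # Hypothetical usage: python3 launch_chipseq.py <FASTQ_FOLDER> <INDEXES> <genome>
    if len(sys.argv) != 4:
        sys.exit("usage: launch_chipseq.py <FASTQ_FOLDER> <INDEXES> <genome>")
    subprocess.run(chipseq_command(*sys.argv[1:4]), check=True)
```

Passing script="pipeline_rnaseq.py" builds the RNA-Seq command instead, since both pipelines take the same arguments.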

How do I launch the RNA-Seq pipeline?

Follow these steps to launch the RNA-Seq pipeline:

  • Configure the environment (see the Requirements section)
  • Place all the .fastq files into a single <FASTQ_FOLDER>
  • Create an <INDEXES> folder to store all the required indexes
  • Launch the pipeline with the desired <genome>, e.g. mm9 or hg19:
python3 pipeline_rnaseq.py <FASTQ_FOLDER> <INDEXES> <genome>
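
Input files are often spread across several run folders. One way to meet the single-folder requirement without copying large files is to symlink them. The collect_fastq function below is a hypothetical helper, not part of the repository, and it assumes the pipeline tools read .fastq files through symbolic links, as most command-line tools do.

```python
import os
from pathlib import Path


def collect_fastq(sources, fastq_folder):
    """Symlink every .fastq file found under `sources` into one flat folder.

    Raises FileExistsError when two inputs share a file name, because a
    flat folder cannot hold both of them.
    """
    target = Path(fastq_folder)
    target.mkdir(parents=True, exist_ok=True)
    linked = []
    for source in sources:
        # Search recursively; sorting keeps the link order deterministic
        for fastq in sorted(Path(source).rglob("*.fastq")):
            link = target / fastq.name
            # is_symlink() also catches broken links that exists() misses
            if link.exists() or link.is_symlink():
                raise FileExistsError(f"Duplicate file name: {fastq.name}")
            # Absolute link target, so the link works from any working directory
            os.symlink(fastq.resolve(), link)
            linked.append(link)
    return linked
```

Symlinks keep a single copy of the reads on disk; replacing os.symlink with shutil.copy2 would trade disk space for independence from the original locations.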

Requirements

  • Make sure Python 3 is installed and is the default interpreter
  • Add the following to ~/.bashrc (Linux) or ~/.bash_profile (macOS):
# Configure project path
export WASHU_ROOT="<PATH_TO_REPOSITORY>"

# Make the repository modules importable (no empty entry if PYTHONPATH was unset)
export PYTHONPATH="$WASHU_ROOT${PYTHONPATH:+:$PYTHONPATH}"

# Configure local machine parallelism
export WASHU_PARALLELISM=8
  • Install required tools using Conda
conda install --channel bioconda samtools bedtools bowtie bowtie2 fastqc multiqc sra-tools macs2 sicer \
    ucsc-bedgraphtobigwig ucsc-bedclip ucsc-bigwigaverageoverbed \
    star rseg 

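For more details, see docker/biolabs/washu/Dockerfile.

Download the tools that the Conda command above does not cover (Picard, Phantompeakqualtools, SPAN):

curl --location https://github.com/broadinstitute/picard/releases/download/2.10.7/picard.jar \
    --output ~/picard.jar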
curl --location https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/phantompeakqualtools/ccQualityControl.v.1.1.tar.gz \
    --output ~/phantompeakqualtools.tar.gz 
tar xvf ~/phantompeakqualtools.tar.gz
curl --location https://download.jetbrains.com/biolabs/span/span-1.1.5628.jar \
    --output ~/span.jar 
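
Before the first run, a quick check of the setup can save a failed batch job. The script below is a hypothetical sketch, not part of the repository. The executable names in TOOLS are assumptions about what the Conda packages install (sicer and rseg are not checked), and an unset WASHU_PARALLELISM is not reported, because the default the scripts use is not stated here.

```python
import os
import shutil
from pathlib import Path

# Executables assumed to come from the Conda packages listed above;
# sicer and rseg are left out because their executable names are not checked here.
TOOLS = ("samtools", "bedtools", "bowtie", "bowtie2", "fastqc", "multiqc",
         "fastq-dump", "macs2", "bedGraphToBigWig", "bedClip",
         "bigWigAverageOverBed", "STAR")


def check_environment(tools=TOOLS):
    """Return a list of problems with the shell setup; an empty list means OK."""
    problems = []
    root = os.environ.get("WASHU_ROOT")
    if not root or not Path(root).is_dir():
        problems.append("WASHU_ROOT is not set to an existing directory")
    else:
        # Compare resolved paths so symlinked or relative entries still match
        entries = {Path(p).resolve()
                   for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p}
        if Path(root).resolve() not in entries:
            problems.append("WASHU_ROOT is not on PYTHONPATH")
    parallelism = os.environ.get("WASHU_PARALLELISM")
    if parallelism is not None:
        try:
            valid = int(parallelism) > 0
        except ValueError:
            valid = False
        if not valid:
            problems.append("WASHU_PARALLELISM should be a positive integer")
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

The jars downloaded with curl are not checked, since the location the pipelines expect them in is not specified here.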

Project structure

  • /bed - BED file manipulation: intersection, ChromHMM enrichment, closest gene, etc.
  • /docker - Docker configuration files with tools and test data. See the Tests section.
  • /parallel - Scripts for parallel execution on a Portable Batch System (qsub) cluster or on a local machine.
    The parallelism level on a local machine can be configured via the WASHU_PARALLELISM environment variable.
  • /scripts - QC, Visualization, BAM conversions, Reads In Peaks, etc.
  • /test - Tests for pipelines.
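
The /parallel scripts implement local parallelism in Bash. The underlying idea, a fixed-size worker pool whose size is read from WASHU_PARALLELISM, can be sketched in Python as follows. This is an illustration only, not the repository's code, and the fallback to a single worker when the variable is unset or invalid is an assumption.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def local_parallelism(default=1):
    """Read WASHU_PARALLELISM, falling back to `default` if unset or invalid."""
    try:
        value = int(os.environ.get("WASHU_PARALLELISM", default))
    except ValueError:
        return default
    # Zero or negative values would stall the pool, so clamp to at least one worker
    return max(value, 1)


def run_all(commands, workers=None):
    """Run shell command strings, at most `workers` at a time.

    The threads only wait on child processes; the CPU work happens in the
    subprocesses. Return codes are returned in the order of `commands`.
    """
    workers = workers or local_parallelism()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda cmd: subprocess.run(cmd, shell=True).returncode, commands))
```

For example, run_all(["fastqc a.fastq", "fastqc b.fastq"]) would run FastQC on two files with at most WASHU_PARALLELISM of them at once.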

Tests

Preconfigured Continuous Integration configurations are available on TeamCity.

Fetch the Docker image biolabs/washu, which contains all the tools required by the pipelines as well as the test data:

docker pull biolabs/washu

Launch the tests:

# Change working directory
cd <project_path>

# General tests
docker run -v "$(pwd)":/washu -t -i biolabs/washu /bin/bash -c \
    "source activate py3.5 && cd /washu && bash test.sh"

# ChIP-Seq Pipeline tests
docker run -v "$(pwd)":/washu -t -m 2G -e JAVA_OPTIONS="-Xmx1G" -i biolabs/washu /bin/bash -c \
    "source activate py3.5 && cd /washu && bash test_pipeline_chipseq.sh"

After these tests finish, explore the ChIP-Seq pipeline results in the out folder.
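
For scripted or repeated runs, the docker run calls can be assembled in Python. docker_test_command is a hypothetical helper that reproduces the flags from the README commands (volume mount, -m memory limit, JAVA_OPTIONS) as an argument list, which sidesteps shell quoting.

```python
import os
import subprocess


def docker_test_command(project_path, script, memory=None, java_options=None,
                        interactive=True, image="biolabs/washu"):
    """Build the `docker run` call from the Tests section as an argument list."""
    # Mount the repository at /washu inside the container, like -v "$(pwd)":/washu
    cmd = ["docker", "run", "-v", f"{os.path.abspath(project_path)}:/washu"]
    if interactive:
        cmd.append("-t")
    if memory:
        cmd += ["-m", memory]
    if java_options:
        cmd += ["-e", f"JAVA_OPTIONS={java_options}"]
    if interactive:
        cmd.append("-i")
    cmd += [image, "/bin/bash", "-c",
            f"source activate py3.5 && cd /washu && bash {script}"]
    return cmd


if __name__ == "__main__":
    # General tests, then the ChIP-Seq pipeline tests with the README memory limits
    subprocess.run(docker_test_command(".", "test.sh"), check=True)
    subprocess.run(docker_test_command(".", "test_pipeline_chipseq.sh",
                                       memory="2G", java_options="-Xmx1G"),
                   check=True)
```

Passing interactive=False drops -t and -i, which may be needed when no terminal is attached, for example in a CI job.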

Tools used

Bedtools, Bowtie, Bowtie2, FastQC, MACS2, MANorm, MultiQC, Phantompeakqualtools, Picardtools, RSeg, Samtools, SICER, SPAN

STAR, RSEM

Data standards and pipelines

Useful links

  • JetBrains Research BioLabs homepage
  • Washington University in Saint Louis, Maxim Artyomov Lab homepage
  • Review of ChIP-Seq, ATAC-Seq and DNase-Seq processing in LaTeX format