SPEAQeasy: a Scalable Pipeline for Expression Analysis and Quantification that is easy to install and share

Summary

SPEAQeasy is a Scalable RNA-seq Pipeline for Expression Analysis and Quantification based on the RNAseq-pipeline. Built on Nextflow, it can use Docker containers and common resource managers (e.g. SLURM), so this port of the RNAseq-pipeline can run in a variety of computing environments. It is described in the SPEAQeasy manuscript (https://doi.org/10.1186/s12859-021-04142-3).

The main function of this pipeline is to produce files comparable to those used in recount2, a tool that provides gene, exon, exon-exon junction, and base-pair level data.

This pipeline allows researchers to contribute data to the recount2 project even from outside the JHPCE.

Workflow overview

SPEAQeasy takes raw RNA-seq reads and produces analysis-ready R objects, providing a "bridge to the Bioconductor universe", where researchers can utilize the powerful existing set of tools to quickly perform desired analyses.

Beginning with a set of FASTQ files (optionally gzipped), SPEAQeasy ultimately produces RangedSummarizedExperiment objects to store gene, exon, and exon-exon junction counts for an experiment. Optionally, expressed regions data is generated, enabling easy computation of differentially expressed regions (DERs).
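As a quick sanity check on inputs before launching the pipeline, the sketch below counts reads in a FASTQ file whether or not it is gzipped. It is not part of SPEAQeasy; the helper names `open_fastq` and `count_reads` are hypothetical, and the code assumes the common four-lines-per-record FASTQ layout.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a FASTQ file as text, whether or not it is gzip-compressed."""
    path = Path(path)
    # Check the gzip magic bytes rather than trusting the file extension.
    with open(path, "rb") as handle:
        is_gzipped = handle.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzipped else open(path, "rt")


def count_reads(path):
    """Count reads, assuming four lines per FASTQ record."""
    with open_fastq(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4
```

A truncated or partially transferred file will often fail the multiple-of-four check, which is cheaper to discover here than midway through a pipeline run.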

Our vignette demonstrates how genotype calls by SPEAQeasy can be coupled with user-provided genotype and phenotype data to easily resolve identity issues that arise during sequencing. We then walk through an example differential expression analysis and explore data visualization options.

Pipeline features

Automatically merges samples split across multiple FASTQ files, using the samples.manifest input
Lets you select any GENCODE annotation release for the "hg38", "hg19", or "mm10" references (Ensembl for the "rat" reference) and adjust other annotation settings with simple configuration
Generates a single VCF file for experiments on a human reference, which can be used to resolve sample identity issues and salvage problematic samples
Supports Docker to manage software dependencies and is preconfigured for execution locally or on SLURM or SGE clusters
Allows multiple users to share a single SPEAQeasy installation with minimal work
Provides detailed, user-friendly logging for transparency and for identifying potential issues
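
The sample-merging feature can be pictured with a small sketch. This is an illustration of the idea only, not SPEAQeasy's implementation: it assumes a tab-separated manifest in which the last column of each row is the sample ID and the preceding columns describe that row's FASTQ file(s). The exact samples.manifest layout is defined in the SPEAQeasy documentation, and the function names below are hypothetical.

```python
from collections import defaultdict


def group_manifest_rows(manifest_text):
    """Group manifest rows by sample ID (assumed: last tab-separated column)."""
    groups = defaultdict(list)
    for line in manifest_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        sample_id = fields[-1]
        # Keep the remaining columns (file paths and any other fields) per row.
        groups[sample_id].append(fields[:-1])
    return dict(groups)


def samples_needing_merge(groups):
    """Return IDs that appear on more than one row, i.e. samples split across files."""
    return sorted(sid for sid, rows in groups.items() if len(rows) > 1)
```

Rows that share an ID are the ones a merge step would combine into a single sample; IDs that appear once pass through unchanged.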

Getting started

The SPEAQeasy documentation website describes the pipeline in full detail. To get started quickly, see the quick start guide.

Because SPEAQeasy is based on the Nextflow workflow manager, it supports execution on computing clusters managed by SLURM or SGE without any configuration (local execution is also possible). Users with access to Docker can manage SPEAQeasy's software dependencies with Docker containers; for users without Docker, or even without root privileges, we provide a script that installs the dependencies.
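
To illustrate the choice between local, SLURM, and SGE execution, here is a minimal sketch that guesses which launcher script applies on the current machine. The script names are those shipped in the SPEAQeasy repository; the detection heuristic (looking for `sbatch` or `qsub` on the PATH) and the function `guess_launcher` are assumptions made for this example, not something SPEAQeasy does itself.

```python
import shutil

# Launcher script names from the SPEAQeasy repository.
LAUNCHERS = {
    "slurm": "run_pipeline_slurm.sh",
    "sge": "run_pipeline_sge.sh",
    "local": "run_pipeline_local.sh",
}


def guess_launcher(which=shutil.which):
    """Guess a launcher script from the scheduler commands found on PATH."""
    if which("sbatch"):  # sbatch is SLURM's job-submission command
        return LAUNCHERS["slurm"]
    if which("qsub"):  # qsub is used by SGE (and by some other schedulers)
        return LAUNCHERS["sge"]
    return LAUNCHERS["local"]
```

Because `qsub` is not unique to SGE, treat the result as a hint only; the quick start guide remains the reference for configuring and running the chosen script.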

Authors

Original Pipeline

Emily Burke, Leonardo Collado-Torres, Andrew Jaffe, BaDoi Phan

Nextflow Port

Nick Eagles, Brianna Barry, Jacob Leonard, Israel Aguilar, Violeta Larios, Everardo Gutierrez

Cite `SPEAQeasy`

We hope that SPEAQeasy will be useful for your research. Please use the following BibTeX entry to cite the software and overall approach. Thank you!

@article {Eagles2021,
	author = {Eagles, Nicholas J. and Burke, Emily E. and Leonard, Jacob and Barry, Brianna K. and Stolz, Joshua M. and Huuki, Louise and Phan, BaDoi N. and Larios Serrato, Violeta and Guti{\'e}rrez-Mill{\'a}n, Everardo and Aguilar-Ordo{\~n}ez, Israel and Jaffe, Andrew E. and Collado-Torres, Leonardo},
	title = {SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses},
	year = {2021},
	doi = {10.1186/s12859-021-04142-3},
	publisher = {Springer Science and Business Media LLC},
	URL = {https://doi.org/10.1186/s12859-021-04142-3},
	journal = {BMC Bioinformatics}
}
