Skip to content

MaxGreil/sradcqc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sradcqc

Proof of concept of a SRA to trimmed FASTQ (including quality control) pipeline with Nextflow.

Prerequisites

  • Unix-like OS (Linux, macOS, etc.)
  • Java version 8
  • Docker engine 1.10.x (or later)
  • Working internet connection

Table of Contents

Quick start

You have two input methods:

nextflow run main.nf --sra_id SRR000001

You can give in the SRA ID to download the files from directly. If the entered SRA ID belongs to a run, all .sra files belonging to the same experiment will be downloaded, too.

nextflow run main.nf --input input/example.tsv

Alternatively, you can give in the SRA ID to download the files from via table input/example.tsv. Only these specified SRA IDs will be downloaded.

Optionally, you can specify the Nextflow output directory with flag --outdir <folder>. By default, all resulting files will be saved in folder output and folder info will contain all information about the last run nextflow session.

Potential SRA IDs for download can be inspected using the SRA Explorer.

Installation

Clone this repository with the following command:

git clone https://github.com/maxgreil/sradcqc && cd sradcqc

Then, install Nextflow by using the following command:

curl https://get.nextflow.io | bash

The above snippet creates the nextflow launcher in the current directory.

Finally pull the following Docker container:

docker pull maxgreil/sradcqc

Alternatively, you can build the Docker Image yourself using the following command:

cd docker && docker image build . -t maxgreil/sradcqc

Arguments

Required Arguments

Argument Usage Description
--sra_id <id> SRA ID to download the .fastq files from
--input <id> Table with SRA IDs to download the .fastq files from

Optional Arguments

Argument Usage Description
--outdir <folder> Directory to save .fastq.gz files

Documentation

This pipeline is designed to:

  • download all files given a SRA ID in .fastq file format
  • compress all .fastq files to .fastq.gz and finally
  • do a quality control of the compressed files

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

  1. entrez-direct - retrieve identifiers from SRA database
  2. prefetch - fetch data from SRA database in .sra file format
  3. fasterq-dump - extract the .fastq file[s] from a sample SRA ID
  4. pigz - compress downloaded .fastq file[s] to .gz
  5. bbduk - trim adapters in .fastq file[s]
  6. FastQC - read quality control
  7. MultiQC - aggregate report, describing results of the whole pipeline

About

Proof of concept of a SRA to trimmed FASTQ (including quality control) pipeline with Nextflow

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published