16s rRNA Sequencing Meta-analysis Reproducibility Tool (using mothur).

16SMaRT

16s rRNA Sequencing Meta-analysis Reconstruction Tool.

16SMaRT is a bioinformatics analysis pipeline for 16s rRNA gene sequencing data. It is a "one-click" solution for microbial community analysis of amplicon sequencing data and aims to be your go-to tool for your next microbiome/metagenomics project. The primary objective of a 16SMaRT analysis is to determine which microbial taxa are present, and in what proportions, across a range of samples. It currently supports single-end and paired-end Illumina MiSeq data.

16SMaRT is written in Python using boilpy's data-pipeline boilerplate. Because it is built on top of a considerable number of dependencies, the recommended way to install it is with Docker, which makes installation "one-click" and results reproducible. 16SMaRT is designed to make full use of the available computing resources, which keeps it fast even on a local machine for a moderate number of samples. For a large number of studies, it is recommended to run 16SMaRT on a High-Performance Computing (HPC) system using Singularity.

Features

Quick Start

Using Docker

First, install Docker on your system by following Docker's documentation.

Then, you can run 16SMaRT with the following command:

docker run \
    --rm -it \
    -v "<HOST_MACHINE_PATH_DATA>:/data" \
    -v "<HOST_MACHINE_PATH_CONFIG>:/root/.config/s3mart" \
    -v "<HOST_MACHINE_PATH_WORKSPACE>:/work" \
    ghcr.io/achillesrasquinha/s3mart \
    bpyutils --run-ml s3mart -p "data_dir=/data" --verbose

where <HOST_MACHINE_PATH_DATA> is the path on your host machine where pipeline data is stored, <HOST_MACHINE_PATH_CONFIG> is the path where 16SMaRT stores its configuration and intermediate data, and <HOST_MACHINE_PATH_WORKSPACE> is a workspace directory for files of your own that 16SMaRT can use (e.g. input files).
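As a minimal sketch of how the three placeholders might be filled in, the script below creates one directory per mount and assembles the same docker command from them. The S3MART_ROOT and DRY_RUN variables and the directory layout are assumptions made for illustration, not part of 16SMaRT. By default the script only prints the command; set DRY_RUN=0 to execute it.

```shell
#!/bin/sh
set -e

# Hypothetical host layout; change S3MART_ROOT to suit your machine.
S3MART_ROOT="${S3MART_ROOT:-$PWD/s3mart}"
DATA_DIR="$S3MART_ROOT/data"
CONFIG_DIR="$S3MART_ROOT/config"
WORK_DIR="$S3MART_ROOT/work"

# Create the directories up front so they are owned by the current user.
mkdir -p "$DATA_DIR" "$CONFIG_DIR" "$WORK_DIR"

# Hold the command in the positional parameters so that paths
# containing spaces are passed through intact.
set -- docker run --rm -it \
    -v "$DATA_DIR:/data" \
    -v "$CONFIG_DIR:/root/.config/s3mart" \
    -v "$WORK_DIR:/work" \
    ghcr.io/achillesrasquinha/s3mart \
    bpyutils --run-ml s3mart -p "data_dir=/data" --verbose

# Print the command for inspection; run it only when DRY_RUN=0.
printf '%s ' "$@"; printf '\n'
if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
fi
```

Quoting each -v argument as a whole is what keeps the mount specification in one piece, which is also why every quote in the command must be closed.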

Running on HPC systems using Singularity

Singularity is the most widely used container system for HPC (High-Performance Computing) systems. To run your analysis on an HPC system, run the following command:

singularity run \
    --home $HOME \
    --cleanenv \
    -B <HOST_MACHINE_PATH_DATA>:/data \
    -B <HOST_MACHINE_PATH_CONFIG>:/root/.config/s3mart \
    -B <HOST_MACHINE_PATH_WORKSPACE>:/work \
    oras://ghcr.io/achillesrasquinha/s3mart:singularity \
    bpyutils --run-ml s3mart -p "data_dir=/data" --verbose
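On a shared cluster it can be convenient to pull the image once into a local file and reuse it across runs. The sketch below illustrates that idea under assumptions: S3MART_ROOT, DRY_RUN, the s3mart.sif file name and the run_or_print helper are all hypothetical names introduced here, and the directory layout is only an example. It prints the commands by default and executes them when DRY_RUN=0.

```shell
#!/bin/sh
set -e

# Hypothetical locations; adjust to your cluster's storage layout.
S3MART_ROOT="${S3MART_ROOT:-$PWD/s3mart}"
IMAGE_URI="oras://ghcr.io/achillesrasquinha/s3mart:singularity"
IMAGE_FILE="$S3MART_ROOT/s3mart.sif"

mkdir -p "$S3MART_ROOT/data" "$S3MART_ROOT/config" "$S3MART_ROOT/work"

# Hypothetical helper: print the command, or execute it when DRY_RUN=0.
run_or_print() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s ' "$@"; printf '\n'
    fi
}

# Pull the image only when no local copy exists yet.
if [ ! -f "$IMAGE_FILE" ]; then
    run_or_print singularity pull "$IMAGE_FILE" "$IMAGE_URI"
fi

# Same invocation as above, pointed at the local image file.
run_or_print singularity run \
    --home "$HOME" \
    --cleanenv \
    -B "$S3MART_ROOT/data:/data" \
    -B "$S3MART_ROOT/config:/root/.config/s3mart" \
    -B "$S3MART_ROOT/work:/work" \
    "$IMAGE_FILE" \
    bpyutils --run-ml s3mart -p "data_dir=/data" --verbose
```

Whether a local image file suits your site depends on its storage policies, so treat this as a starting point rather than a recommended setup.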

Usage

Basic Usage

  • input

    Path to an input CSV file, a data directory of FASTQ files, or a URL to a CSV file.

  • fastqc

    Run FastQC after downloading SRAs. (boolean, default: True)

  • multiqc

    Run MultiQC after running FastQC. (boolean, default: True)

Check out the docs page to understand how to use this pipeline.

Support

Have any queries? Post an issue on the GitHub Issue Tracker.

Citation

If you use this software in your work, please cite it using the following:

Furbeck, R., & Rasquinha, A. (2021). 16SMaRT - 16s rRNA Sequencing Meta-analysis Reconstruction Tool. (Version 0.1.0) [Computer software]. https://github.com/achillesrasquinha/16SMaRT

A comprehensive list of references for the tools used is available here.

License

This repository has been released under the MIT License.


Made with ❤️ using boilpy.