AU-ENVS-Bioinformatics/TotalRNA-Snakemake

Snakemake workflow: TotalRNA-Snakemake

A Snakemake workflow for TotalRNA analysis from the Department of Environmental Science of Aarhus University.

TLDR

conda activate snakemake
git clone https://github.com/AU-ENVS-Bioinformatics/TotalRNA-Snakemake
cd TotalRNA-Snakemake
snakemake -c1 skip_rename # or, to rename the files: snakemake -c1 rename
snakemake -c100 --use-conda --keep-going

Introduction

Overview

This pipeline processes large-scale TotalRNA metatranscriptomic data for taxonomic analysis of SSU reads and for mRNA analysis. The steps are:

  1. Trimming reads using Trim Galore.
  2. Filtering SSU and LSU reads using SortMeRNA and SILVA.
  3. Reconstructing ribosomal genes using MetaRib.
  4. Checking the quality of the ribosomal assembly using QUAST.
  5. Mapping reads to the ribosomal contigs using BWA and samtools.
  6. Classifying reads taxonomically using BLAST, SILVA and CREST.
  7. Assembling non-rRNA reads (Trinity) and filtering noncoding RNA using the Rfam database.
  8. Mapping reads to the mRNA contigs using BWA and samtools.
  9. Functional (best-hit) and taxonomic (LCA) annotation of mRNA contigs using DIAMOND and AnnoTree, which includes KEGG, Pfam and TIGRFAM annotations for over 30,000 bacterial and 1,600 archaeal genomes.

Check the Wiki of the project for more information.

Getting started

Requirements:

It is best to pre-install Mamba before starting. All other dependencies will be installed automatically when running the pipeline for the first time.

conda activate base
mamba create -c conda-forge -c bioconda -n snakemake snakemake

Usage

Activate the conda environment:

conda activate snakemake

Clone this git repository to the location where you want to run your analysis.

git clone https://github.com/AU-ENVS-Bioinformatics/TotalRNA-Snakemake TotalRNA-Snakemake-Project
cd TotalRNA-Snakemake-Project

Copy or symlink raw FASTQ files into the reads directory (see reads/README.md for more information). The next step renames those files and creates symlinks to them in the results/renamed directory. To skip renaming, either copy your files directly into results/renamed, or run snakemake -c1 skip_rename to symlink them without renaming.

snakemake -n rename
snakemake -c1 rename
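
The raw files have to be in the reads directory before the rename step above can find them. As a minimal sketch of the symlinking option, the helper below links every gzipped FASTQ file from a source directory into reads. The function name link_reads and the .fastq.gz / .fq.gz suffixes are assumptions for illustration; reads/README.md remains the reference for the naming the pipeline expects.

```shell
# Hypothetical helper (not part of the pipeline): symlink all gzipped FASTQ
# files from a source directory into the reads directory.
# Usage: link_reads <source_dir> [dest_dir]   (dest_dir defaults to "reads")
link_reads() {
    src=$1
    dest=${2:-reads}
    mkdir -p "$dest"
    for f in "$src"/*.fastq.gz "$src"/*.fq.gz; do
        # Skip glob patterns that matched nothing.
        [ -e "$f" ] || continue
        # Link by absolute path so the symlink resolves from any directory.
        ln -sf "$(cd "$(dirname "$f")" && pwd)/$(basename "$f")" "$dest/"
    done
}
```

For example, link_reads /path/to/raw_fastq reads would populate reads with one symlink per FASTQ file, leaving the originals untouched.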

Check that all your samples are in results/renamed:

ls results/renamed_raw_reads/
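
Beyond listing the directory by eye, a quick count can catch a missing sample. The sketch below counts gzipped FASTQ files (following symlinks) in a directory; count_fastq is a hypothetical helper, and the file suffixes are an assumption about how the reads are named.

```shell
# Hypothetical helper: print the number of gzipped FASTQ files directly
# inside a directory. -L makes find follow symlinks, so links created by
# the rename step are counted as regular files.
count_fastq() {
    find -L "$1" -maxdepth 1 -type f \
        \( -name '*.fastq.gz' -o -name '*.fq.gz' \) | wc -l | tr -d ' '
}
```

Comparing the output of count_fastq reads with that of count_fastq results/renamed_raw_reads should give the same number if every input file was picked up.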

Do a dry run to check that the pipeline will behave as expected. If it does not, review the configuration file.

snakemake -n --use-conda

Finally, run the whole pipeline. A useful flag is --keep-going, which lets independent jobs continue when one job fails instead of stopping the whole run. If you are running this in a shared environment, you can keep all the conda environments in a shared location by adding --conda-prefix /path/to/shared/conda/envs.

snakemake -c100 --use-conda --keep-going
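
The flags discussed above can be combined in a small wrapper so that the shared conda prefix is only added when one is given. This is a sketch under assumptions: run_totalrna and the SNAKEMAKE override variable are hypothetical names introduced here, while --cores, --use-conda, --keep-going and --conda-prefix are the Snakemake options already used in this README (-c is the short form of --cores).

```shell
# Hypothetical wrapper around the full pipeline run.
# Usage: run_totalrna [cores] [shared_conda_prefix]
# SNAKEMAKE is an assumed override for the executable (defaults to snakemake).
run_totalrna() {
    cores=${1:-4}
    prefix=${2:-}
    # Rebuild the argument list with the flags used for a full run.
    set -- --cores "$cores" --use-conda --keep-going
    if [ -n "$prefix" ]; then
        # Reuse conda environments stored in a shared location.
        set -- "$@" --conda-prefix "$prefix"
    fi
    ${SNAKEMAKE:-snakemake} "$@"
}
```

Calling run_totalrna 100 /path/to/shared/conda/envs corresponds to the command above with the shared prefix added; calling run_totalrna 100 omits it.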

Consider re-running the AnnoTree notebook (notebook/annotree.ipynb) interactively with custom parameters.

Documentation

Please find more information in the Wiki of the project.