RipollJ/awesome-bioinfo-tools

Awesome Bioinfo-tools

A curated list of awesome Bioinformatics tools.


Awesome existing topics related to bioinformatics

[top↑]


Suite tools

  • BBtools: BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data.
  • samtools: The original samtools package has been split into three separate but tightly coordinated projects:
    • htslib: C-library for handling high-throughput sequencing data
    • samtools: mpileup and other tools for handling SAM, BAM, CRAM
    • bcftools: calling and other tools for handling VCF, BCF
  • GATK: A genomic analysis toolkit focused on variant discovery.
  • EA-Utils: Command-line tools for processing biological sequencing data: barcode demultiplexing, adapter trimming, etc. Primarily written to support an Illumina-based pipeline, but should work with any FASTQ files.

[top↑]


Quality analysis & trimming tools

quality analysis checking

  • FastQC: A quality control tool for high throughput sequence data.
  • FastQ Screen: FastQ Screen is a simple application which allows you to search a large sequence dataset against a panel of different genomes to determine from where the sequences in your data originate.

trimming

  • Sickle: A windowed adaptive trimming tool for FASTQ files using quality.
  • Cutadapt: Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
  • bbduk: “Duk” stands for Decontamination Using Kmers. BBDuk was developed to combine most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool.
  • trimgalore: A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisulfite-Seq) libraries.
  • trimmomatic: A flexible read trimming tool for Illumina NGS data.
  • Sortmerna: Fast filtering, mapping and OTU picking.
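
To illustrate the windowed quality trimming idea mentioned for Sickle above, here is a minimal Python sketch. It is not Sickle's actual algorithm: the function names, the window size of 4 and the threshold of 20 are assumptions chosen for illustration, and only the 3' end is trimmed.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(c) - 33 for c in qual_string]


def trim_by_window(seq, quals, window=4, threshold=20):
    """Cut the read at the first window whose mean quality falls below threshold.

    Hypothetical helper: slides a fixed-size window from the 5' end and keeps
    everything before the first low-quality window.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if not seq:
        return seq, quals
    # Reads shorter than the window are judged with a single, smaller window.
    window = min(window, len(seq))
    for start in range(len(seq) - window + 1):
        mean_q = sum(quals[start:start + window]) / window
        if mean_q < threshold:
            return seq[:start], quals[:start]
    return seq, quals
```

For real data, prefer one of the tools listed above; they also handle paired reads, adapters and minimum-length filtering.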

read merger

  • PEAR: PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger.
  • Fastq-join: Joins two paired-end reads on the overlapping ends.
  • Seq-prep: SeqPrep is a program to merge overlapping paired-end Illumina reads into a single longer read.
  • FLASH: Fast Length Adjustment of SHort reads
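
The read mergers above all rely on finding the overlap between the two mates of a pair. The sketch below shows that idea in its simplest form; it does not reproduce the scoring used by PEAR, FLASH or the other tools, and `merge_pair`, `min_overlap` and `max_mismatch_rate` are hypothetical names and defaults.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_rate=0.1):
    """Merge read1 with the reverse complement of read2, or return None.

    Tries the longest candidate overlap first and accepts the first one whose
    mismatch fraction is within max_mismatch_rate (an assumed, simplistic rule).
    """
    rc2 = reverse_complement(read2)
    longest = min(len(read1), len(rc2))
    for overlap in range(longest, min_overlap - 1, -1):
        tail, head = read1[-overlap:], rc2[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * overlap:
            return read1 + rc2[overlap:]
    return None
```

Real mergers also use base qualities to resolve disagreements inside the overlap, which this sketch ignores.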

demultiplexing

  • fastq-multx: The goal of this program is to make it easier to demultiplex possibly paired-end sequences, and also to allow the "guessing" of barcode sets based on master lists of barcoding protocols (fluidigm, truseq, etc.)
  • UMI-tools: This repository contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes.
  • NextGenSeqUtils: Notebook for demultiplexing with custom barcoded primers.
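
As a rough illustration of what barcode demultiplexing involves, the sketch below assigns reads to samples by comparing the start of each read to a set of known barcodes, allowing a small number of mismatches. The function names, the `"unassigned"` bin and the one-mismatch default are assumptions; the tools above support paired-end data, barcode guessing and UMIs, which are not covered here.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read, barcodes, max_mismatches=1):
    """Return (sample, read without barcode), or (None, read) if ambiguous or unmatched.

    barcodes maps barcode sequence -> sample name (hypothetical layout).
    """
    hits = []
    for barcode, sample in barcodes.items():
        prefix = read[:len(barcode)]
        if len(prefix) == len(barcode) and hamming(prefix, barcode) <= max_mismatches:
            hits.append((sample, read[len(barcode):]))
    # Only accept a read when exactly one barcode matches.
    return hits[0] if len(hits) == 1 else (None, read)


def demultiplex(reads, barcodes, max_mismatches=1):
    """Group reads by sample; reads without a unique match go to 'unassigned'."""
    bins = defaultdict(list)
    for read in reads:
        sample, trimmed = assign_sample(read, barcodes, max_mismatches)
        bins[sample if sample is not None else "unassigned"].append(trimmed)
    return dict(bins)
```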

[top↑]


Multiviewer

  • MultiQC: Aggregate results from bioinformatics analyses across many samples into a single report.

[top↑]


Mapping tools

aligner

  • BWA: BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
  • Bowtie2: Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
  • PANDASEQ: PANDASEQ is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence.
  • MPscan: Index-free mapping of multiple short reads on a genome.
  • DIAMOND: DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data.

splice-aligner

  • Tophat2: TopHat is a fast splice junction mapper for RNA-Seq reads.
  • STAR: Spliced Transcripts Alignment to a Reference.
  • CRAC: RNA-Seq mapping software that includes the discovery of transcriptomic and genomic variants (splice junctions, chimeric junctions, SNVs, indels) in a single analysis step, using a built-in error detection method that enables high precision and sensitivity.

[top↑]


Assembly tools

Genome & Transcriptome de novo assembly

  • Velvet: Sequence assembler for very short reads
  • SPAdes: SPAdes – St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines.
  • Minia: Minia is a short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day.
  • Trinity: Trinity assembles transcript sequences from Illumina RNA-Seq data.
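
Several assemblers in this section are described as de Bruijn graph based. The toy sketch below shows what such a graph looks like for a handful of reads; it is only an illustration under simplifying assumptions (no sequencing errors, no reverse complements, no coverage information) and does not reflect how Velvet, SPAdes or Minia are implemented. The function names are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the sorted (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}


def walk_unbranched(graph, start):
    """Extend a contig from start while the current node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in seen:  # stop on cycles
            break
        seen.add(node)
        contig += node[-1]
    return contig
```

With the two overlapping reads `ACGT` and `CGTA` and `k=3`, the graph is a single path and the walk from `AC` spells out the merged sequence.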

Metagenome & Metatranscriptome assembly

  • Megahit: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
  • MetaVelvetSL: An extension of Velvet assembler to de novo metagenomic assembly
  • MetaSPADES: Assemble metagenomic reads using the SPAdes assembler.
  • Minia for metagenome: GATB-Minia-Pipeline is a de novo assembly pipeline for Illumina data. It can assemble genomes and metagenomes.

Viewers

  • Bandage: Bandage is a program for visualising de novo assembly graphs.
  • IGV: visualization tool for interactive exploration of large, integrated genomic datasets.

Correction tools

  • rnaQUAST: rnaQUAST is a tool designed for quality evaluation and assessment of de novo transcriptome assemblies.

[top↑]


Variant calling & alternative splicing tools

variant calling

  • VarScan: variant detection in massively parallel sequencing data.
  • KisSplice: A local transcriptome assembler for SNPs, indels and AS events
  • FaRLine: FaRLine is a pipeline for analysing alternative splicing.
  • SplAdder: SplAdder, short for Splicing Adder, a toolbox for alternative splicing analysis based on RNA-Seq alignment data.
  • Whippet: Efficient and Accurate Quantitative Profiling of Alternative Splicing Patterns of Any Complexity on a Laptop.
  • freebayes: freebayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.

Motif discovery

  • MaxENTScan: MaxEntScan is based on the approach for modeling the sequences of short sequence motifs such as those involved in RNA splicing which simultaneously accounts for non-adjacent as well as adjacent dependencies between positions.

Peak calling

  • MACS2: computational method used to identify areas in the genome that have been enriched with aligned reads as a consequence of performing a ChIP-sequencing experiment.
  • m6a Viewer: m6a Viewer is a cross-platform Java application for detecting and visualising peaks in ME-RIP/m6a-seq data.

Learning tools

  • DeepVariant: DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
  • LaBranchoR: LaBranchoR uses an LSTM network built with Keras to predict the position of RNA splicing branchpoints relative to a three prime splice site.
  • SpliceAI: A deep learning-based tool to identify splice variants
  • SpliceAI-wrapper: SpliceAI Wrapper is an attempt to use caching to reduce the number of required predictions. Please note that the authors of SpliceAI Wrapper are unrelated to the authors of SpliceAI.

Correction tools

  • Portcullis: Portcullis stands for PORTable CULLing of Invalid Splice junctions from pre-aligned RNA-seq data.

[top↑]


Counting tools

  • FeatureCounts: counts mapped reads for genomic features such as genes, exons, promoters, gene bodies, genomic bins and chromosomal locations.
  • Kallisto: kallisto is a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
  • HTSeqCount: Analysing high-throughput sequencing data with Python
  • StringTie: Transcript assembly and quantification for RNA-Seq
  • RSEM: RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data.

[top↑]


Statistical analysis tools

RNA-seq

  • DESeq2: Differential gene expression analysis based on the negative binomial distribution.
  • EdgeR: Empirical Analysis of Digital Gene Expression Data in R.
  • NBAMSeq: NBAMSeq is a Bioconductor package for differential expression analysis based on negative binomial additive model.
  • NOISeq: Exploratory analysis and differential expression for RNA-seq data
  • Sleuth: sleuth is a program for analysis of RNA-Seq experiments for which transcript abundances have been quantified with kallisto.

Metagenomics

  • Metagenassist: A comprehensive web server for comparative metagenomics
  • MG-RAST: A Metagenomics Service for Analysis of Microbial Community Structure and Function.
  • MEGAN: Metagenome Analyzer - MEGAN6 is a comprehensive toolbox for interactively analyzing microbiome data.

Metabarcoding | Community Ecology

  • vegan: Ordination methods, diversity analysis and other functions for community and vegetation ecologists.

Alternative-splicing

  • DEX-seq: Inference of differential exon usage in RNA-Seq.
  • KissDE: Retrieves Condition-Specific Variants in RNA-Seq Data.

RIBO-seq

  • Xtail: Genome-wide assessment of differential translations with ribosome profiling data.
  • Anota2Seq: Generally applicable transcriptome-wide analysis of translational efficiency using anota2seq.

[top↑]


Phylogenomics

Aligner

  • RAPPAS: RAPPAS: Rapid alignment-free phylogenetic identification of metagenomic sequences.
  • Clustalw: Multiple Sequence Alignment.
  • MEGA: Molecular Evolutionary Genetics Analysis.
  • MAFFT: Multiple alignment program for amino acid or nucleotide sequences.
  • MUSCLE: MUSCLE stands for MUltiple Sequence Comparison by Log-Expectation.

Phylogenetic inference

  • PhyML: PhyML is a software package that uses modern statistical approaches to analyse alignments of nucleotide or amino acid sequences in a phylogenetic framework.
  • RAxML: RAxML - Randomized Axelerated Maximum Likelihood.
  • FastTree: FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences.
  • FastME: FastME provides distance algorithms to infer phylogenies.
  • MrBayes: MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models.

Model test

  • jModelTest2: jModelTest is a tool to carry out statistical selection of best-fit models of nucleotide substitution.
  • ModelTest-NG: ModelTest-NG is a tool for selecting the best-fit model of evolution for DNA and protein alignments.
  • SMS: Smart Model Selection using likelihood-based criteria (e.g., AIC).
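
The likelihood-based criteria mentioned above can be made concrete with the Akaike information criterion, AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximised log-likelihood; the model with the lowest AIC is preferred. The sketch below is a generic illustration, not the procedure of jModelTest2, ModelTest-NG or SMS, and the function names and input layout are assumptions.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Pick the model with the lowest AIC.

    fits maps a model name to (log_likelihood, n_params); this layout is a
    hypothetical convention for the example.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    return min(scores, key=scores.get), scores
```

In practice the parameter count also includes branch lengths and any rate-heterogeneity parameters, and the tools above report additional criteria alongside AIC.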

Visualization

  • Aquapony: Visualization and interpretation of phylogeographic information on phylogenetic trees
  • iTOL: Interactive Tree Of Life is an online tool for the display, annotation and management of phylogenetic trees.
  • ETE: A Python framework for the analysis and visualization of trees.
  • Krona: Krona allows hierarchical data to be explored with zooming, multi-layered pie charts.

Tree comparison

  • CompPhy: A web-based collaborative platform for comparing phylogenies
  • Phylo.io: A web app and library for visualising and comparing phylogenetic trees.

Platform

  • CIPRES: The CIPRES Science Gateway V. 3.3 is a public resource for inference of large phylogenetic trees.

[top↑]


Others

Exploration tools

  • RNA-Ribo Explorer (RRE): RRE is an interactive, stand-alone, and graphical software for analysing, viewing and mining both transcriptome (typically RNA-seq) and translatome (typically Ribosome profiling or Ribo-seq) datasets.
  • IGET: The Integrated Genomics Exploration Tools (IGET) website provides online access to a suite of tools for exploring biological pathways and DNA/RNA/protein regulatory elements associated with large-scale gene expression and protein behavior dynamics.

Network & Interaction visualisation

  • Gephi: visualization and exploration software for all kinds of graphs and networks.
  • Cytoscape: visualization of complex networks and integrating these with any type of attribute data.
  • String: Protein-Protein Interaction Networks Functional Enrichment Analysis

Clustering & homology

  • CD-HIT: CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.
  • HMMER: HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments.
  • STRUCTURE: The program structure is a free software package for using multi-locus genotype data to investigate population structure.

Annotations tools

  • Trinotate: Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms.
  • gProfiler: g:Profiler is a public web server for characterising and manipulating gene lists.
  • TransDecoder: TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using Tophat and Cufflinks.

Ontology & Pathway databases

  • Gene Ontology: The Gene Ontology (GO) knowledgebase is the world’s largest source of information on the functions of genes.
  • KEGG: KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.
  • DAVID: Database for Annotation, Visualization and Integrated Discovery (DAVID).
  • PANTHER: Protein ANalysis THrough Evolutionary Relationships.
  • RNAcentral: The non-coding RNA sequence database

Metabarcoding databases

  • Silva: SILVA provides comprehensive, quality checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya).
  • ITS2: Internal transcribed spacer 2 (ITS2) ribosomal RNA Database
  • FunGuild: Over 13,000 fungal taxa now included in the database & functional annotation tools.

[top↑]


Specific workflow

Alternative splicing

  • KisSplice: Training on alternative splicing analysis with KisSplice and its companion tools.

Community analysis

  • QIIME2: QIIME 2™ is a next-generation microbiome bioinformatics platform that is extensible, free, open source, and community developed.
  • Mothur: This project seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community.
  • Vsearch: Open source tool for metagenomics.

[top↑]


Bioinformatic analysis information

Metagenomic

Metatranscriptomic

Metabarcoding

Alternative-splicing

Ribo-seq

Merip-seq

mi-CLIP

Proteomics

MASS-SPEC

[top↑]
