Skip to content

BBQuercus/eFISHent

Repository files navigation

GitHub code licence is MIT Anaconda package version number Github Actions Status CodeFactor codecov Maintainability DOI

Logo of eFISHent.

eFISHent

A command-line based tool to facilitate the creation of eFISHent single-molecule RNA fluorescence in-situ hybridization (RNA smFISH) oligonucleotide probes.

Description

eFISHent is a tool to facilitate the creation of eFISHent RNA smFISH oligonucleotide probes. Some of the key features of eFISHent are:

  • One-line installation using conda (available through bioconda*)
  • Automatic gene sequence download from NCBI when providing a gene and species name (or pass a FASTA file)
  • Filtering steps to remove low-quality probes including off-targets, frequently occuring short-mers, secondary structures, etc.
  • Mathematical or greedy optimization to ensure highest coverage

* The release on bioconda is always associated with waiting times. Therefore, the easiest approach is to install conda dependencies and install eFISHent using pip.

Installation

eFISHent is being tested on MacOS and Linux with Python versions 3.8 - 3.10. Unfortunately, due to the bioinformatics dependencies Windows is not supported. For Windows users, we reccommend installing "Windows Subsystem for Linux (WSL)" (Windows 10, Windows 11) or using a fully fledged Virtual Machine. Using conda environment, install eFISHent as follows:

# Create an environment and install all dependencies (e.g. python)
conda env create bbquercus/efishent  

# Activate environment
conda activate efishent

# Install efishent via pypi
pip install efishent

Any updates can then simply be done via pypi (pip install --upgrade efishent).

Usage

A detailed usage guide can be found on the GitHub wiki but here is a quick example:

eFISHent --reference-genome <reference-genome> --gene-name <gene> --organism-name <organism>

Component overview

eFISHent is built up modularly using the following components...

Index creation workflow:

  • Bowtie index
  • Jellyfish indices

Probe filtering workflow:

  • Download / prepare sequences
  • Generate candidate probes
  • Filter with basic filters
  • Align probes to reference genome
  • Filter based on alignment score and uniqueness
  • Filter reoccuring k-mers
  • Filter based on secondary structure prediction
  • Create final list of probes
  • Write final list of probes to file with report

Probe set analysis plotting:

  • Create a simple overview over the key parameters

TODO

  • Add more detailed documentation as wiki page(s)
    • Add links to genomes and RNAseq databases
    • Add examples from multiple sources
    • Add benchmarks for deltaG, counts
  • Add mathematical description for model (in wiki?)
  • Add probe set analysis txt file with off-target locations / potentially harmful probes