comprna/radian

RADIAN

RNA lAnguage informeD decodIng of nAnopore sigNals

Overview

RADIAN is a nanopore direct RNA basecaller that uses a model of mRNA language.

Since RNA is always sequenced from the 3' to 5' direction, nanopore signals implicitly encode the nucleotide biases in mRNA. This basecaller uses a probabilistic model of human mRNA language to guide basecalling when the signal prediction is ambiguous. The mRNA model is incorporated in a modified CTC beam search decoding algorithm.
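To illustrate the general idea of letting a sequence model influence CTC decoding, here is a minimal sketch of a textbook CTC prefix beam search in which each extension score is adjusted by an external model. It is not RADIAN's modified algorithm: the function names, the toy model returned by `make_toy_lm`, the `lm_weight` parameter and the frame layout (blank at index 0, then A, C, G, U) are all assumptions made for this example, and the signal and RNA thresholds exposed by `basecall.py` are not modelled.

```python
import math

NEG_INF = float("-inf")
ALPHABET = "ACGU"  # assumption: index 0 of every frame is the CTC blank


def logsumexp(*values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf inputs."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


def to_log(frames):
    """Convert per-frame probabilities to log-probabilities."""
    return [[math.log(p) for p in frame] for frame in frames]


def make_toy_lm(preferred_after_a):
    """Hypothetical stand-in for an mRNA model: favours one base after 'A'."""
    def lm_log_prob(prefix, symbol):
        if prefix.endswith("A"):
            return math.log(0.7 if symbol == preferred_after_a else 0.1)
        return math.log(0.25)
    return lm_log_prob


def ctc_prefix_beam_search(log_frames, lm_log_prob=None, lm_weight=1.0, beam_width=8):
    # Each beam maps a prefix to (log score ending in blank, log score ending in a label).
    beams = {"": (0.0, NEG_INF)}
    for frame in log_frames:
        candidates = {}

        def add(prefix, blank_score, label_score):
            old_b, old_nb = candidates.get(prefix, (NEG_INF, NEG_INF))
            candidates[prefix] = (logsumexp(old_b, blank_score),
                                  logsumexp(old_nb, label_score))

        for prefix, (p_b, p_nb) in beams.items():
            total = logsumexp(p_b, p_nb)
            # Emitting a blank keeps the prefix unchanged.
            add(prefix, total + frame[0], NEG_INF)
            for index, symbol in enumerate(ALPHABET, start=1):
                if prefix.endswith(symbol):
                    # A repeated label without a blank collapses into the same prefix.
                    add(prefix, NEG_INF, p_nb + frame[index])
                    # A genuinely new copy of the label needs a blank in between.
                    extend = p_b + frame[index]
                else:
                    extend = total + frame[index]
                if extend == NEG_INF:
                    continue
                if lm_log_prob is not None:
                    # The external model only affects scores when a prefix grows.
                    extend += lm_weight * lm_log_prob(prefix, symbol)
                add(prefix + symbol, NEG_INF, extend)

        ranked = sorted(candidates.items(),
                        key=lambda item: logsumexp(*item[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    return max(beams, key=lambda prefix: logsumexp(*beams[prefix]))


# Final frame is a tie between C and G; the toy model breaks the tie.
ambiguous = to_log([
    [0.05, 0.80, 0.05, 0.05, 0.05],
    [0.80, 0.05, 0.05, 0.05, 0.05],
    [0.10, 0.02, 0.43, 0.43, 0.02],
])
print(ctc_prefix_beam_search(ambiguous, make_toy_lm("G")))
```

In this toy input the signal alone cannot separate C from G in the last frame, so the decoded base follows whichever one the external model favours after A. A real mRNA model would condition on a longer context, in the order in which the bases are sequenced.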

Preprint: https://www.biorxiv.org/content/10.1101/2022.10.19.512968v1

[Figure: RADIAN architecture]

Installation

cd <path/to/radian>
pip install --upgrade pip
pip install -r requirements.txt
tar -xvzf radian/models/rnamodel_12mer_pc.tar.gz

Command structure

usage: basecall.py [-h] fast5_dir fasta_dir [--local] [--chunk-len] [--step-size]
                   [--batch-size] [--outlier-clip] [--rna-model]
                   [--sig-model] [--sig-config] [--beam-width]
                   [--decode-type] [--sig-threshold]
                   [--rna-threshold] [--context-len]

positional arguments:
  fast5_dir             Directory of single/multi fast5 files.
  fasta_dir             Directory to output fasta files.

optional arguments:
  -h, --help
  --local
  --chunk-len
  --step-size
  --batch-size
  --outlier-clip
  --rna-model
  --sig-model
  --sig-config
  --beam-width
  --decode-type {global,chunk}
  --sig-threshold
  --rna-threshold
  --context-len

Example usage

For testing, we provide a fast5 file containing 5 reads at data/reads.fast5.

To basecall the single or multi-fast5 file(s) in a directory (here, data) and write the FASTA output to <out_dir>:

cd radian
mkdir out_dir
python3 basecall.py data out_dir
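Once the run finishes, the output can be sanity-checked with a few lines of Python. The sketch below assumes only that the output directory contains plain-text FASTA files. The names and number of files that basecall.py writes are not stated here, so it simply reads every regular file in the directory. `read_fasta` and `read_lengths` are hypothetical helper names, not part of RADIAN.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts; emit the previous one if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_lengths(out_dir):
    """Map each read header to its basecalled length, over all files in out_dir."""
    lengths = {}
    for path in sorted(Path(out_dir).iterdir()):
        if path.is_file():
            for header, sequence in read_fasta(path):
                lengths[header] = len(sequence)
    return lengths
```

For example, `read_lengths("out_dir")` returns a dictionary that can be checked against the 5 reads in the test file.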