isONclust

isONclust is a tool for clustering PacBio Iso-Seq reads or Oxford Nanopore reads, where each cluster represents all reads that came from the same gene. The output is a TSV file that assigns each read to a cluster ID. Detailed information is available in the accompanying paper (Sahlin and Medvedev, 2020).
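
A common next step is to group the reads in the TSV output by cluster ID. The output file name and column order are not given in this section, so the sketch below assumes two tab-separated columns (cluster ID first, read accession second) and uses made-up rows in place of a real output file. `load_clusters` is a hypothetical helper, not part of isONclust.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def load_clusters(handle: TextIO) -> Dict[str, List[str]]:
    """Group read IDs by cluster ID from a two-column TSV.

    Assumes each row is: cluster_id <TAB> read_id. This column order is
    an assumption; check it against your own isONclust output.
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:  # skip blank lines
            continue
        cluster_id, read_id = row[0], row[1]
        clusters[cluster_id].append(read_id)
    return dict(clusters)


# Hypothetical rows standing in for a real output file.
example = io.StringIO("0\tread_a\n0\tread_b\n1\tread_c\n")
clusters = load_clusters(example)
for cid, reads in sorted(clusters.items()):
    print(cid, len(reads))  # cluster ID and number of reads in it
```

For a real run, replace the `io.StringIO` object with `open(path, newline="")` on the TSV file isONclust writes to your output folder.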

isONclust is distributed as a Python package supported on Linux and OSX. As of version 0.0.2, it requires Python 3.4 or later, due to updates in Python's multiprocessing library.
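
Before installing, you can check the stated requirements from within Python. The sketch below only encodes what this README says (Linux or OSX, Python 3.4 or later for isONclust 0.0.2 and above); the helper names are hypothetical and not part of isONclust.

```python
import sys

# Minimum Python version the README states for isONclust 0.0.2 and later.
MIN_PYTHON = (3, 4)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets the stated minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def platform_is_supported(platform: str = sys.platform) -> bool:
    """Return True on Linux or macOS (reported by Python as 'darwin')."""
    return platform.startswith("linux") or platform == "darwin"


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit("isONclust >= 0.0.2 needs Python >= 3.4")
    if not platform_is_supported():
        print("Warning: isONclust is only stated to support Linux and OSX")
    print("Python", sys.version.split()[0], "meets the minimum")
```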

@article{sahlin2020a,
  author = {Sahlin, Kristoffer and Medvedev, Paul},
  title = {De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality Value-Based Algorithm},
  journal = {Journal of Computational Biology},
  volume = {27},
  number = {4},
  pages = {472-484},
  year = {2020},
  doi = {10.1089/cmb.2019.0299},
  note = {PMID: 32181688},
  URL = {https://doi.org/10.1089/cmb.2019.0299},
  eprint = {https://doi.org/10.1089/cmb.2019.0299},
  abstract = {Long-read sequencing of transcripts with Pacific Biosciences (PacBio) Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, a clustering algorithm that is greedy (to scale) and makes use of quality values (to handle variable error rates). We test isONclust on three simulated and five biological data sets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches, both in terms of overall accuracy and/or scalability to large data sets.}
}
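
The abstract notes that isONclust makes use of quality values to handle variable error rates. As background only, the sketch below shows the standard Phred conversion from a FASTQ quality character to a per-base error probability, p = 10^(-Q/10), assuming Phred+33 encoding. It then orders hypothetical reads by mean error rate to illustrate how a greedy pass could process the most reliable reads first. This is not isONclust's actual scoring or clustering procedure, and the helper names are hypothetical.

```python
from typing import List, Tuple


def phred_to_error_probs(qual: str, offset: int = 33) -> List[float]:
    """Convert a FASTQ quality string (Phred+33 assumed) to per-base error probabilities."""
    return [10 ** (-(ord(ch) - offset) / 10) for ch in qual]


def mean_error_rate(qual: str) -> float:
    """Average per-base error probability of one read."""
    probs = phred_to_error_probs(qual)
    if not probs:
        raise ValueError("empty quality string")
    return sum(probs) / len(probs)


# Hypothetical reads: (read_id, quality string).
reads: List[Tuple[str, str]] = [
    ("read_low", "+++++"),   # '+' encodes Q10, so p = 0.1 per base
    ("read_high", "?????"),  # '?' encodes Q30, so p = 0.001 per base
]

# Lowest expected error rate first, as a greedy pass might process them.
ordered = sorted(reads, key=lambda r: mean_error_rate(r[1]))
print([rid for rid, _ in ordered])  # ['read_high', 'read_low']
```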

LICENCE

GPL v3.0, see LICENSE.txt.

GitHub repository: ksahlin/isONclust

Table of Contents

INSTALLATION
  Using conda
  Using pip
  Downloading source from GitHub
  Dependencies
  Testing installation
USAGE
  Oxford Nanopore reads
  Iso-Seq reads
  Output
    Clustering information
    Cluster fastq files
CREDITS
  Bib record
LICENCE
