Biobtree

Biobtree is a bioinformatics tool for mapping bioinformatics datasets via identifiers and special keywords, with simple or advanced chain query capability.

Features

  • Datasets - supports a wide range of datasets such as Ensembl, UniProt, ChEMBL, HMDB, Taxonomy, GO, EFO, HGNC, ECO, UniParc and UniRef, plus tens more via cross-references, retrieving the latest data from providers

  • MapReduce - processes small or large datasets based on the user's selection and builds a B+ tree based uniform local database via a specialized MapReduce-based technique with efficient storage usage

  • Query - allows simple or advanced chain queries between datasets with an intuitive syntax that supports writing RDF-like or graph-like queries

  • Genome - supports querying full Ensembl genome coordinates with transcript, CDS, exon and UTR, along with several attributes, mapped datasets and identifiers such as ortholog, paralog or probe identifiers belonging to Affymetrix or Illumina

  • Protein - Uniprot proteins including protein features with variations and mapped datasets.

  • Chemistry - ChEMBL and HMDB datasets are supported for chemistry, disease and drug related analysis

  • Taxonomy & Ontologies - Taxonomy, GO, EFO and ECO data with mapping to other datasets and child and parent query capability

  • Your data - Your custom data can be integrated with or without relation to other datasets

  • Web UI - Web interface for easy exploration and examples

  • Web Services - REST or gRPC services

  • R & Python - Bioconductor R and Python wrapper packages for easier use from existing pipelines, with built-in databases

Usage

First install the latest biobtree executable, available for Windows, Mac or Linux. Then extract the downloaded file to a new folder, open a terminal in that folder and start biobtree. Alternatively, the R and Python based wrapper packages biobtreeR and biobtreePy can be used instead of the executable for easier integration.

Starting biobtree with target datasets or genomes

# build ensembl genomes by tax id with uniprot&taxonomy datasets
biobtree  --tax 595,984254 -d "uniprot,taxonomy" build 

# build datasets only 
biobtree -d "uniprot,taxonomy,hgnc" build 
biobtree -d "hgnc,chembl,hmdb" build

# once data is built start web for using ws and ui
biobtree web

# to see all options and datasets use help
biobtree help

Starting biobtree with built-in databases

# 4 built-in databases are provided with commonly studied datasets and organism genomes in order to speed up the database build process
# Check the following function doc for the content of each database
# https://github.com/tamerh/biobtreeR/blob/master/R/buildData.R

biobtree --pre-built 1 install
biobtree web

Built-in databases are updated regularly, at least for each Ensembl release, and all built-in database files along with configuration files are hosted in a separate GitHub repository.

Web service endpoints

# Meta
# dataset meta information
localhost:8888/ws/meta

# Search 
# i is the only mandatory parameter
localhost:8888/ws/?i={terms}&s={dataset}&p={page}&f={filter}

# Mapping 
# i and m are mandatory parameters
localhost:8888/ws/map/?i={terms}&m={mapfilter_query}&s={dataset}&p={page}

# Retrieve dataset entry. Both parameters are mandatory
localhost:8888/ws/entry/?i={identifier}&s={dataset}

# Retrieve entry with filtered mapping entries. Only the page parameter is optional
localhost:8888/ws/filter/?i={identifier}&s={dataset}&f={filter_datasets}&p={page}

# Retrieve entry results with page index. All the parameters are mandatory 
localhost:8888/ws/page/?i={identifier}&s={dataset}&p={page}&t={total}
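
The endpoints above can also be called from a script. The following is a minimal Python sketch, not part of biobtree itself, that builds request URLs and checks the mandatory parameters listed above. The `build_url` helper and `ENDPOINTS` table are hypothetical names introduced here for illustration, the `http://` scheme is assumed, and the example values (`tpi1`, `P12345`, `map(uniprot)`) are placeholders rather than verified identifiers or query syntax.

```python
from urllib.parse import urlencode

# Assumption: biobtree web is running locally on the port shown above, over plain HTTP.
BASE_URL = "http://localhost:8888/ws"

# Path and mandatory parameters for each endpoint, taken from the list above.
ENDPOINTS = {
    "meta": ("/meta", ()),
    "search": ("/", ("i",)),
    "map": ("/map/", ("i", "m")),
    "entry": ("/entry/", ("i", "s")),
    "filter": ("/filter/", ("i", "s", "f")),
    "page": ("/page/", ("i", "s", "p", "t")),
}


def build_url(endpoint, **params):
    """Return the request URL for an endpoint (hypothetical helper).

    Raises ValueError if a mandatory parameter is missing.
    """
    path, required = ENDPOINTS[endpoint]
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError(f"{endpoint} requires parameter(s): {', '.join(missing)}")
    query = urlencode(params)  # percent-encodes values such as "map(uniprot)"
    return BASE_URL + path + ("?" + query if query else "")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_url("meta"))
    print(build_url("search", i="tpi1"))
    print(build_url("entry", i="P12345", s="uniprot"))
    print(build_url("map", i="tpi1", m="map(uniprot)"))
```

Once `biobtree web` is running, the resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen(build_url("meta"))`.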

Publication

https://f1000research.com/articles/8-145

Building source

biobtree is written in Go for the data processing and Vue.js for the web application part. To build and create the biobtree executable, install go>=1.13 and run

go build

To build the web application for development in the web directory run

npm install
npm run serve

To build the web package run

npm run build