Skip to content

camilogarciabotero/BioVossEncoder.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

93 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation


Encoding biological sequences into Voss representation

Documentation Latest Release DOI
CI Workflow License Work in Progress Downloads Aqua QA


BioVossEncoder

A Julia package for encoding biological sequences into Voss representations

Installation

BioVossEncoder is a   Julia Language   package. To install BioVossEncoder, please open Julia's interactive session (known as REPL) and press ] key in the REPL to use the package mode, then type the following command

pkg> add BioVossEncoder

Encoding BioSequences

This package provides a simple and fast way to encode biological sequences into Voss representations. The main struct provided by this package is VossEncoder which is a wrapper of BitMatrix that encodes a biological sequence into a bit matrix and its corresponding alphabet. The following example shows how to encode a DNA sequence into a Voss matrix.

julia> using BioSequences, BioVossEncoder
julia> seq = dna"ACGT"
julia> VossEncoder(seq)
4×4 Voss Matrix of DNAAlphabet{4}():
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1

For simplicity the VossEncoder struct provides a property bitmatrix that returns the BitMatrix representation of the sequence.

julia> VossEncoder(seq).bitmatrix
4×4 BitMatrix:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1

Similarly another function that makes use of the VossEncoder structure is vossmatrix which returns the BitMatrix representation of a sequence directly.

julia> vossmatrix(seq)
4×4 BitMatrix:
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1

Creating a Voss vector of a sequence

Sometimes it proves to be useful to encode a sequence into a Voss vector representation (i.e a bit vector of the sequence from the corresponding molecule alphabet).

This package provides a function vossvector that returns a Voss vector of a sequence given a BioSequence and the specific molecule (BioSymbol) that could be DNA or AA.

julia> vossvector(seq, DNA_A)
4-element view(::BitMatrix, 1, :) with eltype Bool:
 1
 0
 0
 0

Note that the output is actually using behind the scenes a view of the BitMatrix representation of the sequence. This is done for performance reasons.