Encoding biological sequences into Voss representation
BioVossEncoder
A Julia package for encoding biological sequences into Voss representation
Installation
BioVossEncoder is a Julia Language package. To install it, open Julia's interactive session (known as the REPL), press the ] key to enter package mode, then type the following command:
pkg> add BioVossEncoder
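The same installation can also be scripted through Julia's Pkg standard library, which is handy for setup scripts or notebooks. The snippet below is a minimal sketch; adding BioSequences is an assumption based on the examples that follow, which load it for the dna"..." string macro.

```julia
# Non-interactive equivalent of typing `add BioVossEncoder` in package mode.
using Pkg

Pkg.add("BioVossEncoder")

# Assumption: BioSequences is added explicitly because the examples below
# call `using BioSequences` directly.
Pkg.add("BioSequences")
```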
Encoding BioSequences
This package provides a simple and fast way to encode biological sequences into Voss representation. The main struct provided by this package is BinarySequenceMatrix, a wrapper around BitMatrix that encodes a biological sequence as a binary matrix. The following example shows how to encode a DNA sequence into a binary matrix.
julia> using BioSequences, BioVossEncoder
julia> seq = dna"ACGT"
julia> BinarySequenceMatrix(seq)
4×4 BinarySequenceMatrix of DNAAlphabet{4}():
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
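To make the layout of the matrix concrete, here is a minimal sketch of the same idea in plain Julia, with no packages involved. The helper voss_matrix is hypothetical and is not part of BioVossEncoder; it assumes one row per alphabet symbol and one column per sequence position, which matches the output shown above.

```julia
# Hypothetical helper (not part of BioVossEncoder): build a Voss-style
# indicator matrix for a plain string over a given alphabet.
function voss_matrix(seq::AbstractString, alphabet::AbstractVector{Char})
    # falses(r, c) allocates a BitMatrix filled with zeros
    m = falses(length(alphabet), length(seq))
    for (j, symbol) in enumerate(seq)
        i = findfirst(==(symbol), alphabet)
        # Symbols outside the alphabet leave their column all zeros
        i === nothing || (m[i, j] = true)
    end
    return m
end

# Rows are A, C, G, T; columns are positions 1 to 4
m = voss_matrix("ACGT", ['A', 'C', 'G', 'T'])
```

Because "ACGT" lists each symbol exactly once and in alphabet order, the result is the identity pattern. A sequence with repeated symbols would set several bits in the same row.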
For convenience, the BinarySequenceMatrix struct provides a property bsm that returns the BitMatrix representation of the sequence.
julia> BinarySequenceMatrix(seq).bsm
4×4 BitMatrix:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
Similarly, the function binary_sequence_matrix, which makes use of the BinarySequenceMatrix struct, returns the BitMatrix representation of a sequence directly.
julia> binary_sequence_matrix(seq)
4×4 BitMatrix:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
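Once the encoding is an ordinary BitMatrix, standard array functions apply to it. The sketch below uses a hand-written matrix as a stand-in for an encoded sequence, so it runs without the package; the row order A, C, G, T and the example sequence GATTACA are assumptions made for illustration.

```julia
# Stand-in for an encoded sequence "GATTACA", written by hand.
# Assumption: rows are ordered A, C, G, T; columns are sequence positions.
bsm = BitArray([0 1 0 0 1 0 1;
                0 0 0 0 0 1 0;
                1 0 0 0 0 0 0;
                0 0 1 1 0 0 0])

# Summing along each row counts how often each symbol occurs
counts = vec(sum(bsm, dims=2))

# Indices of the set bits in row 1 are the positions of the first symbol
positions_of_A = findall(bsm[1, :])
```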
Creating a one-hot vector of a sequence
Sometimes it is useful to encode a sequence into a one-hot representation for a single symbol. This package provides a function binaryseq that returns a one-hot representation of a sequence given a BioSequence and a specific molecule (a BioSymbol), which can be of type DNA or AA. The result marks the positions at which that symbol occurs.
julia> binaryseq(seq, DNA_A)
4-element view(::BitMatrix, 1, :) with eltype Bool:
1
0
0
0
Note that the output is a view into the BitMatrix representation of the sequence rather than a copy. This is done for performance reasons.
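The practical consequence of returning a view is that the result shares memory with the matrix it came from. The following sketch shows this with plain Julia arrays only; the matrix m is a hand-built stand-in and does not come from the package.

```julia
# Hand-built stand-in for an encoded 4-symbol sequence
m = BitArray([1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1])

row_view = view(m, 1, :)  # shares memory with m, no allocation of the data
row_copy = m[1, :]        # indexing with a slice makes an independent copy

# Changing the parent matrix is visible through the view, not the copy
m[1, 2] = true
seen_by_view = row_view[2]
seen_by_copy = row_copy[2]

# collect materializes the view into a standalone array when needed
owned = collect(row_view)
```

If the result needs to outlive or be modified independently of the underlying matrix, calling collect or copy on the view gives a standalone array.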