BioSimplex.biosimplex (Method)
biosimplex(sequence::NucleicSeqOrView{DNAAlphabet{N}}) -> Matrix{Float64}
The biosimplex function takes a nucleic acid sequence, sequence, and returns a matrix of indicators computed from the binary indicator sequences of its nucleotides.
Arguments
sequence::NucleicSeqOrView{DNAAlphabet{N}}: The input nucleic acid sequence.
Returns
mtx::Matrix{Float64}: The matrix of indicators, one column per sequence position.
The matrix mtx has dimensions 3 × seqlen, where seqlen is the length of the input sequence. Each row of the matrix corresponds to a different indicator: xr, xg, and xb.
The function relies on the following helper methods:
xr(sequence, i): Returns the r (red) indicator of the nucleotide at position i in the sequence.
xg(sequence, i): Returns the g (green) indicator of the nucleotide at position i in the sequence.
xb(sequence, i): Returns the b (blue) indicator of the nucleotide at position i in the sequence.
Together, these values are called the RGB indicators of a DNA sequence. They live in a tetrahedral space called a simplex. Mathematically, each nucleotide is assigned to one vertex of a tetrahedron, written in terms of the unit vectors i, j, and k:
\[\begin{aligned} & A \rightarrow\left(a_r, a_g, a_b\right)=\mathbf{k} \\ & C \rightarrow\left(c_r, c_g, c_b\right)=\frac{-\sqrt{2}}{3} \mathbf{i}+\frac{\sqrt{6}}{3} \mathbf{j}-\frac{1}{3} \mathbf{k} \\ & G \rightarrow\left(g_r, g_g, g_b\right)=\frac{-\sqrt{2}}{3} \mathbf{i}-\frac{\sqrt{6}}{3} \mathbf{j}-\frac{1}{3} \mathbf{k} \\ & T \rightarrow\left(t_r, t_g, t_b\right)=\frac{2 \sqrt{2}}{3} \mathbf{i}-\frac{1}{3} \mathbf{k} \end{aligned}\]
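As a quick consistency check on these coordinates (a worked verification added here, not part of the original docstring), each of the four vectors has unit length, every pair has the same dot product, and the four sum to zero, so they form a regular tetrahedron centered at the origin:

\[\begin{aligned} & \lVert C \rVert^2=\frac{2}{9}+\frac{6}{9}+\frac{1}{9}=1, \qquad \lVert T \rVert^2=\frac{8}{9}+\frac{1}{9}=1 \\ & A \cdot C=-\frac{1}{3}, \qquad C \cdot G=\frac{2}{9}-\frac{6}{9}+\frac{1}{9}=-\frac{1}{3}, \qquad C \cdot T=-\frac{4}{9}+\frac{1}{9}=-\frac{1}{3} \\ & A+C+G+T=\mathbf{0} \end{aligned}\]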
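The RGB indicators are then calculated from the binary indicator sequences u_A, u_C, u_G, and u_T, where u_X[n] is 1 if nucleotide X occurs at position n and 0 otherwise: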
\[\begin{aligned} & x_r[n]=\frac{\sqrt{2}}{3}\left(2 u_T[n]-u_C[n]-u_G[n]\right) \\ & x_g[n]=\frac{\sqrt{6}}{3}\left(u_C[n]-u_G[n]\right) \\ & x_b[n]=\frac{1}{3}\left(3 u_A[n]-u_T[n]-u_C[n]-u_G[n]\right) \end{aligned}\]
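The formulas above can be sketched in a few lines of plain Julia. This is a minimal illustration under stated assumptions, not the BioSimplex implementation. It works on an ordinary string of the characters A, C, G, and T instead of a BioSequences type, and the names u, xr_sketch, xg_sketch, xb_sketch, and simplex_sketch are hypothetical helpers introduced only for this example.

```julia
# Hypothetical stand-alone sketch of the RGB indicator formulas.
# Assumption: the input is a plain string over the characters A, C, G, T.

# Binary indicator u_X[n]: 1.0 if `base` occurs at position n, else 0.0.
u(chars, base, n) = chars[n] == base ? 1.0 : 0.0

# The three RGB indicators, term by term as in the formulas above.
xr_sketch(chars, n) = sqrt(2) / 3 * (2 * u(chars, 'T', n) - u(chars, 'C', n) - u(chars, 'G', n))
xg_sketch(chars, n) = sqrt(6) / 3 * (u(chars, 'C', n) - u(chars, 'G', n))
xb_sketch(chars, n) = (3 * u(chars, 'A', n) - u(chars, 'T', n) - u(chars, 'C', n) - u(chars, 'G', n)) / 3

# Build the 3 x seqlen matrix: row 1 is xr, row 2 is xg, row 3 is xb.
function simplex_sketch(seq::AbstractString)
    chars = collect(seq)              # Vector{Char}, safe to index by position
    seqlen = length(chars)
    mtx = Matrix{Float64}(undef, 3, seqlen)
    for n in 1:seqlen
        mtx[1, n] = xr_sketch(chars, n)
        mtx[2, n] = xg_sketch(chars, n)
        mtx[3, n] = xb_sketch(chars, n)
    end
    return mtx
end

mtx = simplex_sketch("ACGT")
```

Each column of mtx should then equal the tetrahedron vertex of the nucleotide at that position. For example, the column for A is (0, 0, 1), because only the u_A term is nonzero.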
The simplex space is thus a 3D space in which each position of a DNA sequence is represented by its RGB indicator vector (x_r, x_g, x_b).