`BioSimplex.biosimplex` - Method

`biosimplex(sequence::NucleicSeqOrView{DNAAlphabet{N}}) -> Matrix{Float64}`

The `biosimplex` function takes a DNA sequence `sequence` and returns a matrix of its RGB simplex indicators, which are computed from the binary indicator sequences of the four nucleotides.

**Arguments**

`sequence::NucleicSeqOrView{DNAAlphabet{N}}`: The input nucleic acid sequence.

**Returns**

`mtx::Matrix{Float64}`: The matrix of indicators for binary sequences.

The matrix `mtx` has dimensions 3 × `seqlen`, where `seqlen` is the length of the input sequence. Each row corresponds to one indicator: `xr`, `xg`, and `xb`. Column `i` therefore holds the three simplex coordinates of the nucleotide at position `i`.

The function relies on the following previously defined methods:

`xr(sequence, i)`: Returns the r (red) component of the simplex coordinate of the nucleotide at position `i` in the sequence.

`xg(sequence, i)`: Returns the g (green) component of the simplex coordinate of the nucleotide at position `i` in the sequence.

`xb(sequence, i)`: Returns the b (blue) component of the simplex coordinate of the nucleotide at position `i` in the sequence.

These methods compute the RGB indicators of a DNA sequence, which are its coordinates in a tetrahedral space called a simplex. Mathematically, each nucleotide is assigned to one vertex of a regular tetrahedron centered at the origin. With \(\mathbf{i}\), \(\mathbf{j}\), and \(\mathbf{k}\) as unit vectors along the r, g, and b axes, the four vertex vectors are:

\[\begin{aligned} & A \rightarrow\left(a_r, a_g, a_b\right)=\mathbf{k} \\ & C \rightarrow\left(c_r, c_g, c_b\right)=\frac{-\sqrt{2}}{3} \mathbf{i}+\frac{\sqrt{6}}{3} \mathbf{j}-\frac{1}{3} \mathbf{k} \\ & G \rightarrow\left(g_r, g_g, g_b\right)=\frac{-\sqrt{2}}{3} \mathbf{i}-\frac{\sqrt{6}}{3} \mathbf{j}-\frac{1}{3} \mathbf{k} \\ & T \rightarrow\left(t_r, t_g, t_b\right)=\frac{2 \sqrt{2}}{3} \mathbf{i}-\frac{1}{3} \mathbf{k} \end{aligned}\]
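The geometry of these four vectors can be checked numerically. The following is a minimal sketch in plain Julia, not part of BioSimplex: the function name `tetrahedron_vertices` and the variable names are hypothetical, and the coordinates are copied from the equations above. It confirms that each vector has unit length, that the four vectors sum to zero, and that any two distinct vectors have dot product -1/3, which together characterize a regular tetrahedron inscribed in the unit sphere.

```julia
using LinearAlgebra  # standard library: provides dot and norm

# Hypothetical helper for this sketch: the (r, g, b) vertex of each nucleotide,
# copied from the equations above.
function tetrahedron_vertices()
    return Dict(
        'A' => [0.0, 0.0, 1.0],
        'C' => [-sqrt(2) / 3, sqrt(6) / 3, -1 / 3],
        'G' => [-sqrt(2) / 3, -sqrt(6) / 3, -1 / 3],
        'T' => [2 * sqrt(2) / 3, 0.0, -1 / 3],
    )
end

vertices = tetrahedron_vertices()
nucleotides = collect(keys(vertices))

# Every vertex lies on the unit sphere.
unit_length = all(isapprox(norm(v), 1.0) for v in values(vertices))

# The four vertices sum to the zero vector, so the centroid is the origin.
centered = isapprox(norm(sum(values(vertices))), 0.0; atol = 1e-12)

# Any two distinct vertices have dot product -1/3.
equiangular = all(
    isapprox(dot(vertices[a], vertices[b]), -1 / 3)
    for a in nucleotides for b in nucleotides if a != b
)

println((unit_length, centered, equiangular))
```

Because the vertices are symmetric about the origin, no nucleotide is favored over another in this representation.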

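The RGB indicators are then calculated from the binary indicator sequences \(u_A[n]\), \(u_C[n]\), \(u_G[n]\), and \(u_T[n]\), where \(u_X[n]\) is 1 if the nucleotide at position \(n\) is \(X\) and 0 otherwise: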

\[\begin{aligned} & x_r[n]=\frac{\sqrt{2}}{3}\left(2 u_T[n]-u_C[n]-u_G[n]\right) \\ & x_g[n]=\frac{\sqrt{6}}{3}\left(u_C[n]-u_G[n]\right) \\ & x_b[n]=\frac{1}{3}\left(3 u_A[n]-u_T[n]-u_C[n]-u_G[n]\right) \end{aligned}\]
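To make the formulas concrete, here is a minimal sketch that evaluates them on a plain ASCII string. It is an illustration under stated assumptions, not the package's implementation: the names `u`, `xr_sketch`, `xg_sketch`, `xb_sketch`, and `simplex_sketch` are hypothetical, and a `String` stands in for the `NucleicSeqOrView` input so that the example needs no packages.

```julia
# Binary indicator u_X[n]: 1.0 if the character at position i equals nt, else 0.0.
# Assumes an ASCII string, so that seq[i] is the i-th character.
u(seq::AbstractString, i::Integer, nt::Char) = seq[i] == nt ? 1.0 : 0.0

# The three RGB indicator formulas, written term by term as in the equations above.
xr_sketch(seq, i) = sqrt(2) / 3 * (2 * u(seq, i, 'T') - u(seq, i, 'C') - u(seq, i, 'G'))
xg_sketch(seq, i) = sqrt(6) / 3 * (u(seq, i, 'C') - u(seq, i, 'G'))
xb_sketch(seq, i) = (3 * u(seq, i, 'A') - u(seq, i, 'T') - u(seq, i, 'C') - u(seq, i, 'G')) / 3

# Assemble the 3 x seqlen matrix: row 1 is xr, row 2 is xg, row 3 is xb.
function simplex_sketch(seq::AbstractString)
    n = length(seq)
    mtx = Matrix{Float64}(undef, 3, n)
    for i in 1:n
        mtx[1, i] = xr_sketch(seq, i)
        mtx[2, i] = xg_sketch(seq, i)
        mtx[3, i] = xb_sketch(seq, i)
    end
    return mtx
end

mtx = simplex_sketch("ACGT")
println(size(mtx))   # (3, 4)
println(mtx[:, 1])   # column for 'A': [0.0, 0.0, 1.0]
```

Each column reproduces the vertex vector of the nucleotide at that position, since exactly one of the four binary indicators is 1 at any position.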

In this way, a DNA sequence is represented as a series of points in the 3D simplex space, one per position, each given by its RGB indicators.