Information
The Information module of MIToS defines types and functions useful to calculate information measures (e.g. Mutual Information (MI) and Entropy) over a Multiple Sequence Alignment (MSA). This module was designed to count Residues (defined in the MSA module) in special contingency tables (as fast as possible) and to derive probabilities from these counts. It also includes methods for applying corrections to those tables, e.g. pseudocounts and pseudo frequencies. Finally, Information lets you use these probabilities and counts to estimate information measures and other frequency-based values.
using MIToS.Information # to load the Information module
Features
- Estimate multidimensional frequency and probability tables from sequences, MSAs, etc.
- Correction for small number of observations
- Correction for data redundancy on a MSA
- Estimate information measures
- Calculate corrected mutual information between residues
Counting residues
The MIToS Information module defines a multidimensional ContingencyTable type and two types wrapping it, Counts and Probabilities, to store occurrences or probabilities. The ContingencyTable type stores the contingency matrix, its marginal values and total. These types are parametric, taking three ordered parameters:
- T: The type used for storing the counts or probabilities, e.g. Float64. It's possible to use BigFloat if more precision is needed.
- N: The dimension of the table; it should be an Int.
- A: This should be a type, subtype of ResidueAlphabet, i.e. UngappedAlphabet, GappedAlphabet or ReducedAlphabet.
ContingencyTable can be used for storing probabilities or counts. The wrapper types Probabilities and Counts are mainly intended to dispatch in methods that need to know if the matrix has probabilities or counts, e.g. entropy. In general, the use of ContingencyTable is recommended over the use of Probabilities and Counts.
In this way, a matrix for storing pairwise probabilities of residues (without gaps) can be initialized using:
using MIToS.Information
Pij = ContingencyTable(Float64, Val{2}, UngappedAlphabet())
MIToS.Information.ContingencyTable{Float64, 2, MIToS.MSA.UngappedAlphabet} : table : 20×20 Named Matrix{Float64} Dim_1 ╲ Dim_2 │ A R N D C Q … P S T W Y V ──────────────┼────────────────────────────────────────────────────────────── A │ 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 R │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 N │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 D │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 C │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Q │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 E │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 G │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 H │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ K │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 M │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 F │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 P │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 S │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 T │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 W │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Y │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 V │ 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 marginals : 20×2 Named Matrix{Float64} Residue ╲ Dim │ Dim_1 Dim_2 ──────────────┼───────────── A │ 0.0 0.0 R │ 0.0 0.0 N │ 0.0 0.0 D │ 0.0 0.0 C │ 0.0 0.0 Q │ 0.0 0.0 E │ 0.0 0.0 G │ 0.0 0.0 H │ 0.0 0.0 ⋮ ⋮ ⋮ K │ 0.0 0.0 M │ 0.0 0.0 F │ 0.0 0.0 P │ 0.0 0.0 S │ 0.0 0.0 T │ 0.0 0.0 W │ 0.0 0.0 Y │ 0.0 0.0 V │ 0.0 0.0 total : 0.0
[High level interface] It is possible to use the functions count and probabilities to easily calculate the frequencies of sequences or columns of a MSA, where the number of sequences/columns determines the dimension of the resulting table.
using MIToS.Information
using MIToS.MSA # to use res"..." to create Vector{Residue}
column_i = res"AARANHDDRDC-"
column_j = res"-ARRNHADRAVY"
# Nij[R,R] = 1 + 1 = 2
Nij = count(column_i, column_j)
MIToS.Information.Counts{Float64, 2, MIToS.MSA.UngappedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64, 2, MIToS.MSA.UngappedAlphabet} : table : 20×20 Named Matrix{Float64} Dim_1 ╲ Dim_2 │ A R N D C Q … P S T W Y V ──────────────┼────────────────────────────────────────────────────────────── A │ 1.0 1.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 R │ 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 N │ 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 D │ 2.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 C │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 Q │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 E │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 G │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 H │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ K │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 M │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 F │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 P │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 S │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 T │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 W │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Y │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 V │ 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 marginals : 20×2 Named Matrix{Float64} Residue ╲ Dim │ Dim_1 Dim_2 ──────────────┼───────────── A │ 2.0 3.0 R │ 2.0 3.0 N │ 1.0 1.0 D │ 3.0 1.0 C │ 1.0 0.0 Q │ 0.0 0.0 E │ 0.0 0.0 G │ 0.0 0.0 H │ 1.0 1.0 ⋮ ⋮ ⋮ K │ 0.0 0.0 M │ 0.0 0.0 F │ 0.0 0.0 P │ 0.0 0.0 S │ 0.0 0.0 T │ 0.0 0.0 W │ 0.0 0.0 Y │ 0.0 0.0 V │ 0.0 1.0 total : 10.0
You can use sum to get the stored total:
sum(Nij) # There are 12 residue pairs, but 2 of them contain a gap
10.0
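To make the counting rule concrete, here is a minimal plain-Julia sketch of what a pairwise count amounts to for these two columns. It is an illustration only, not how MIToS implements count: the helper count_pairs is hypothetical, the columns are plain strings rather than Vector{Residue}, and only the gap character '-' is skipped.

```julia
# Plain-Julia sketch; `count_pairs` is a hypothetical helper, not part of MIToS.
function count_pairs(col_i::AbstractString, col_j::AbstractString)
    length(col_i) == length(col_j) ||
        throw(ArgumentError("columns must have the same length"))
    counts = Dict{Tuple{Char,Char},Float64}()
    for (a, b) in zip(col_i, col_j)
        # A pair is ignored as soon as one of its members is a gap.
        (a == '-' || b == '-') && continue
        counts[(a, b)] = get(counts, (a, b), 0.0) + 1.0
    end
    return counts
end

pairs_ij = count_pairs("AARANHDDRDC-", "-ARRNHADRAVY")
pairs_ij[('R', 'R')]    # 2.0, the same value as Nij[Residue('R'), Residue('R')]
sum(values(pairs_ij))   # 10.0, the same value as sum(Nij)
```

The first and last positions are dropped because each contains a gap, which is why the total is 10 and not 12.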
Contingency tables can be indexed using Int or Residues:
Nij[2, 2] # Use Int to index the table
2.0
Nij[Residue('R'), Residue('R')] # Use Residue to index the table
2.0
The number refers to the position in the table, e.g. [2,2] references the second row and the second column. Indexing the table with the number used to encode a residue is dangerous: the index of a residue depends on the alphabet in use, and Int(Residue('X')) will always be out of bounds.
Indexing with Residues works as expected. It uses the alphabet of the contingency table to find the index of the Residue.
using MIToS.Information
using MIToS.MSA
alphabet = ReducedAlphabet("(AILMV)(NQST)(RHK)(DE)(FWY)CGP")
column_i = res"AARANHDDRDC-"
column_j = res"-ARRNHADRAVY"
# Fij[R,R] = 1 + 1 + 1 = 3 # RHK
Fij = count(column_i, column_j, alphabet=alphabet)
MIToS.Information.Counts{Float64, 2, MIToS.MSA.ReducedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64, 2, MIToS.MSA.ReducedAlphabet} :
table : 8×8 Named Matrix{Float64}
Dim_1 ╲ Dim_2 │ AILMV   NQST    RHK     DE    FWY      C      G      P
──────────────┼───────────────────────────────────────────────────────
AILMV         │   1.0    0.0    1.0    0.0    0.0    0.0    0.0    0.0
NQST          │   0.0    1.0    0.0    0.0    0.0    0.0    0.0    0.0
RHK           │   0.0    0.0    3.0    0.0    0.0    0.0    0.0    0.0
DE            │   2.0    0.0    0.0    1.0    0.0    0.0    0.0    0.0
FWY           │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
C             │   1.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
G             │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
P             │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
marginals : 8×2 Named Matrix{Float64}
Residue ╲ Dim │ Dim_1  Dim_2
──────────────┼─────────────
AILMV         │   2.0    4.0
NQST          │   1.0    1.0
RHK           │   3.0    4.0
DE            │   3.0    1.0
FWY           │   0.0    0.0
C             │   1.0    0.0
G             │   0.0    0.0
P             │   0.0    0.0
total : 10.0
Fij[Residue('R'), Residue('R')] # Use Residue to index the table
3.0
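The effect of a reduced alphabet can be reproduced with a small plain-Julia sketch: every residue is first mapped to the index of its group, and the pair of group indices is what gets counted. This is only an illustration under simplifying assumptions (plain strings, a hand-written groups vector and a hypothetical group_index lookup); it does not use or describe the MIToS internals.

```julia
# Groups written by hand to mirror "(AILMV)(NQST)(RHK)(DE)(FWY)CGP".
groups = ["AILMV", "NQST", "RHK", "DE", "FWY", "C", "G", "P"]

# Hypothetical lookup table: residue letter => index of its group.
group_index = Dict(res => i for (i, g) in enumerate(groups) for res in g)

col_i = "AARANHDDRDC-"
col_j = "-ARRNHADRAVY"

F = zeros(Float64, length(groups), length(groups))
for (a, b) in zip(col_i, col_j)
    # Skip gaps and any other letter that is not in the reduced alphabet.
    (haskey(group_index, a) && haskey(group_index, b)) || continue
    F[group_index[a], group_index[b]] += 1.0
end

F[group_index['R'], group_index['R']]   # 3.0: the pairs (R,R), (H,H) and (R,R)
F[group_index['H'], group_index['K']]   # also 3.0, because H and K share the RHK cell
sum(F)                                  # 10.0
```

Indexing with any member of a group reaches the same cell, which is why Fij[Residue('R'), Residue('R')] returns 3.0 above.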
The function getcontingencytable gives access to the ContingencyTable wrapped in a Counts object. You can use it in combination with normalize to get a contingency table of probabilities. The result can be wrapped inside a Probabilities object:
Probabilities(normalize(getcontingencytable(Fij)))
MIToS.Information.Probabilities{Float64, 2, MIToS.MSA.ReducedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64, 2, MIToS.MSA.ReducedAlphabet} :
table : 8×8 Named Matrix{Float64}
Dim_1 ╲ Dim_2 │ AILMV   NQST    RHK     DE    FWY      C      G      P
──────────────┼───────────────────────────────────────────────────────
AILMV         │   0.1    0.0    0.1    0.0    0.0    0.0    0.0    0.0
NQST          │   0.0    0.1    0.0    0.0    0.0    0.0    0.0    0.0
RHK           │   0.0    0.0    0.3    0.0    0.0    0.0    0.0    0.0
DE            │   0.2    0.0    0.0    0.1    0.0    0.0    0.0    0.0
FWY           │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
C             │   0.1    0.0    0.0    0.0    0.0    0.0    0.0    0.0
G             │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
P             │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
marginals : 8×2 Named Matrix{Float64}
Residue ╲ Dim │ Dim_1  Dim_2
──────────────┼─────────────
AILMV         │   0.2    0.4
NQST          │   0.1    0.1
RHK           │   0.3    0.4
DE            │   0.3    0.1
FWY           │   0.0    0.0
C             │   0.1    0.0
G             │   0.0    0.0
P             │   0.0    0.0
total : 1.0000000000000002
Example: Plotting the probabilities of each residue in a sequence
Similar to the count function, the probabilities function takes one or more sequences (vectors of residues) and returns the probabilities of each residue. Optionally, the keyword argument alphabet can be used to count some residues in the same cell of the table.
probabilities(res"AARANHDDRDC", alphabet=alphabet)
MIToS.Information.Probabilities{Float64, 1, MIToS.MSA.ReducedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64, 1, MIToS.MSA.ReducedAlphabet} :
table : 8-element Named Vector{Float64}
Dim_1  │
───────┼──────────
AILMV  │  0.272727
NQST   │ 0.0909091
RHK    │  0.272727
DE     │  0.272727
FWY    │       0.0
C      │ 0.0909091
G      │       0.0
P      │       0.0
total : 1.0
Here, we are going to use the probabilities function to get the residue probabilities of a particular sequence from UniProt.
Use the getsequence function, from the MSA module, to get the sequence from a FASTA file downloaded from UniProt.
julia> using MIToS.Information # to use the probabilities function
julia> using MIToS.MSA # to use getsequence on the one sequence FASTA (canonical) from UniProt
julia> seq = read("http://www.uniprot.org/uniprot/P29374.fasta", FASTA) # Small hack: read the single sequence as a MSA
AnnotatedMultipleSequenceAlignment with 0 annotations : 1×1257 Named Matrix{MIToS.MSA.Residue}
Seq ╲ Col │ …
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────
sp|P29374|ARI4A_HUMAN AT-rich interactive domain-containing protein 4A OS=Homo sapiens OX=9606 GN=ARID4A PE=1 SV=3 │ …
julia> probabilities(seq[1,:]) # Select the single sequence and calculate the probabilities
MIToS.Information.Probabilities{Float64, 1, MIToS.MSA.UngappedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64, 1, MIToS.MSA.UngappedAlphabet} :
table : 20-element Named Vector{Float64}
Dim_1 │
───────┼───────────
A │ 0.043755
R │ 0.0517104
N │ 0.0469372
D │ 0.0755768
C │ 0.0135243
Q │ 0.035004
E │ 0.134447
G │ 0.043755
H │ 0.0143198
⋮ ⋮
K │ 0.109785
M │ 0.0159109
F │ 0.0190931
P │ 0.0445505
S │ 0.100239
T │ 0.0493238
W │ 0.00636436
Y │ 0.0198886
V │ 0.0517104
total : 1.0
In the previous example, using getsequence(seq,1) instead of seq[1,:] will return the sequence as a matrix with a single column to keep information for both dimensions. To use probabilities (or count) you can use Julia's vec function to transform the matrix into a vector, e.g.: probabilities(vec(getsequence(seq,1))).
using Plots # We choose Plots because it's intuitive, concise and backend independent
gr(size=(600,300))
Plots.GRBackend()
You can plot the probabilities of each residue in a given sequence together with the residue probabilities estimated with the BLOSUM62 substitution matrix. The latter are exported by the Information module as the constant BLOSUM62_Pi.
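# `frequencies` is assumed to hold the residue probabilities computed above,
# e.g. frequencies = probabilities(seq[1,:])
bar(
    1:20,
    [ frequencies BLOSUM62_Pi ],
    lab = [ "Sequence" "BLOSUM62" ],
    alpha=0.5
)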
Low count corrections
A low number of observations can lead to sparse contingency tables, which in turn lead to wrong probability estimations. Buslje et al. 2009 show that low-count corrections can improve the contact prediction capabilities of the Mutual Information. The Information module offers two low-count corrections:
- Additive Smoothing; the constant value pseudocount described in Buslje et al. 2009.
- BLOSUM62 based pseudo frequencies of residue pairs, similar to Altschul et al. 1997.
using MIToS.MSA
msa = read("http://pfam.xfam.org/family/PF09776/alignment/full", Stockholm)
filtercolumns!(msa, columngapfraction(msa) .< 0.5) # delete columns with 50% gaps or more
column_i = msa[:,1]
column_j = msa[:,2]
386-element Named Vector{MIToS.MSA.Residue} Seq │ ─────────────────────────┼── A0A553R1G6_9TELE/549-644 │ T A0A2K6AF64_MANLE/10-125 │ L G1RVK4_NOMLE/46-161 │ F A0A6A4ILB2_APOLU/2-111 │ - A0A672YV19_9TELE/15-130 │ R A0A444U283_ACIRT/1-75 │ - A0A0V0S2G9_9BILA/731-840 │ - A0A6A5D6Z0_SCHHA/3-104 │ - A0A194PY65_PAPXU/4-116 │ - ⋮ ⋮ A0A0B2V267_TOXCA/6-119 │ P H3DW02_PRIPA/2-115 │ I A0A090MXF3_STRRB/2-104 │ - A0A3B1JKX3_ASTMX/16-130 │ A A0A3P6PBE0_ANISI/14-103 │ - G3WPK1_SARHA/11-126 │ L M3XPJ7_MUSPF/10-125 │ H A0A2K6LYH7_RHIBE/10-125 │ L F4WNV6_ACREC/302-415 │ -
If you have a preallocated ContingencyTable you can use count! to fill it; this avoids creating a new table, as count does. However, note that count! adds the new counts to the pre-existing values, so in this case we want to start with a table initialized with zeros.
using MIToS.Information
const alphabet = ReducedAlphabet("(AILMV)(NQST)(RHK)(DE)(FWY)CGP")
Nij = ContingencyTable(Float64, Val{2}, alphabet)
MIToS.Information.ContingencyTable{Float64, 2, MIToS.MSA.ReducedAlphabet} : table : 8×8 Named Matrix{Float64} Dim_1 ╲ Dim_2 │ AILMV NQST RHK DE FWY C G P ──────────────┼─────────────────────────────────────────────────────── AILMV │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NQST │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 RHK │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 DE │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 FWY │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 C │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 G │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 P │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 marginals : 8×2 Named Matrix{Float64} Residue ╲ Dim │ Dim_1 Dim_2 ──────────────┼───────────── AILMV │ 0.0 0.0 NQST │ 0.0 0.0 RHK │ 0.0 0.0 DE │ 0.0 0.0 FWY │ 0.0 0.0 C │ 0.0 0.0 G │ 0.0 0.0 P │ 0.0 0.0 total : 0.0
#      table  weights         pseudocount      sequences...
count!(Nij,   NoClustering(), NoPseudocount(), column_i, column_j)
MIToS.Information.ContingencyTable{Float64, 2, MIToS.MSA.ReducedAlphabet} : table : 8×8 Named Matrix{Float64} Dim_1 ╲ Dim_2 │ AILMV NQST RHK DE FWY C G P ──────────────┼─────────────────────────────────────────────────────── AILMV │ 58.0 8.0 9.0 0.0 2.0 5.0 0.0 6.0 NQST │ 36.0 17.0 18.0 0.0 0.0 0.0 2.0 5.0 RHK │ 7.0 5.0 0.0 0.0 0.0 0.0 0.0 4.0 DE │ 0.0 9.0 0.0 0.0 1.0 0.0 0.0 0.0 FWY │ 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 C │ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 G │ 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 P │ 4.0 0.0 0.0 0.0 0.0 1.0 0.0 2.0 marginals : 8×2 Named Matrix{Float64} Residue ╲ Dim │ Dim_1 Dim_2 ──────────────┼───────────── AILMV │ 88.0 106.0 NQST │ 78.0 41.0 RHK │ 16.0 28.0 DE │ 10.0 0.0 FWY │ 1.0 3.0 C │ 0.0 6.0 G │ 3.0 2.0 P │ 7.0 17.0 total : 203.0
You can use NoClustering() wherever clustering weights are required if you do not want to use weights. Likewise, you can use NoPseudocount() wherever pseudocount values are required if you do not want to use pseudocounts.
In cases like the above, where there are few observations, it is possible to apply a constant pseudocount to the counting table. This module defines the type AdditiveSmoothing and the corresponding fill! and apply_pseudocount! methods to efficiently add a constant value to each element of the table or fill the table with it.
apply_pseudocount!(Nij, AdditiveSmoothing(1.0))
MIToS.Information.ContingencyTable{Float64, 2, MIToS.MSA.ReducedAlphabet} : table : 8×8 Named Matrix{Float64} Dim_1 ╲ Dim_2 │ AILMV NQST RHK DE FWY C G P ──────────────┼─────────────────────────────────────────────────────── AILMV │ 59.0 9.0 10.0 1.0 3.0 6.0 1.0 7.0 NQST │ 37.0 18.0 19.0 1.0 1.0 1.0 3.0 6.0 RHK │ 8.0 6.0 1.0 1.0 1.0 1.0 1.0 5.0 DE │ 1.0 10.0 1.0 1.0 2.0 1.0 1.0 1.0 FWY │ 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0 C │ 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 G │ 2.0 2.0 2.0 1.0 1.0 1.0 1.0 1.0 P │ 5.0 1.0 1.0 1.0 1.0 2.0 1.0 3.0 marginals : 8×2 Named Matrix{Float64} Residue ╲ Dim │ Dim_1 Dim_2 ──────────────┼───────────── AILMV │ 96.0 114.0 NQST │ 86.0 49.0 RHK │ 24.0 36.0 DE │ 18.0 8.0 FWY │ 9.0 11.0 C │ 8.0 14.0 G │ 11.0 10.0 P │ 15.0 25.0 total : 267.0
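The arithmetic behind the table above can be checked with a few lines of plain Julia. The sketch below copies the 8×8 counts from the count! output into an ordinary matrix (an assumption made for illustration; a ContingencyTable also keeps its marginals and total up to date) and adds the constant λ = 1.0 to every cell.

```julia
# Counts copied from the count! output above (rows and columns ordered as
# AILMV, NQST, RHK, DE, FWY, C, G, P).
N = Float64[58  8  9 0 2 5 0 6;
            36 17 18 0 0 0 2 5;
             7  5  0 0 0 0 0 4;
             0  9  0 0 1 0 0 0;
             0  1  0 0 0 0 0 0;
             0  0  0 0 0 0 0 0;
             1  1  1 0 0 0 0 0;
             4  0  0 0 0 1 0 2]

λ = 1.0                   # constant pseudocount
N_smoothed = N .+ λ       # additive smoothing: every cell grows by λ

sum(N)                    # 203.0, the total before the correction
sum(N_smoothed)           # 267.0 = 203 + 64 cells * 1.0
vec(sum(N_smoothed, dims=2))  # row marginals: each one grows by 8 * λ
```

Because every one of the 64 cells receives the pseudocount, the total grows by 64 λ and each marginal by 8 λ, which matches the marginals and total shown above.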
[High level interface.] The count function has a pseudocounts keyword argument that can take an AdditiveSmoothing value to easily calculate occurrences with pseudocounts. Also, the alphabet keyword argument can be used to change the default alphabet (i.e. )
count(column_i, column_j, pseudocounts=AdditiveSmoothing(1.0), alphabet=alphabet)
MIToS.Information.Counts{Float64, 2, MIToS.MSA.ReducedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64, 2, MIToS.MSA.ReducedAlphabet} : table : 8×8 Named Matrix{Float64} Dim_1 ╲ Dim_2 │ AILMV NQST RHK DE FWY C G P ──────────────┼─────────────────────────────────────────────────────── AILMV │ 59.0 9.0 10.0 1.0 3.0 6.0 1.0 7.0 NQST │ 37.0 18.0 19.0 1.0 1.0 1.0 3.0 6.0 RHK │ 8.0 6.0 1.0 1.0 1.0 1.0 1.0 5.0 DE │ 1.0 10.0 1.0 1.0 2.0 1.0 1.0 1.0 FWY │ 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0 C │ 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 G │ 2.0 2.0 2.0 1.0 1.0 1.0 1.0 1.0 P │ 5.0 1.0 1.0 1.0 1.0 2.0 1.0 3.0 marginals : 8×2 Named Matrix{Float64} Residue ╲ Dim │ Dim_1 Dim_2 ──────────────┼───────────── AILMV │ 96.0 114.0 NQST │ 86.0 49.0 RHK │ 24.0 36.0 DE │ 18.0 8.0 FWY │ 9.0 11.0 C │ 8.0 14.0 G │ 11.0 10.0 P │ 15.0 25.0 total : 267.0
To use the conditional probability matrix BLOSUM62_Pij in the calculation of pseudo frequencies $G$ for the pair of residues $a$, $b$, the real frequencies/probabilities $p_{ab}$ should be calculated first. The observed probabilities are then used to estimate the pseudo frequencies.
\[G_{ab} = \sum_{cd} p_{cd} \cdot BLOSUM62( a | c ) \cdot BLOSUM62( b | d )\]
Finally, the probability $P$ of each pair of residues $a$, $b$ between the columns $i$, $j$ is the weighted mean between the observed frequency $p$ and BLOSUM62-based pseudo frequency $G$, where α is generally the number of clusters or the number of sequences of the MSA and β is an empiric weight value. β was determined to be close to 8.512.
\[P_{ab} = \frac{\alpha \cdot p_{ab} + \beta \cdot G_{ab} }{\alpha + \beta}\]
This can be easily achieved using the pseudofrequencies keyword argument of the probabilities function. That argument can take a BLOSUM_Pseudofrequencies object that is created with α and β as first and second argument, respectively.
Pij = probabilities(column_i, column_j, pseudofrequencies=BLOSUM_Pseudofrequencies(nsequences(msa), 8.512))
MIToS.Information.Probabilities{Float64, 2, MIToS.MSA.UngappedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64, 2, MIToS.MSA.UngappedAlphabet} : table : 20×20 Named Matrix{Float64} Dim_1 ╲ Dim_2 │ A R … Y V ──────────────┼────────────────────────────────────────────────────── A │ 0.0436552 0.0146151 … 6.86411e-5 0.024364 R │ 0.00495647 5.63071e-5 2.65647e-5 0.00492551 N │ 8.1845e-5 4.39133e-5 1.727e-5 5.82878e-5 D │ 8.75091e-5 4.46217e-5 1.97191e-5 6.13966e-5 C │ 3.29725e-5 2.02129e-5 7.81597e-6 2.98321e-5 Q │ 0.0821137 5.0501e-5 2.44158e-5 0.0097304 E │ 0.000154837 6.43804e-5 0.00485561 9.53793e-5 G │ 0.00495541 0.00490452 2.89342e-5 0.000102361 H │ 0.00970394 2.19255e-5 9.8944e-6 3.48272e-5 ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ K │ 0.000137569 6.0674e-5 2.71533e-5 9.74759e-5 M │ 4.54269e-5 2.37546e-5 9.43202e-6 3.76855e-5 F │ 4.81705e-5 2.7654e-5 1.05982e-5 4.02543e-5 P │ 7.64427e-5 3.85568e-5 1.88345e-5 0.00972767 S │ 0.00981275 0.0338658 4.43013e-5 0.000136093 T │ 0.00979911 0.00491395 3.58359e-5 0.0194224 W │ 1.83893e-5 8.592e-6 3.56766e-6 1.20098e-5 Y │ 4.43348e-5 2.18765e-5 8.91639e-6 3.31595e-5 V │ 0.000144199 0.0193812 … 3.41784e-5 0.0242418 marginals : 20×2 Named Matrix{Float64} Residue ╲ Dim │ Dim_1 Dim_2 ──────────────┼───────────────────────── A │ 0.325986 0.180578 R │ 0.0687625 0.0783091 N │ 0.000749462 0.00547748 D │ 0.000801326 0.000618927 C │ 0.000349759 0.0295593 Q │ 0.126466 0.00538426 E │ 0.0495117 0.000778742 G │ 0.0157656 0.0106762 H │ 0.0100885 0.0488669 ⋮ ⋮ ⋮ K │ 0.00121226 0.0106448 M │ 0.0100732 0.00537751 F │ 0.000476082 0.0103748 P │ 0.034643 0.0832573 S │ 0.107862 0.131686 T │ 0.146281 0.0590751 W │ 0.00499063 0.000127028 Y │ 0.000401969 0.00530484 V │ 0.0738511 0.0933679 total : 1.0
You can also use apply_pseudofrequencies! on a previously filled probability contingency table, i.e. apply_pseudofrequencies!(Pij, BLOSUM_Pseudofrequencies(α, β)).
BLOSUM_Pseudofrequencies can only be applied to normalized/probability tables with UngappedAlphabet.
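The two pseudo-frequency equations of this section can be sketched in plain Julia on a toy problem. Everything numeric below is made up for illustration: a two-letter alphabet, an observed table p, and a hypothetical conditional matrix cond where cond[a, c] plays the role of BLOSUM62( a | c ) and each column sums to one. Only β = 8.512 comes from the text. With that layout, the double sum over c and d is the matrix product cond * p * cond'.

```julia
# Toy observed pair probabilities p[c, d] (made-up values, they sum to 1).
p = [0.5 0.1;
     0.0 0.4]

# Hypothetical conditional probabilities: cond[a, c] stands for P(a | c),
# so every column sums to 1. These are NOT BLOSUM62 values.
cond = [0.8 0.3;
        0.2 0.7]

# G[a, b] = sum over c, d of p[c, d] * cond[a, c] * cond[b, d]
G = cond * p * cond'

α = 10.0    # e.g. number of sequences or clusters (made up here)
β = 8.512   # empirical weight quoted in the text

# Weighted mean between observed frequencies and pseudo frequencies.
P = (α .* p .+ β .* G) ./ (α + β)

sum(G)   # ≈ 1.0, G is itself a probability table
sum(P)   # ≈ 1.0, so no renormalization is needed
```

Note that P[2, 1] is greater than zero even though p[2, 1] is zero: the pseudo frequencies spread probability mass to pairs that were never observed, and their influence shrinks as α grows relative to β.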
Correction for data redundancy in a MSA
A simple way to reduce redundancy in a MSA without losing sequences is clustering and sequence weighting. The weight of each sequence should be 1/N, where N is the number of sequences in its cluster. The Clusters type of the MSA module stores the weights. This vector of weights can be extracted (with the getweight function) and used by the count and probabilities functions with the keyword argument weights. It is also possible to use the Clusters object as the second argument of the function count!.
clusters = hobohmI(msa, 62) # from MIToS.MSA
MIToS.MSA.Clusters([1, 109, 1, 41, 2, 2, 5, 26, 10, 1 … 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 2, 2, 3, 4, 5, 6, 7, 8, 9 … 2, 104, 105, 31, 4, 77, 2, 2, 2, 14], [1.0, 0.009174311926605505, 0.009174311926605505, 1.0, 0.024390243902439025, 0.5, 0.5, 0.2, 0.038461538461538464, 0.1 … 0.009174311926605505, 1.0, 1.0, 0.3333333333333333, 0.024390243902439025, 0.5, 0.009174311926605505, 0.009174311926605505, 0.009174311926605505, 0.09090909090909091])
count(msa[:,1], msa[:,2], weights=clusters)
MIToS.Information.Counts{Float64, 2, MIToS.MSA.UngappedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64, 2, MIToS.MSA.UngappedAlphabet} : table : 20×20 Named Matrix{Float64} Dim_1 ╲ Dim_2 │ A R … Y V ──────────────┼────────────────────────────────────────────────── A │ 2.34878 0.0275229 … 0.0 0.82439 R │ 0.00917431 0.0 0.0 0.0243902 N │ 0.0 0.0 0.0 0.0 D │ 0.0 0.0 0.0 0.0 C │ 0.0 0.0 0.0 0.0 Q │ 1.39024 0.0 0.0 1.03846 E │ 0.0 0.0 0.00917431 0.0 G │ 1.0 0.0243902 0.0 0.0 H │ 1.0 0.0 0.0 0.0 ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ K │ 0.0 0.0 0.0 0.0 M │ 0.0 0.0 0.0 0.0 F │ 0.0 0.0 0.0 0.0 P │ 0.0 0.0 0.0 0.0487805 S │ 2.0 0.0642202 0.0 0.0 T │ 1.0 1.0 0.0 0.5 W │ 0.0 0.0 0.0 0.0 Y │ 0.0 0.0 0.0 0.0 V │ 0.0 0.0366972 … 0.0 0.0458716 marginals : 20×2 Named Matrix{Float64} Residue ╲ Dim │ Dim_1 Dim_2 ──────────────┼─────────────────────── A │ 6.73456 9.1232 R │ 2.67035 1.15283 N │ 0.0 0.333333 D │ 0.0 0.0 C │ 0.0 1.04587 Q │ 4.99595 1.0 E │ 0.582569 0.0 G │ 1.03356 0.0183486 H │ 1.0 0.0917431 ⋮ ⋮ ⋮ K │ 0.0 0.666667 M │ 2.0 0.5 F │ 0.0 0.0183486 P │ 2.22256 1.76009 S │ 5.28965 6.44987 T │ 6.73782 4.05711 W │ 0.00917431 0.0 Y │ 0.0 0.00917431 V │ 1.12844 2.48189 total : 35.112972745535956
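The 1/N weighting rule can be illustrated without MIToS. The sketch below assumes a hypothetical vector assignments holding one made-up cluster identifier per sequence; it is not the output of hobohmI and does not reproduce the Clusters type.

```julia
# Hypothetical cluster identifier of each of six sequences (made-up data).
assignments = [1, 2, 2, 3, 2, 3]

# Number of sequences in each cluster.
sizes = Dict{Int,Int}()
for c in assignments
    sizes[c] = get(sizes, c, 0) + 1
end

# Each sequence weighs 1/N, where N is the size of its cluster.
weights = [1 / sizes[c] for c in assignments]

sum(weights)   # ≈ 3.0: every cluster contributes a total weight of 1
```

Since every cluster contributes a total weight of one, weighted counts add up to at most the number of clusters rather than the number of sequences. This is why the weighted total above (about 35.1) is far smaller than the 386 sequences of the alignment.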
Estimating information measures on an MSA
The Information module has a number of functions defined to calculate information measures from Counts and Probabilities:
- entropy: Shannon entropy (H)
- marginal_entropy: Shannon entropy (H) of the marginals
- kullback_leibler: Kullback-Leibler (KL) divergence
- mutual_information: Mutual Information (MI)
- normalized_mutual_information: Normalized Mutual Information (nMI) by Entropy
- gap_intersection_percentage
- gap_union_percentage
Information measure functions optionally take the base as the last positional argument (default: e). You can use 2.0 to measure information in bits.
using MIToS.Information
using MIToS.MSA
Ni = count(res"PPCDPPPPPKDKKKKDDGPP") # Ni has the count table of residues in this low complexity sequence
H = entropy(Ni) # returns the Shannon entropy in nats (base e)
1.327362863420189
H = entropy(Ni, 2.0) # returns the Shannon entropy in bits (base 2)
1.9149798205164812
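The two values above can be reproduced with a short plain-Julia sketch of the Shannon entropy, H = -Σ p log(p). The function shannon_entropy is a hypothetical helper written for this illustration; it works on a plain string and is not the MIToS implementation of entropy.

```julia
# Hypothetical helper: Shannon entropy of the letters of a string.
function shannon_entropy(seq::AbstractString, base::Real = ℯ)
    counts = Dict{Char,Int}()
    for r in seq
        counts[r] = get(counts, r, 0) + 1
    end
    total = sum(values(counts))
    H = 0.0
    for n in values(counts)
        p = n / total
        H -= p * log(p)       # natural logarithm: H is accumulated in nats
    end
    return H / log(base)      # change of base, e.g. base 2.0 gives bits
end

shannon_entropy("PPCDPPPPPKDKKKKDDGPP")        # ≈ 1.3274 nats
shannon_entropy("PPCDPPPPPKDKKKKDDGPP", 2.0)   # ≈ 1.9150 bits
```

The sequence has 9 P, 5 K, 4 D, 1 C and 1 G out of 20 residues, so the probabilities entering the sum are 0.45, 0.25, 0.2, 0.05 and 0.05.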
The Information module defines special iteration functions to easily and efficiently compute a measure over a MSA. In particular, mapcolfreq! and mapseqfreq! map a function that takes a table of Counts or Probabilities. The table is filled in place with the counts or probabilities of each column or sequence of a MSA, respectively. mapcolpairfreq! and mapseqpairfreq! are similar, but they fill the table using pairs of columns or sequences, respectively.
These functions take three positional arguments: the function f to be calculated, the msa and the table of Counts or Probabilities.
They also take some keyword arguments:
- weights (default: NoClustering()): Weights to be used for table counting.
- pseudocounts (default: NoPseudocount()): Pseudocount object to be applied to the table.
- pseudofrequencies (default: NoPseudofrequencies()): Pseudofrequencies to be applied to the normalized (probabilities) table.
mapcolpairfreq! and mapseqpairfreq! also have a fourth positional argument, usediagonal, that indicates if the function should be applied to identical element pairs (default: Val{true}). These two functions also have an extra keyword argument, diagonalvalue (default: zero), to indicate the value used to fill the diagonal elements if usediagonal is Val{false}.
Example: Estimating H(X) and H(X, Y) over an MSA
In this example, we are going to use mapcolfreq! and mapcolpairfreq! to estimate the Shannon entropy of MSA columns H(X) and the joint entropy H(X, Y) of column pairs, respectively.
using MIToS.MSA
msa = read("http://pfam.xfam.org/family/PF09776/alignment/full", Stockholm)
AnnotatedMultipleSequenceAlignment with 411 annotations : 386×116 Named Matrix{MIToS.MSA.Residue} Seq ╲ Col │ 39 40 41 42 43 … 191 194 195 196 197 ─────────────────────────┼──────────────────────────────────────────────────── A0A553R1G6_9TELE/549-644 │ - - - - - … - - - - - A0A2K6AF64_MANLE/10-125 │ L L R Q S Q F W N - G1RVK4_NOMLE/46-161 │ - L R Q S Q F W T R A0A6A4ILB2_APOLU/2-111 │ - - - - - K L V K - A0A672YV19_9TELE/15-130 │ - - - - - R F W K K A0A444U283_ACIRT/1-75 │ - - - - - K F W K K A0A0V0S2G9_9BILA/731-840 │ - - - - - Y L W K K A0A6A5D6Z0_SCHHA/3-104 │ - - - - - F L L - - A0A194PY65_PAPXU/4-116 │ - - - - - K Y I K K ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ ⋮ ⋮ ⋮ A0A0B2V267_TOXCA/6-119 │ - - - - - Q L W K - H3DW02_PRIPA/2-115 │ - - - - - S F W S - A0A090MXF3_STRRB/2-104 │ - - - - - K L W K - A0A3B1JKX3_ASTMX/16-130 │ - L Q E V K F W K K A0A3P6PBE0_ANISI/14-103 │ - - - - - - - - - - G3WPK1_SARHA/11-126 │ - L Q Q N K F W K K M3XPJ7_MUSPF/10-125 │ - L Q Q P R F W T K A0A2K6LYH7_RHIBE/10-125 │ L L R Q S Q F W N - F4WNV6_ACREC/302-415 │ - - - - - … N F N - -
We are going to count residues to estimate the entropy. The entropy estimation is performed over a reused Counts object. The result will be a vector containing the values estimated over each column without counting gaps (UngappedAlphabet).
using MIToS.Information
Hx = mapcolfreq!(entropy, msa, Counts(ContingencyTable(Float64, Val{1}, UngappedAlphabet())))
1×116 Named Matrix{Float64}
Function ╲ Col │       39        40  …       196       197
───────────────┼──────────────────────────────────────────
entropy        │ 0.167026   0.13282  …   1.42931  0.291903
If we want the joint entropy between column pairs, we need to use a two-dimensional table of Counts and mapcolpairfreq!.
Hxy = mapcolpairfreq!(entropy, msa, Counts(ContingencyTable(Float64, Val{2}, UngappedAlphabet())))
116×116 Named PairwiseListMatrices.PairwiseListMatrix{Float64, true, Vector{Float64}} Col1 ╲ Col2 │ 39 40 41 … 195 196 197 ────────────┼────────────────────────────────────────────────────────────── 39 │ 0.167026 0.167026 1.05803 … 0.16925 1.34207 0.34878 40 │ 0.167026 0.13282 1.22089 0.136512 1.40086 0.535274 41 │ 1.05803 1.22089 1.21425 1.22652 2.121 1.44626 42 │ 0.547645 0.979877 1.92835 1.03635 1.99952 1.54384 43 │ 1.74324 2.03632 2.5633 2.15465 2.6921 2.35045 44 │ 1.61301 1.6139 2.35935 1.69875 2.48762 1.71034 45 │ 1.22608 1.40381 2.34415 1.58488 2.58066 1.90997 46 │ 1.53769 1.60775 2.3245 1.94138 2.72215 2.14746 48 │ 1.32265 1.46773 2.36984 1.74594 2.54161 1.81194 ⋮ ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ ⋮ 187 │ 0.314245 0.466242 1.54739 2.13229 2.52573 1.7452 188 │ 1.28139 1.40474 2.0452 2.71719 2.74963 2.14131 189 │ 0.16925 0.136512 1.22652 0.973534 1.49348 0.342899 190 │ 1.16595 1.46201 2.15734 2.57438 2.84439 2.10267 191 │ 0.799186 0.96076 1.94311 2.37664 2.54461 1.92773 194 │ 0.253446 0.185847 1.268 1.53213 2.11438 1.13293 195 │ 0.16925 0.136512 1.22652 0.91968 2.06058 0.97414 196 │ 1.34207 1.40086 2.121 2.06058 1.42931 1.44691 197 │ 0.34878 0.535274 1.44626 … 0.97414 1.44691 0.291903
In the above examples, we indicate the type of each occurrence in the counting and the probability table to use. For some measures, such as entropy and mutual information, it is also possible to estimate the values using only the count table (without calculating the probability table). Estimating measures with only a Counts table, when this is possible, should be faster than using a probability table.
Time_Pab = map(1:100) do x
    time = @elapsed mapcolpairfreq!(entropy, msa, Probabilities(ContingencyTable(Float64, Val{2}, UngappedAlphabet())))
end
Time_Nab = map(1:100) do x
    time = @elapsed mapcolpairfreq!(entropy, msa, Counts(ContingencyTable(Float64, Val{2}, UngappedAlphabet())))
end
using Plots
gr()
histogram( [Time_Pab Time_Nab],
    labels = ["Using Probabilities" "Using Counts"],
    xlabel = "Execution time [seconds]" )
Corrected Mutual Information
MIToS ships with two methods to easily calculate corrected mutual information. The first is the algorithm described in Buslje et al. 2009. This algorithm can be accessed through the buslje09 function and includes:
- Low count correction using AdditiveSmoothing
- Sequence weighting after a hobohmI clustering
- Average Product Correction (APC) proposed by Dunn et al. 2008, through the APC! function that takes a MI matrix.
- Z score correction using the functions shuffle! from the MSA module and zscore from the PairwiseListMatrices package.
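As an illustration of the APC step in the list above, the sketch below applies the average product correction to a small symmetric MI matrix in plain Julia. It follows the usual formulation attributed to Dunn et al. 2008, where the correction for a pair of columns is the product of their mean MI values divided by the overall mean, with all means taken over off-diagonal pairs. The function apc_sketch is hypothetical, leaves the diagonal at zero as an arbitrary choice, and is not claimed to match the behaviour of APC! in every detail.

```julia
# Hypothetical helper: MIp[a, b] = MI[a, b] - mean(MI[a, :]) * mean(MI[b, :]) / mean(MI),
# where every mean ignores the diagonal.
function apc_sketch(MI::AbstractMatrix{<:Real})
    m = size(MI, 1)
    size(MI, 2) == m || throw(ArgumentError("MI must be a square matrix"))
    offdiag_sum = sum(MI) - sum(MI[i, i] for i in 1:m)
    overall_mean = offdiag_sum / (m * (m - 1))
    row_mean = [(sum(MI[a, :]) - MI[a, a]) / (m - 1) for a in 1:m]
    MIp = zeros(Float64, m, m)      # the diagonal stays at zero in this sketch
    for a in 1:m, b in 1:m
        a == b && continue
        MIp[a, b] = MI[a, b] - row_mean[a] * row_mean[b] / overall_mean
    end
    return MIp
end

# Made-up MI values for three columns.
MI = [0.0 1.0 2.0;
      1.0 0.0 3.0;
      2.0 3.0 0.0]

MIp = apc_sketch(MI)
MIp[1, 2]   # 1.0 - 1.5 * 2.0 / 2.0 = -0.5
```

Columns with a high average MI against all other columns get a larger correction, which is meant to remove background signal shared across the whole alignment.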
MIToS.Information.buslje09 — Function
buslje09 takes a MSA or a file and a FileFormat as first arguments. It calculates a Z score and a corrected MI/MIp as described in Buslje et al. 2009.
Keyword argument, type, default value and descriptions:
- lambda      Float64          0.05                Low count value
- clustering  Bool             true                Sequence clustering (Hobohm I)
- threshold                    62                  Percent identity threshold for clustering
- maxgap      Float64          0.5                 Maximum fraction of gaps in positions included in calculation
- apc         Bool             true                Use APC correction (MIp)
- samples     Int              100                 Number of samples for Z-score
- fixedgaps   Bool             true                Fix gap positions for the random samples
- alphabet    ResidueAlphabet  UngappedAlphabet()  Residue alphabet to be used
This function returns:
- Z score
- MI or MIp
The second, implemented in the BLMI function, has the same corrections as the above algorithm, but uses BLOSUM62 pseudo frequencies. This function is slower than buslje09 (at the same number of samples), but gives better performance (for structural contact prediction) when the MSA has fewer than 400 clusters after a Hobohm I clustering at 62% identity.
MIToS.Information.BLMI
— FunctionBLMI
takes a MSA or a file and a FileFormat
as first arguments. It calculates a Z score (ZBLMI) and a corrected MI/MIp as described on Busjle et. al. 2009 but using using BLOSUM62 pseudo frequencies instead of a fixed pseudocount.
Keyword argument, type, default value and descriptions:
- beta Float64 8.512 β for BLOSUM62 pseudo frequencies
- lambda Float64 0.0 Low count value
- threshold 62 Percent identity threshold for sequence clustering (Hobohm I)
- maxgap Float64 0.5 Maximum fraction of gaps in positions included in calculation
- apc Bool true Use APC correction (MIp)
- samples Int 50 Number of samples for Z-score
- fixedgaps Bool true Fix gap positions for the random samples
This function returns:
- Z score (ZBLMI)
- MI or MIp using BLOSUM62 pseudo frequencies (BLMI/BLMIp)
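Both functions accept a lambda keyword described as a low count value. As a rough illustration of what a fixed additive pseudocount does, the plain-Julia sketch below adds a constant to every cell of a small count table before normalizing, then compares the resulting mutual information with the uncorrected one. The helper names, the 2×2 table, and the use of the natural logarithm are assumptions made for this example; it is not the MIToS implementation and it does not cover BLOSUM62 pseudo frequencies.

```julia
# Hypothetical helper (not part of MIToS): add a constant pseudocount λ
# to every cell of a count table and normalize to probabilities.
function smoothed_probabilities(counts::AbstractMatrix{<:Real}, λ::Real)
    smoothed = counts .+ λ
    return smoothed ./ sum(smoothed)
end

# Hypothetical helper: mutual information (natural log) of a joint
# probability table; cells with zero probability contribute nothing.
function mutual_information(p::AbstractMatrix{<:Real})
    prow = sum(p, dims=2)  # marginal over columns
    pcol = sum(p, dims=1)  # marginal over rows
    mi = 0.0
    for j in axes(p, 2), i in axes(p, 1)
        p[i, j] > 0 || continue
        mi += p[i, j] * log(p[i, j] / (prow[i] * pcol[j]))
    end
    return mi
end

counts = [5 0; 0 5]                              # made-up pair counts
p_raw = smoothed_probabilities(counts, 0.0)      # no correction
p_smooth = smoothed_probabilities(counts, 0.05)  # λ = 0.05
mi_raw = mutual_information(p_raw)
mi_smooth = mutual_information(p_smooth)
```

In this toy table the pseudocount moves a little probability mass into the unobserved cells, so the smoothed MI is lower than the raw one.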
Example: Estimating corrected MI from an MSA
using MIToS.MSA
using MIToS.Information
msa = read("http://pfam.xfam.org/family/PF16078/alignment/full", Stockholm)
ZMIp, MIp = buslje09(msa)
ZMIp
39×39 Named PairwiseListMatrices.PairwiseListMatrix{Float64, false, Vector{Float64}}
Col1 ╲ Col2 │         14         15  …         59         60
────────────┼───────────────────────────────────────────────
14          │        NaN    1.59121  …   -1.70745    1.96031
15          │    1.59121        NaN      -1.20306    3.54214
16          │    2.96975     1.6647       0.47544  -0.166208
17          │  -0.404107   -1.74711      -2.87247   -1.26098
18          │   -1.30567   -2.26859       5.96648    -2.6569
19          │   -4.39065   0.825759      0.965287   -1.57878
20          │   0.790773   -4.08707       2.21265  -0.969932
26          │   -4.23719  0.0881876     -0.253538   -4.16803
27          │   -4.11895   -2.99225      -2.12144   -3.55557
⋮           │          ⋮          ⋮  ⋱          ⋮          ⋮
52          │   -1.63996  -0.328589        1.1203  -0.767745
53          │   -2.11681    6.95302       9.37097 -0.0693387
54          │    3.61968   -2.05306      -2.59109   0.441789
55          │  -0.780507   0.163012       2.79457   0.607155
56          │    -2.5264   -1.60367       8.68859   -0.61965
57          │   0.270183    -3.2894       6.62548  -0.474177
58          │   -2.42011    7.94816        7.2749    1.69401
59          │   -1.70745   -1.20306           NaN    3.78346
60          │    1.96031    3.54214  …    3.78346        NaN
ZBLMIp, BLMIp = BLMI(msa)
ZBLMIp
39×39 Named PairwiseListMatrices.PairwiseListMatrix{Float64, false, Vector{Float64}}
Col1 ╲ Col2 │           14           15  …           59           60
────────────┼───────────────────────────────────────────────────────
14          │          NaN   -0.0361895  …   -0.0188949   0.00248148
15          │   -0.0361895          NaN      -0.0374084    0.0293007
16          │   -0.0171206   -0.0278213     0.000568627   -0.0247921
17          │   -0.0011799   -0.0398732      -0.0351658  -0.00430076
18          │    -0.023351   -0.0265493       0.0769328   -0.0272612
19          │   -0.0400772   -0.0019086     0.000318475   0.00281241
20          │   0.00763391   -0.0449152        0.041199  -0.00138397
26          │    -0.030315  -0.00453481     -0.00818949   -0.0391383
27          │   -0.0244834   -0.0394667      -0.0373988    -0.023861
⋮           │            ⋮            ⋮  ⋱            ⋮            ⋮
52          │   0.00769419 -0.000914643     0.000702067    0.0097406
53          │   -0.0142384     0.100232        0.123109   0.00553947
54          │    0.0117461  -0.00447789     -0.00554595   -0.0108557
55          │    0.0102212    0.0137708       0.0320511    0.0206935
56          │   -0.0136997   -0.0168638       0.0897505  -0.00253739
57          │    0.0102413   -0.0477182       0.0612399   -0.0121978
58          │  -0.00961846     0.114751       0.0550897    0.0260544
59          │   -0.0188949   -0.0374084             NaN    0.0317193
60          │   0.00248148    0.0293007  …    0.0317193          NaN
Visualize Mutual Information
You can use the functions of the Plots package to visualize the Mutual Information (MI) network between residues. As an example, we are going to visualize the MI between residues of the Pfam domain PF16078. The heatmap function is the simplest way to visualize the values of the Mutual Information matrix.
using Plots
gr()
heatmap(ZMIp, yflip=true)
ZMIp is a Z score of the corrected MIp against its distribution on a random MSA (shuffling the residues in each sequence), so the pairs with the highest values are more likely to co-evolve. Here, we are going to keep only the top 1% of the pairs of MSA columns.
using PairwiseListMatrices # to use getlist
using Statistics # to use quantile
threshold = quantile(getlist(ZMIp), 0.99)
10.671195919542905
ZMIp[ ZMIp .< threshold ] .= NaN
heatmap(ZMIp, yflip=true)
We are going to calculate the cMI (cumulative mutual information) value of each node. cMI is a mutual information score per position that characterizes the extent of mutual information "interactions" in its neighbourhood. This score is calculated as the sum of MI values above a certain threshold for every amino acid pair where the particular residue appears. This value defines to what degree a given amino acid takes part in a mutual information network, and we are going to indicate it using the node color. To calculate cMI we are going to use the cumulative function:
cMI = cumulative(ZMIp, threshold)
1×39 Named Matrix{Float64}
Function ╲ Col2 │      14       15       16  …       58       59       60
────────────────┼────────────────────────────────────────────────────────
cumulative      │     0.0      0.0      0.0  …  10.8093      0.0      0.0
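The cMI definition above can be sketched in plain Julia on a small dense matrix. This is only an illustration under stated assumptions: cumulative_sketch is a hypothetical helper, not the MIToS cumulative function, the scores are made up, and treating values equal to the threshold as included is an assumption.

```julia
# Hypothetical helper (not part of MIToS): for each position, sum the scores
# of all pairs it takes part in that reach the threshold.
# Assumption: values equal to the threshold are included.
function cumulative_sketch(scores::AbstractMatrix{<:Real}, threshold::Real)
    n = size(scores, 1)
    cmi = zeros(Float64, n)
    for j in 1:n, i in 1:n
        i == j && continue
        s = scores[i, j]
        # Comparisons with NaN are false, so NaN entries are skipped.
        if s >= threshold
            cmi[j] += s
        end
    end
    return cmi
end

# Made-up symmetric Z scores for three positions.
scores = [NaN  12.0  3.0;
          12.0 NaN   11.0;
          3.0  11.0  NaN]
cmi = cumulative_sketch(scores, 10.0)
```

Here the second position takes part in two pairs above the threshold, so it gets the largest cumulative score.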