Information

The Information module of MIToS defines types and functions useful to calculate information measures (e.g. Mutual Information (MI) and Entropy) over a Multiple Sequence Alignment (MSA). This module was designed to count Residues (defined in the MSA module) in special contingency tables (as fast as possible) and to derive probabilities from these counts. Also, includes methods for applying corrections to those tables, e.g. pseudocounts and pseudo frequencies. Finally, Information allows to use these probabilities and counts to estimate information measures and other frequency based values.

using MIToS.Information # to load the Information module

Features

Estimate multi dimensional frequencies and probability tables from sequences, MSAs, etc...
Correction for small number of observations
Correction for data redundancy on a MSA
Estimate information measures
Calculate corrected mutual information between residues

Information

Features
Contents
Counting residues

Example: Plotting the probabilities of each residue in a sequence

Low count corrections
Correction for data redundancy in a MSA
Estimating information measures on an MSA

Example: Estimating H(X) and H(X, Y) over an MSA

Corrected Mutual Information

Example: Estimating corrected MI from an MSA

Visualize Mutual Information

Counting residues

MIToS Information module defines a multidimensional ContingencyTable type and two types wrapping it, Counts and Probabilities, to store occurrences or probabilities. The ContingencyTable type stores the contingency matrix, its marginal values and total. These types are parametric, taking three ordered parameters:

T : The type used for storing the counts or probabilities, e.g. Float64. It's

possible to use BigFloat if more precision it's needed.

N : It's the dimension of the table and should be an Int.
A : This should be a type, subtype of ResidueAlphabet, i.e.: UngappedAlphabet,

GappedAlphabet or ReducedAlphabet.

Note

ContingencyTable can be used for storing probabilities or counts. The wrapper types Probabilities and Counts are mainly intended to dispatch in methods that need to know if the matrix has probabilities or counts, e.g. entropy. In general, the use of ContingencyTable is recommended over the use of Probabilities and Counts.

In this way, a matrix for storing pairwise probabilities of residues (without gaps) can be initialized using:

using MIToS.Information

Pij = ContingencyTable(Float64, Val{2}, UngappedAlphabet())

MIToS.Information.ContingencyTable{Float64,2,MIToS.MSA.UngappedAlphabet} : 

table : 20×20 Named Array{Float64,2}
Dim_1 ╲ Dim_2 │   A    R    N    D    C    Q  …    P    S    T    W    Y    V
──────────────┼──────────────────────────────────────────────────────────────
A             │ 0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0
R             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
N             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
D             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
C             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
Q             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
E             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
G             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
H             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
⋮                 ⋮    ⋮    ⋮    ⋮    ⋮    ⋮  ⋱    ⋮    ⋮    ⋮    ⋮    ⋮    ⋮
K             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
M             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
F             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
P             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
S             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
T             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
W             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
Y             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
V             │ 0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0

marginals : 20×2 Named Array{Float64,2}
Residue ╲ Dim │ Dim_1  Dim_2
──────────────┼─────────────
A             │   0.0    0.0
R             │   0.0    0.0
N             │   0.0    0.0
D             │   0.0    0.0
C             │   0.0    0.0
Q             │   0.0    0.0
E             │   0.0    0.0
G             │   0.0    0.0
H             │   0.0    0.0
⋮                   ⋮      ⋮
K             │   0.0    0.0
M             │   0.0    0.0
F             │   0.0    0.0
P             │   0.0    0.0
S             │   0.0    0.0
T             │   0.0    0.0
W             │   0.0    0.0
Y             │   0.0    0.0
V             │   0.0    0.0

total : 0.0

[High level interface] It is possible to use the functions count and probabilities to easily calculate the frequencies of sequences or columns of a MSA, where the number of sequences/columns determine the dimension of the resulting table.

using MIToS.Information
using MIToS.MSA # to use res"..." to create Vector{Residue}

column_i = res"AARANHDDRDC-"
column_j = res"-ARRNHADRAVY"
#   Nij[R,R] =   1     1   = 2

Nij = count(column_i, column_j)

MIToS.Information.Counts{Float64,2,MIToS.MSA.UngappedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64,2,MIToS.MSA.UngappedAlphabet} : 

table : 20×20 Named Array{Float64,2}
Dim_1 ╲ Dim_2 │   A    R    N    D    C    Q  …    P    S    T    W    Y    V
──────────────┼──────────────────────────────────────────────────────────────
A             │ 1.0  1.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0
R             │ 0.0  2.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
N             │ 0.0  0.0  1.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
D             │ 2.0  0.0  0.0  1.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
C             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  1.0
Q             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
E             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
G             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
H             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
⋮                 ⋮    ⋮    ⋮    ⋮    ⋮    ⋮  ⋱    ⋮    ⋮    ⋮    ⋮    ⋮    ⋮
K             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
M             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
F             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
P             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
S             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
T             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
W             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
Y             │ 0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0
V             │ 0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0

marginals : 20×2 Named Array{Float64,2}
Residue ╲ Dim │ Dim_1  Dim_2
──────────────┼─────────────
A             │   2.0    3.0
R             │   2.0    3.0
N             │   1.0    1.0
D             │   3.0    1.0
C             │   1.0    0.0
Q             │   0.0    0.0
E             │   0.0    0.0
G             │   0.0    0.0
H             │   1.0    1.0
⋮                   ⋮      ⋮
K             │   0.0    0.0
M             │   0.0    0.0
F             │   0.0    0.0
P             │   0.0    0.0
S             │   0.0    0.0
T             │   0.0    0.0
W             │   0.0    0.0
Y             │   0.0    0.0
V             │   0.0    1.0

total : 10.0

You can use sum to get the stored total:

sum(Nij) # There are 12 Residues, but 2 are gaps

10.0

Contingency tables can be indexed using Int or Residues:

Nij[2, 2] # Use Int to index the table

2.0

Nij[Residue('R'), Residue('R')] # Use Residue to index the table

2.0

Warning

The number makes reference to the specific index in the table e.g [2,2] references the second row and the second column. The use of the number used to encode the residue to index the table is dangerous. The equivalent index number of a residue depends on the used alphabet and Int(Residue('X')) will be always out of bounds.

Indexing with Residues works as expected. It uses the alphabet of the contingency table to find the index of the Residue.

using MIToS.Information
using MIToS.MSA

alphabet = ReducedAlphabet("(AILMV)(NQST)(RHK)(DE)(FWY)CGP")

column_i = res"AARANHDDRDC-"
column_j = res"-ARRNHADRAVY"
#   Fij[R,R] =   1  1  1   = 3 # RHK

Fij = count(column_i, column_j, alphabet=alphabet)

MIToS.Information.Counts{Float64,2,MIToS.MSA.ReducedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64,2,MIToS.MSA.ReducedAlphabet} : 

table : 8×8 Named Array{Float64,2}
Dim_1 ╲ Dim_2 │ AILMV   NQST    RHK     DE    FWY      C      G      P
──────────────┼───────────────────────────────────────────────────────
AILMV         │   1.0    0.0    1.0    0.0    0.0    0.0    0.0    0.0
NQST          │   0.0    1.0    0.0    0.0    0.0    0.0    0.0    0.0
RHK           │   0.0    0.0    3.0    0.0    0.0    0.0    0.0    0.0
DE            │   2.0    0.0    0.0    1.0    0.0    0.0    0.0    0.0
FWY           │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
C             │   1.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
G             │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
P             │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0

marginals : 8×2 Named Array{Float64,2}
Residue ╲ Dim │ Dim_1  Dim_2
──────────────┼─────────────
AILMV         │   2.0    4.0
NQST          │   1.0    1.0
RHK           │   3.0    4.0
DE            │   3.0    1.0
FWY           │   0.0    0.0
C             │   1.0    0.0
G             │   0.0    0.0
P             │   0.0    0.0

total : 10.0

Fij[Residue('R'), Residue('R')] # Use Residue to index the table

3.0

The function getcontingencytable allows to access the wrapped ContingencyTable in a Counts object. You can use it, in combination with normalize to get a contingency table of probabilities. The result can be wrapped inside a Probabilities object:

Probabilities(normalize(getcontingencytable(Fij)))

MIToS.Information.Probabilities{Float64,2,MIToS.MSA.ReducedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64,2,MIToS.MSA.ReducedAlphabet} : 

table : 8×8 Named Array{Float64,2}
Dim_1 ╲ Dim_2 │ AILMV   NQST    RHK     DE    FWY      C      G      P
──────────────┼───────────────────────────────────────────────────────
AILMV         │   0.1    0.0    0.1    0.0    0.0    0.0    0.0    0.0
NQST          │   0.0    0.1    0.0    0.0    0.0    0.0    0.0    0.0
RHK           │   0.0    0.0    0.3    0.0    0.0    0.0    0.0    0.0
DE            │   0.2    0.0    0.0    0.1    0.0    0.0    0.0    0.0
FWY           │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
C             │   0.1    0.0    0.0    0.0    0.0    0.0    0.0    0.0
G             │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
P             │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0

marginals : 8×2 Named Array{Float64,2}
Residue ╲ Dim │ Dim_1  Dim_2
──────────────┼─────────────
AILMV         │   0.2    0.4
NQST          │   0.1    0.1
RHK           │   0.3    0.4
DE            │   0.3    0.1
FWY           │   0.0    0.0
C             │   0.1    0.0
G             │   0.0    0.0
P             │   0.0    0.0

total : 1.0000000000000002

Example: Plotting the probabilities of each residue in a sequence

Similar to the count function, the probabilities function can take at least one sequence (vector of residues) and returns the probabilities of each residue. Optionally, the keyword argument alphabet could be used to count some residues in the same cell of the table.

probabilities(res"AARANHDDRDC", alphabet=alphabet)

MIToS.Information.Probabilities{Float64,1,MIToS.MSA.ReducedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64,1,MIToS.MSA.ReducedAlphabet} : 

table : 8-element Named Array{Float64,1}
Dim_1  │ 
───────┼──────────
AILMV  │  0.272727
NQST   │ 0.0909091
RHK    │  0.272727
DE     │  0.272727
FWY    │       0.0
C      │ 0.0909091
G      │       0.0
P      │       0.0

total : 1.0

Here, we are going to use the probabilities function to get the residue probabilities of a particular sequence from UniProt.

use the getsequence function, from the MSA module, to get the sequence from a FASTA downloaded from UniProt.

julia> using MIToS.Information # to use the probabilities function

julia> using MIToS.MSA # to use getsequence on the one sequence FASTA (canonical) from UniProt

julia> seq = read("http://www.uniprot.org/uniprot/P29374.fasta", FASTA) # Small hack: read the single sequence as a MSA
AnnotatedMultipleSequenceAlignment with 0 annotations : 1×1257 Named Array{MIToS.MSA.Residue,2}
                                                                                                         Seq ╲ Col │   …
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────
sp|P29374|ARI4A_HUMAN AT-rich interactive domain-containing protein 4A OS=Homo sapiens OX=9606 GN=ARID4A PE=1 SV=3 │   …

julia> probabilities(seq[1,:]) # Select the single sequence and calculate the probabilities
MIToS.Information.Probabilities{Float64,1,MIToS.MSA.UngappedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64,1,MIToS.MSA.UngappedAlphabet} :

table : 20-element Named Array{Float64,1}
Dim_1  │
───────┼───────────
A      │   0.043755
R      │  0.0517104
N      │  0.0469372
D      │  0.0755768
C      │  0.0135243
Q      │   0.035004
E      │   0.134447
G      │   0.043755
H      │  0.0143198
⋮                 ⋮
K      │   0.109785
M      │  0.0159109
F      │  0.0190931
P      │  0.0445505
S      │   0.100239
T      │  0.0493238
W      │ 0.00636436
Y      │  0.0198886
V      │  0.0517104

total : 1.0

Note

In the previous example, using getsequence(seq,1) instead of seq[1,:] will return the sequence as a matrix with a single column to keep information for both dimensions. To use probabilities (or count) you can make use of the Julia's vec function to transform the matrix to a vector, e.g.: probabilities(vec(getsequence(seq,1))).

using Plots # We choose Plots because it's intuitive, concise and backend independent
gr(size=(600,300))

Plots.GRBackend()

You can plot together with the probabilities of each residue in a given sequence, the probabilities of each residue estimated with the BLOSUM62 substitution matrix. That matrix is exported as a constant by the Information module as BLOSUM62_Pi.

bar(
    1:20,
    [ frequencies  BLOSUM62_Pi ],
    lab = [ "Sequence"  "BLOSUM62"   ],
    alpha=0.5
    )

/juliateam/.julia/packages/GR/yMV3y/src/../deps/gr/bin/gksqt: error while loading shared libraries: libQt5Widgets.so.5: cannot open shared object file: No such file or directory
connect: Connection refused
GKS: can't connect to GKS socket application

GKS: Open failed in routine OPEN_WS
GKS: GKS not in proper state. GKS must be either in the state WSOP or WSAC in routine ACTIVATE_WS

Low count corrections

Low number of observations can lead to sparse contingency tables, that lead to wrong probability estimations. It is shown in Buslje et. al. 2009 that low-count corrections, can lead to improvements in the contact prediction capabilities of the Mutual Information. The Information module has available two low-count corrections:

Additive Smoothing; the constant value pseudocount described in Buslje et. al. 2009.
BLOSUM62 based pseudo frequencies of residues pairs, similar to Altschul et. al. 1997.

using MIToS.MSA

msa = read("http://pfam.xfam.org/family/PF09776/alignment/full", Stockholm)

filtercolumns!(msa, columngapfraction(msa) .< 0.5) # delete columns with 50% gaps or more

column_i = msa[:,1]
column_j = msa[:,2]

251-element Named Array{MIToS.MSA.Residue,1}
Seq                        │ 
───────────────────────────┼──
A0A2A2LU69_9BILA/10-118    │ Q
G1RVK4_NOMLE/46-161        │ R
A0A194PY65_PAPXU/3-116     │ -
A0A0V0S2G9_9BILA/731-840   │ S
A0A016T9R4_9BILA/3-89      │ -
A0A1V4KDS6_PATFA/159-270   │ -
M4A0G1_XIPMA/29-139        │ F
A0A1B0BJN8_9MUSC/16-132    │ -
A0A1I8N4P6_MUSDO/6-107     │ -
⋮                            ⋮
A0A0M3QXL3_DROBS/2-105     │ -
A0A195FWH0_9HYME/356-468   │ -
A0A182I7Y1_ANOAR/4-112     │ -
T1FXF3_HELRO/1-90          │ -
A0A2I3GIP3_NOMLE/10-125    │ R
G3WPK1_SARHA/11-126        │ H
RM55_MOUSE/9-124           │ H
M3XPJ7_MUSPF/10-125        │ R
A0A182GHC0_AEDAL/6-119     │ -

If you have a preallocated ContingencyTable you can use count! to fill it, this prevent to create a new table as count do. However, you should note that count!adds the new counts to the pre existing values, so in this case, we want to start with a table initialized with zeros.

using MIToS.Information

const alphabet = ReducedAlphabet("(AILMV)(NQST)(RHK)(DE)(FWY)CGP")

Nij = ContingencyTable(Float64, Val{2}, alphabet)

MIToS.Information.ContingencyTable{Float64,2,MIToS.MSA.ReducedAlphabet} : 

table : 8×8 Named Array{Float64,2}
Dim_1 ╲ Dim_2 │ AILMV   NQST    RHK     DE    FWY      C      G      P
──────────────┼───────────────────────────────────────────────────────
AILMV         │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
NQST          │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
RHK           │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
DE            │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
FWY           │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
C             │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
G             │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
P             │   0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0

marginals : 8×2 Named Array{Float64,2}
Residue ╲ Dim │ Dim_1  Dim_2
──────────────┼─────────────
AILMV         │   0.0    0.0
NQST          │   0.0    0.0
RHK           │   0.0    0.0
DE            │   0.0    0.0
FWY           │   0.0    0.0
C             │   0.0    0.0
G             │   0.0    0.0
P             │   0.0    0.0

total : 0.0

#      table  weights         pseudocount      sequences...
count!(Nij,   NoClustering(), NoPseudocount(), column_i, column_j)

MIToS.Information.ContingencyTable{Float64,2,MIToS.MSA.ReducedAlphabet} : 

table : 8×8 Named Array{Float64,2}
Dim_1 ╲ Dim_2 │ AILMV   NQST    RHK     DE    FWY      C      G      P
──────────────┼───────────────────────────────────────────────────────
AILMV         │   0.0    2.0    1.0    0.0    0.0    0.0    2.0    1.0
NQST          │   4.0   17.0    2.0    0.0    3.0    5.0    1.0    0.0
RHK           │   2.0    8.0   27.0    0.0    9.0   12.0    1.0    0.0
DE            │   2.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0
FWY           │   0.0    2.0    1.0    0.0    1.0    0.0    0.0    0.0
C             │   0.0   10.0   10.0    0.0    2.0    3.0    0.0    0.0
G             │   0.0    1.0    0.0    0.0    0.0    0.0    0.0    0.0
P             │   1.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0

marginals : 8×2 Named Array{Float64,2}
Residue ╲ Dim │ Dim_1  Dim_2
──────────────┼─────────────
AILMV         │   6.0    9.0
NQST          │  32.0   40.0
RHK           │  59.0   41.0
DE            │   2.0    0.0
FWY           │   4.0   15.0
C             │  25.0   20.0
G             │   1.0    4.0
P             │   1.0    1.0

total : 130.0

Note

You can use NoClustering() in places where clustering weights are required to not use weights. Also, NoPseudocount() in places where pseudocount values are required to not use pseudocounts.

In cases like the above, where there are few observations, it is possible to apply a constant pseudocount to the counting table. This module defines the type AdditiveSmoothing and the correspond fill! and apply_pseudocount! methods to efficiently add or fill with a constant value each element of the table.

apply_pseudocount!(Nij, AdditiveSmoothing(1.0))

MIToS.Information.ContingencyTable{Float64,2,MIToS.MSA.ReducedAlphabet} : 

table : 8×8 Named Array{Float64,2}
Dim_1 ╲ Dim_2 │ AILMV   NQST    RHK     DE    FWY      C      G      P
──────────────┼───────────────────────────────────────────────────────
AILMV         │   1.0    3.0    2.0    1.0    1.0    1.0    3.0    2.0
NQST          │   5.0   18.0    3.0    1.0    4.0    6.0    2.0    1.0
RHK           │   3.0    9.0   28.0    1.0   10.0   13.0    2.0    1.0
DE            │   3.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0
FWY           │   1.0    3.0    2.0    1.0    2.0    1.0    1.0    1.0
C             │   1.0   11.0   11.0    1.0    3.0    4.0    1.0    1.0
G             │   1.0    2.0    1.0    1.0    1.0    1.0    1.0    1.0
P             │   2.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0

marginals : 8×2 Named Array{Float64,2}
Residue ╲ Dim │ Dim_1  Dim_2
──────────────┼─────────────
AILMV         │  14.0   17.0
NQST          │  40.0   48.0
RHK           │  67.0   49.0
DE            │  10.0    8.0
FWY           │  12.0   23.0
C             │  33.0   28.0
G             │   9.0   12.0
P             │   9.0    9.0

total : 194.0

[High level interface.] The count function has a pseudocounts keyword argument that can take a AdditiveSmoothing value to easily calculate occurrences with pseudocounts. Also the alphabet keyword argument can be used to chage the default alphabet (i.e. )

count(column_i, column_j, pseudocounts=AdditiveSmoothing(1.0), alphabet=alphabet)

MIToS.Information.Counts{Float64,2,MIToS.MSA.ReducedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64,2,MIToS.MSA.ReducedAlphabet} : 

table : 8×8 Named Array{Float64,2}
Dim_1 ╲ Dim_2 │ AILMV   NQST    RHK     DE    FWY      C      G      P
──────────────┼───────────────────────────────────────────────────────
AILMV         │   1.0    3.0    2.0    1.0    1.0    1.0    3.0    2.0
NQST          │   5.0   18.0    3.0    1.0    4.0    6.0    2.0    1.0
RHK           │   3.0    9.0   28.0    1.0   10.0   13.0    2.0    1.0
DE            │   3.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0
FWY           │   1.0    3.0    2.0    1.0    2.0    1.0    1.0    1.0
C             │   1.0   11.0   11.0    1.0    3.0    4.0    1.0    1.0
G             │   1.0    2.0    1.0    1.0    1.0    1.0    1.0    1.0
P             │   2.0    1.0    1.0    1.0    1.0    1.0    1.0    1.0

marginals : 8×2 Named Array{Float64,2}
Residue ╲ Dim │ Dim_1  Dim_2
──────────────┼─────────────
AILMV         │  14.0   17.0
NQST          │  40.0   48.0
RHK           │  67.0   49.0
DE            │  10.0    8.0
FWY           │  12.0   23.0
C             │  33.0   28.0
G             │   9.0   12.0
P             │   9.0    9.0

total : 194.0

To use the conditional probability matrix BLOSUM62_Pij in the calculation of pseudo frequencies $G$ for the pair of residues $a$, $b$, it should be calculated first the real frequencies/probabilities $p_{a,b}$. The observed probabilities are then used to estimate the pseudo frequencies.

\[G_{ab} = \sum_{cd} p_{cd} \cdot BLOSUM62( a | c ) \cdot BLOSUM62( b | d )\]

Finally, the probability $P$ of each pair of residues $a$, $b$ between the columns $i$, $j$ is the weighted mean between the observed frequency $p$ and BLOSUM62-based pseudo frequency $G$, where α is generally the number of clusters or the number of sequences of the MSA and β is an empiric weight value. β was determined to be close to 8.512.

\[P_{ab} = \frac{\alpha \cdot p_{ab} + \beta \cdot G_{ab} }{\alpha + \beta}\]

This could be easily achieved using the pseudofrequencies keyword argument of the probabilities function. That argument can take a BLOSUM_Pseudofrequencies object that is created with α and β as first and second argument, respectively.

Pij = probabilities(column_i, column_j, pseudofrequencies=BLOSUM_Pseudofrequencies(nsequences(msa), 8.512))

MIToS.Information.Probabilities{Float64,2,MIToS.MSA.UngappedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64,2,MIToS.MSA.UngappedAlphabet} : 

table : 20×20 Named Array{Float64,2}
Dim_1 ╲ Dim_2 │           A            R  …            Y            V
──────────────┼──────────────────────────────────────────────────────
A             │ 0.000142908    0.0075939  …   7.76763e-5   9.95548e-5
R             │  0.00772558    0.0673799       0.0225557  0.000202234
N             │  9.64158e-5   8.71683e-5      5.03437e-5    7.1042e-5
D             │  8.00906e-5   7.40001e-5      4.49757e-5   8.74722e-5
C             │ 0.000224617     0.015097      0.00758641  0.000126697
Q             │  9.35478e-5   9.14959e-5      5.41118e-5   0.00752563
E             │  9.80385e-5  0.000101904      6.07924e-5   8.42353e-5
G             │ 0.000101064   9.44132e-5      5.22545e-5   7.04391e-5
H             │  5.59106e-5   5.59221e-5      4.09321e-5   4.13428e-5
⋮                         ⋮            ⋮  ⋱            ⋮            ⋮
K             │ 0.000168605  0.000191384     0.000125256  0.000140848
M             │  3.57874e-5   3.61969e-5      2.14301e-5    2.5536e-5
F             │  6.65783e-5   5.31368e-5      3.81743e-5   4.26134e-5
P             │  5.02375e-5   4.96049e-5      2.97028e-5   4.56639e-5
S             │ 0.000160719    0.0150369      8.31195e-5  0.000117318
T             │ 0.000102889  0.000103082      5.60785e-5   7.50731e-5
W             │  1.21225e-5   1.24975e-5      7.72719e-6   8.65093e-6
Y             │  4.44681e-5   4.35146e-5      2.99456e-5   3.11897e-5
V             │  9.45992e-5   9.21482e-5  …   5.29008e-5    6.3349e-5

marginals : 20×2 Named Array{Float64,2}
Residue ╲ Dim │       Dim_1        Dim_2
──────────────┼─────────────────────────
A             │   0.0244776   0.00955496
R             │    0.347106     0.106384
N             │   0.0237304   0.00124081
D             │   0.0161538   0.00106531
C             │    0.189336     0.151575
Q             │   0.0535322    0.0458474
E             │  0.00155238   0.00137584
G             │  0.00889809    0.0315651
H             │   0.0307239     0.203753
⋮                         ⋮            ⋮
K             │    0.069772   0.00161526
M             │ 0.000546285   0.00807655
F             │   0.0232668    0.0762266
P             │  0.00825598    0.0082319
S             │    0.143734     0.188571
T             │   0.0238898    0.0688152
W             │ 0.000188462   0.00780421
Y             │   0.0081499      0.03108
V             │  0.00139872   0.00899212

total : 1.0

You can also use apply_pseudofrequencies! in a previously filled probability contingency table. i.e. apply_pseudofrequencies!(Pij, BLOSUM_Pseudofrequencies(α, β))

Warning

BLOSUM_Pseudofrequencies can be only be applied in normalized/probability tables with UngappedAlphabet.

Correction for data redundancy in a MSA

A simple way to reduce redundancy in a MSA without losing sequences, is clusterization and sequence weighting. The weight of each sequence should be 1/N, where N is the number of sequences in its cluster. The Clusters type of the MSA module stores the weights. This vector of weights can be extracted (with the getweight function) and used by the count and probabilities functions with the keyword argument weights. Also it's possible to use the Clusters as second argument of the function count!.

clusters = hobohmI(msa, 62) # from MIToS.MSA

MIToS.MSA.Clusters([2, 53, 37, 2, 8, 5, 7, 3, 1, 3  …  1, 1, 1, 1, 2, 2, 1, 1, 2, 1], [1, 2, 3, 4, 5, 6, 7, 8, 3, 2  …  15, 3, 19, 3, 77, 2, 2, 2, 2, 3], [0.5, 0.018867924528301886, 0.02702702702702703, 0.5, 0.125, 0.2, 0.14285714285714285, 0.3333333333333333, 0.02702702702702703, 0.018867924528301886  …  0.25, 0.02702702702702703, 0.09090909090909091, 0.02702702702702703, 1.0, 0.018867924528301886, 0.018867924528301886, 0.018867924528301886, 0.018867924528301886, 0.02702702702702703])

count(msa[:,1], msa[:,2], weights=clusters)

MIToS.Information.Counts{Float64,2,MIToS.MSA.UngappedAlphabet} wrapping a MIToS.Information.ContingencyTable{Float64,2,MIToS.MSA.UngappedAlphabet} : 

table : 20×20 Named Array{Float64,2}
Dim_1 ╲ Dim_2 │         A          R  …          Y          V
──────────────┼──────────────────────────────────────────────
A             │       0.0  0.0188679  …        0.0        0.0
R             │       0.2    1.63208      0.537736        0.0
N             │       0.0        0.0           0.0        0.0
D             │       0.0        0.0           0.0        0.0
C             │       0.0   0.143868         0.125        0.0
Q             │       0.0        0.0           0.0        0.5
E             │       0.0        0.0           0.0        0.0
G             │       0.0        0.0           0.0        0.0
H             │       0.0        0.0           0.0        0.0
⋮                       ⋮          ⋮  ⋱          ⋮          ⋮
K             │       0.0        0.0           0.0        0.0
M             │       0.0        0.0           0.0        0.0
F             │       0.0        0.0           0.0        0.0
P             │       0.0        0.0           0.0        0.0
S             │       0.0   0.333333           0.0        0.0
T             │       0.0        0.0           0.0        0.0
W             │       0.0        0.0           0.0        0.0
Y             │       0.0        0.0           0.0        0.0
V             │       0.0        0.0  …        0.0        0.0

marginals : 20×2 Named Array{Float64,2}
Residue ╲ Dim │     Dim_1      Dim_2
──────────────┼─────────────────────
A             │   1.16173        0.2
R             │   7.61854    2.12814
N             │       1.5        0.0
D             │  0.666667        0.0
C             │   3.77201    3.66588
Q             │       4.5    3.27703
E             │       0.0        0.0
G             │ 0.0909091    1.41173
H             │ 0.0754717   0.721698
⋮                       ⋮          ⋮
K             │   1.92063        0.0
M             │       0.0        1.0
F             │       3.0    3.88792
P             │       0.2        0.5
S             │   6.38196     9.7943
T             │     1.125    4.44444
W             │       0.0  0.0188679
Y             │ 0.0188679   0.662736
V             │       0.0        0.5

total : 33.86512090380015

Estimating information measures on an MSA

The Information module has a number of functions defined to calculate information measures from Counts and Probabilities:

entropy : Shannon entropy (H)
marginal_entropy : Shannon entropy (H) of the marginals
kullback_leibler : Kullback-Leibler (KL) divergence
mutual_information : Mutual Information (MI)
normalized_mutual_information : Normalized Mutual Information (nMI) by Entropy
gap_intersection_percentage
gap_union_percentage

Information measure functions take optionally the base as the last positional argument (default: e). You can use 2.0 to measure information in bits.

using MIToS.Information
using MIToS.MSA

Ni = count(res"PPCDPPPPPKDKKKKDDGPP") # Ni has the count table of residues in this low complexity sequence

H = entropy(Ni) # returns the Shannon entropy in nats (base e)

1.327362863420189

H = entropy(Ni, 2.0) # returns the Shannon entropy in bits (base 2)

1.9149798205164812

Information module defines special iteration functions to easily and efficiently compute a measure over a MSA. In particular, mapcolfreq! and mapseqfreq! map a function that takes a table of Counts or Probabilities. The table is filled in place with the counts or probabilities of each column or sequence of a MSA, respectively. mapcolpairfreq! and mapseqpairfreq! are similar, but they fill the table using pairs of columns or sequences, respectively.

This functions take three positional arguments: the function f to be calculated, the msa and table of Counts or Probabilities.

After that, this function takes some keyword arguments:

weights (default: NoClustering()) : Weights to be used for table counting.
pseudocounts (default: NoPseudocount()) : Pseudocount object to be applied to table.
pseudofrequencies (default: NoPseudofrequencies()) : Pseudofrequencies to be

applied to the normalized (probabilities) table.

mapcolpairfreq! and mapseqpairfreq! also have a fourth positional argument usediagonal that indicates if the function should be applied to identical element pairs (default to Val{true}). This two functions also have an extra keyword argument diagonalvalue (default to zero) to indicate the value used to fill the diagonal elements if usediagonal is Val{false}.

Example: Estimating H(X) and H(X, Y) over an MSA

In this example, we are going to use mapcolfreq! and mapcolpairfreq! to estimate Shannon entropy of MSA columns H(X) and the joint entropy H(X, Y) of columns pairs, respectively.

using MIToS.MSA

msa = read("http://pfam.xfam.org/family/PF09776/alignment/full", Stockholm)

AnnotatedMultipleSequenceAlignment with 275 annotations : 251×116 Named Array{MIToS.MSA.Residue,2}
                 Seq ╲ Col │  35   36   37   38   39  …  183  184  185  186  187
───────────────────────────┼────────────────────────────────────────────────────
A0A2A2LU69_9BILA/10-118    │   -    -    -    -    -  …    K    L    -    -    -
G1RVK4_NOMLE/46-161        │   -    L    R    Q    S       Q    F    W    T    R
A0A194PY65_PAPXU/3-116     │   -    -    -    -    -       K    Y    I    K    K
A0A0V0S2G9_9BILA/731-840   │   -    -    -    -    -       Y    L    W    K    K
A0A016T9R4_9BILA/3-89      │   -    -    -    -    -       -    -    -    -    -
A0A1V4KDS6_PATFA/159-270   │   -    -    -    -    -       K    F    W    K    K
M4A0G1_XIPMA/29-139        │   -    -    -    -    -       H    L    W    K    K
A0A1B0BJN8_9MUSC/16-132    │   -    -    -    -    -       K    Y    I    K    K
A0A1I8N4P6_MUSDO/6-107     │   -    -    -    -    -       K    Y    I    K    K
⋮                              ⋮    ⋮    ⋮    ⋮    ⋮  ⋱    ⋮    ⋮    ⋮    ⋮    ⋮
A0A0M3QXL3_DROBS/2-105     │   -    -    -    -    -       K    Y    I    K    K
A0A195FWH0_9HYME/356-468   │   -    -    -    -    -       N    F    K    -    -
A0A182I7Y1_ANOAR/4-112     │   -    -    -    -    -       K    Y    M    K    K
T1FXF3_HELRO/1-90          │   -    -    -    -    -       Y    L    W    K    K
A0A2I3GIP3_NOMLE/10-125    │   -    L    R    Q    S       Q    F    W    T    R
G3WPK1_SARHA/11-126        │   -    L    Q    Q    N       K    F    W    K    K
RM55_MOUSE/9-124           │   L    L    R    H    C       Q    F    W    T    K
M3XPJ7_MUSPF/10-125        │   -    L    Q    Q    P       R    F    W    T    K
A0A182GHC0_AEDAL/6-119     │   -    -    -    -    -  …    N    F    I    R    K

We are going to count residues to estimate the entropy. The entropy estimation is performed over a rehused Counts object. The result will be a vector containing the values estimated over each column without counting gaps (UngappedAlphabet).

using MIToS.Information

Hx = mapcolfreq!(entropy, msa, Counts(ContingencyTable(Float64, Val{1}, UngappedAlphabet())))

1×116 Named Array{Float64,2}
Function ╲ Col │        35         36  …        186        187
───────────────┼──────────────────────────────────────────────
entropy        │  0.223718   0.152005  …    1.40053   0.392674

If we want the joint entropy between columns pairs, we need to use a bidimensional table of Counts and mapcolpairfreq!.

Hxy = mapcolpairfreq!(entropy, msa, Counts(ContingencyTable(Float64, Val{2}, UngappedAlphabet())))

116×116 Named PairwiseListMatrices.PairwiseListMatrix{Float64,true,Array{Float64,1}}
Col1 ╲ Col2 │        35         36  …        186        187
────────────┼──────────────────────────────────────────────
35          │  0.223718   0.223718  …    1.32844   0.636514
36          │  0.223718   0.152005       1.31991   0.665504
37          │   1.17117    1.16219       2.17753    1.40666
38          │  0.644482   0.783024       1.91145    1.38292
39          │   1.65034    1.85736       2.72266    2.13018
40          │   1.74715    1.62878       2.69412    1.99026
41          │   1.32973    1.35369       2.62046    1.96563
42          │   1.57489    1.47441       2.72617     2.0551
43          │   1.47256    1.42538       2.58854    1.91578
⋮                     ⋮          ⋮  ⋱          ⋮          ⋮
179         │  0.362501   0.419556       2.72026    2.06222
180         │   1.29175    1.32064        2.6316    2.15441
181         │  0.228632   0.158411        1.4491   0.456464
182         │   1.29855    1.40964       2.89329     2.2956
183         │   0.94338   0.973938       2.44262     1.7813
184         │  0.453351   0.315396       2.24717     1.4562
185         │  0.228632   0.158411       2.26862    1.55818
186         │   1.32844    1.31991       1.40053      1.488
187         │  0.636514   0.665504  …      1.488   0.392674

In the above examples, we indicate the type of each occurrence in the counting and the probability table to use. Also, it's possible for some measures as entropy and mutual information, to estimate the values only with the count table (without calculate the probability table). Estimating measures only with a ResidueCount table, when this is possible, should be faster than using a probability table.

Time_Pab = map(1:100) do x
    time = @elapsed mapcolpairfreq!(entropy, msa, Probabilities(ContingencyTable(Float64, Val{2}, UngappedAlphabet())))
end

Time_Nab = map(1:100) do x
    time = @elapsed mapcolpairfreq!(entropy, msa, Counts(ContingencyTable(Float64, Val{2}, UngappedAlphabet())))
end

using Plots
gr()

histogram( [Time_Pab Time_Nab],
    labels = ["Using ResidueProbability" "Using ResidueCount"],
    xlabel = "Execution time [seconds]" )

/juliateam/.julia/packages/GR/yMV3y/src/../deps/gr/bin/gksqt: error while loading shared libraries: libQt5Widgets.so.5: cannot open shared object file: No such file or directory
connect: Connection refused
GKS: can't connect to GKS socket application

GKS: Open failed in routine OPEN_WS
GKS: GKS not in proper state. GKS must be either in the state WSOP or WSAC in routine ACTIVATE_WS

Corrected Mutual Information

MIToS ships with two methods to easily calculate corrected mutual information. The first is the algorithm described in Buslje et. al. 2009. This algorithm can be accessed through the buslje09 function and includes:

Low count correction using AdditiveSmoothing
Sequence weighting after a hobohmI clustering
Average Product Correction (APC) proposed by

Dunn et. al. 2008, through the APC! function that takes a MI matrix.

Z score correction using the functions shuffle! from the MSA module and zscore

from the PairwiseListMatrices package.

MIToS.Information.buslje09 — Function

buslje09 takes a MSA or a file and a FileFormat as first arguments. It calculates a Z score and a corrected MI/MIp as described on Busjle et. al. 2009.

keyword argument, type, default value and descriptions:

  - lambda      Float64   0.05    Low count value
  - clustering  Bool      true    Sequence clustering (Hobohm I)
  - threshold             62      Percent identity threshold for clustering
  - maxgap      Float64   0.5     Maximum fraction of gaps in positions included in calculation
  - apc         Bool      true    Use APC correction (MIp)
  - samples     Int       100     Number of samples for Z-score
  - fixedgaps   Bool      true    Fix gaps positions for the random samples
  - alphabet    ResidueAlphabet UngappedAlphabet()  Residue alphabet to be used

This function returns:

  - Z score
  - MI or MIp

The second, implemented in the BLMI function, has the same corrections that the above algorithm, but use BLOSUM62 pseudo frequencies. This function is slower than buslje09 (at the same number of samples), but gives better performance (for structural contact prediction) when the MSA has less than 400 clusters after a Hobohm I at 62% identity.

MIToS.Information.BLMI — Function

BLMI takes a MSA or a file and a FileFormat as first arguments. It calculates a Z score (ZBLMI) and a corrected MI/MIp as described on Busjle et. al. 2009 but using using BLOSUM62 pseudo frequencies instead of a fixed pseudocount.

Keyword argument, type, default value and descriptions:

  - beta        Float64   8.512   β for BLOSUM62 pseudo frequencies
  - lambda      Float64   0.0     Low count value
  - threshold             62      Percent identity threshold for sequence clustering (Hobohm I)
  - maxgap      Float64   0.5     Maximum fraction of gaps in positions included in calculation
  - apc         Bool      true    Use APC correction (MIp)
  - samples     Int       50      Number of samples for Z-score
  - fixedgaps   Bool      true    Fix gaps positions for the random samples

This function returns:

  - Z score (ZBLMI)
  - MI or MIp using BLOSUM62 pseudo frequencies (BLMI/BLMIp)

Example: Estimating corrected MI from an MSA

using MIToS.MSA
using MIToS.Information

msa = read("http://pfam.xfam.org/family/PF16078/alignment/full", Stockholm)
ZMIp, MIp  = buslje09(msa)
ZMIp

39×39 Named PairwiseListMatrices.PairwiseListMatrix{Float64,false,Array{Float64,1}}
Col1 ╲ Col2 │         15          16  …          59          60
────────────┼──────────────────────────────────────────────────
15          │        NaN    0.887066  …    -3.09017     1.23131
16          │   0.887066         NaN       0.219953     3.69502
17          │    3.04321    0.968657       0.758529  -0.0297807
18          │   0.729961    -1.40712        -1.8208   -0.411115
19          │   -3.27504    -1.38457        6.81266     -2.6948
20          │   -4.35143      1.4964      -0.628266    -1.04789
21          │    1.07264    -2.35739         4.4703    -1.12768
27          │   -4.10398   -0.217802      0.0195528    -3.67282
28          │   -4.17516      -4.185       -1.55001    -4.21207
⋮                      ⋮           ⋮  ⋱           ⋮           ⋮
52          │   -2.10793    0.111325        3.36268    -1.21789
53          │   -1.54693     6.57086        11.3775  -0.0725365
54          │    3.71587    -2.26546       -2.61981    0.559415
55          │   -1.25747   0.0399754        1.50746   0.0374642
56          │   -3.44808   -0.401811        8.43645   -0.510143
57          │   0.595956    -3.87251        6.31281    -2.25362
58          │   -2.64298     6.49657         7.5849     4.21591
59          │   -3.09017    0.219953            NaN     2.76327
60          │    1.23131     3.69502  …     2.76327         NaN

ZBLMIp, BLMIp  = BLMI(msa)
ZBLMIp

39×39 Named PairwiseListMatrices.PairwiseListMatrix{Float64,false,Array{Float64,1}}
Col1 ╲ Col2 │           15            16  …            59            60
────────────┼──────────────────────────────────────────────────────────
15          │          NaN    -0.0450545  …     -0.036915   -0.00769414
16          │   -0.0450545           NaN       -0.0102239     0.0372196
17          │   -0.0210853    -0.0369195       0.00873277    -0.0252115
18          │    0.0122675    -0.0412662       -0.0294778     0.0103914
19          │   -0.0467111    -0.0209868         0.104964    -0.0329703
20          │   -0.0384957    0.00661179       -0.0213431    0.00971171
21          │    0.0085977    -0.0252071        0.0763177    -0.0075686
27          │    -0.031003    -0.0085647      -0.00960699    -0.0530898
28          │   -0.0175925    -0.0627666       -0.0407835    -0.0458722
⋮                        ⋮             ⋮  ⋱             ⋮             ⋮
52          │   0.00987037     0.0103642         0.027755   -0.00159873
53          │  -0.00833956      0.105136         0.137323    0.00474555
54          │    0.0151337   -0.00672509       0.00146536   -0.00599631
55          │     0.006021      0.010201        0.0127117     0.0141629
56          │    -0.019085   -0.00144585        0.0832875   -0.00779968
57          │    0.0121505    -0.0591763        0.0604599    -0.0388605
58          │   -0.0186695     0.0988717        0.0605095     0.0656851
59          │    -0.036915    -0.0102239              NaN    0.00427532
60          │  -0.00769414     0.0372196  …    0.00427532           NaN

Visualize Mutual Information

You can use the function of the Plots package to visualize the Mutual Information (MI) network between residues. As an example, we are going to visualize the MI between residues of the Pfam domain PF16078. The heatmap is the simplest way to visualize the values of the Mutual Information matrix.

using Plots
gr()

heatmap(ZMIp, yflip=true)

/juliateam/.julia/packages/GR/yMV3y/src/../deps/gr/bin/gksqt: error while loading shared libraries: libQt5Widgets.so.5: cannot open shared object file: No such file or directory
connect: Connection refused
GKS: can't connect to GKS socket application

GKS: Open failed in routine OPEN_WS
GKS: GKS not in proper state. GKS must be either in the state WSOP or WSAC in routine ACTIVATE_WS

ZMIp is a Z score of the corrected MIp against its distribution on a random MSA (shuffling the residues in each sequence), so pairs with highest values are more likely to co-evolve. Here, we are going to use the top 1% pairs of MSA columns.

using PairwiseListMatrices # to use getlist
using Statistics # to use quantile

threshold = quantile(getlist(ZMIp), 0.99)

10.309587901580539

ZMIp[ ZMIp .< threshold ] .= NaN
heatmap(ZMIp, yflip=true)

/juliateam/.julia/packages/GR/yMV3y/src/../deps/gr/bin/gksqt: error while loading shared libraries: libQt5Widgets.so.5: cannot open shared object file: No such file or directory
connect: Connection refused
GKS: can't connect to GKS socket application

GKS: Open failed in routine OPEN_WS
GKS: GKS not in proper state. GKS must be either in the state WSOP or WSAC in routine ACTIVATE_WS

We are going to calculate the cMI (cumulative mutual information) value of each node. Where cMI is a mutual information score per position that characterizes the extent of mutual information "interactions" in its neighbourhood. This score is calculated as the sum of MI values above a certain threshold for every amino acid pair where the particular residue appears. This value defines to what degree a given amino acid takes part in a mutual information network and we are going to indicate it using the node color. To calculate cMI we are going to use the cumulative function:

cMI = cumulative(ZMIp, threshold)

1×39 Named Array{Float64,2}
Function ╲ Col2 │      15       16       17  …       58       59       60
────────────────┼────────────────────────────────────────────────────────
cumulative      │     0.0      0.0      0.0  …    12.18  11.3775      0.0