compute_weights(Z::Matrix{Ti}, theta::Real) where Ti <: Integer

Compute the normalized counts of the number of sequences at Hamming distance ≤ theta from each sequence in Z.
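A minimal sketch of this kind of reweighting follows. It is written as a hypothetical helper `sketch_weights`, not the package's own code. It assumes two things: theta is a fraction of the sequence length L, so two sequences are neighbors when at most theta·L positions differ; and each sequence's weight is the inverse of its neighbor count, normalized to sum to 1, as in common DCA reweighting schemes. The actual `compute_weights` may differ on both points.

```julia
# Hypothetical sketch of sequence reweighting; not the package's implementation.
# Assumption: theta is a fraction of the sequence length L, so sequences i and j
# are neighbors when they differ at no more than theta * L positions.
function sketch_weights(Z::AbstractMatrix{<:Integer}, theta::Real)
    L, M = size(Z)                       # L positions, M sequences (columns)
    counts = zeros(Int, M)
    for i in 1:M, j in 1:M
        # Hamming distance between sequences i and j
        d = count(k -> Z[k, i] != Z[k, j], 1:L)
        if d <= theta * L
            counts[i] += 1               # includes i itself (d == 0)
        end
    end
    W = 1.0 ./ counts                    # assumed: weight = inverse neighbor count
    Meff = sum(W)                        # effective number of sequences
    return W ./ Meff, Meff               # normalized weights sum to 1
end
```

For example, with two identical sequences and one distinct sequence and theta = 0, the identical pair gets weight 0.25 each and the distinct one 0.5, giving Meff = 2.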

compute_weights(Z::Matrix{Ti}, theta::Symbol) where Ti<:Integer

Compute the normalized counts of the number of sequences at Hamming distance ≤ a precomputed optimal threshold. See "Fast and accurate multivariate Gaussian modeling of protein families: Predicting residue contacts and protein-interaction partners" for details, in particular Section 2 ("Reweighting scheme") of its Supplementary Information.

covariance_matrix(Z, W; pc::Real=0)

Compute the covariance matrix from the numerical alignment Z and the weights W. The output is an N(q-1) × N(q-1) symmetric matrix, where N is the sequence length and q is the number of colors; color q is skipped.

Keyword arguments:

  • pc in [0,1]: pseudocount [default = 0]
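The sketch below uses a hypothetical `sketch_covariance` to show one common way to build such a matrix. It one-hot encodes each site while dropping color q and computes weighted single-site and pair frequencies. It then mixes in pc uniformly (pc/q for single sites, pc/q² for pairs) and subtracts the outer product of the single-site frequencies. It assumes the colors are 1…q with q = maximum(Z). The package's own pseudocount handling may differ.

```julia
# Hypothetical sketch, not the package's covariance_matrix.
# Assumptions: colors are 1..q with q = maximum(Z); color q is dropped;
# the pseudocount pc is mixed in uniformly.
function sketch_covariance(Z::AbstractMatrix{<:Integer}, W::AbstractVector{<:Real}; pc::Real=0)
    N, M = size(Z)
    q = maximum(Z)
    s = q - 1                            # states kept per site
    X = zeros(N * s, M)                  # one-hot encoding without color q
    for m in 1:M, i in 1:N
        a = Z[i, m]
        if a < q
            X[(i - 1) * s + a, m] = 1.0
        end
    end
    w = W ./ sum(W)                      # normalize weights to sum to 1
    fi = X * w                           # weighted single-site frequencies
    fij = (X .* w') * X'                 # weighted pair frequencies
    fi = (1 - pc) .* fi .+ pc / q        # uniform pseudocount on single sites
    fij = (1 - pc) .* fij .+ pc / q^2    # uniform pseudocount on pairs
    for i in 1:N                         # same-site block: diag(fi), zero elsewhere
        r = (i - 1) * s .+ (1:s)
        fij[r, r] .= 0
        for k in r
            fij[k, k] = fi[k]
        end
    end
    return fij .- fi * fi'               # N(q-1) × N(q-1), symmetric
end
```

The same-site blocks are overwritten because a site cannot take two different colors in the same sequence. Its pair frequencies are therefore diagonal and equal to the single-site frequencies.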


read_fasta_alignment(filename::AbstractString, max_gap_fraction::Real)

Return an L × M matrix of integers (L is the sequence length and M is the number of sequences) encoding the multiple sequence alignment contained in the FASTA file filename. Only sequences whose fraction of gaps (-) is ≤ max_gap_fraction are kept.
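The gap-filtering and encoding steps can be sketched as follows. `sketch_read_fasta` is a hypothetical helper. It reads from any IO, such as an open file, rather than from a filename. The residue-to-integer alphabet `SKETCH_ALPHABET`, with the gap mapped to 21, is also an assumption, and the package's actual encoding may differ.

```julia
# Hypothetical sketch, not the package's read_fasta_alignment.
# Assumed encoding: 20 amino acids -> 1..20, gap '-' -> 21; unknown symbols -> 21.
const SKETCH_ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"

function sketch_read_fasta(io::IO, max_gap_fraction::Real)
    seqs = String[]
    for line in eachline(io)
        line = strip(line)
        isempty(line) && continue
        if startswith(line, ">")
            push!(seqs, "")              # header line: start a new record
        else
            seqs[end] *= line            # a sequence may span several lines
        end
    end
    # keep sequences whose fraction of gaps is ≤ max_gap_fraction
    kept = filter(s -> count(==('-'), s) / length(s) <= max_gap_fraction, seqs)
    L = length(first(kept))              # assumes aligned sequences of equal length
    # L × M matrix: row k = alignment position, column = kept sequence
    return [something(findfirst(==(uppercase(s[k])), SKETCH_ALPHABET), 21)
            for k in 1:L, s in kept]
end
```

With max_gap_fraction = 0.5, a sequence such as `---` is dropped, while `AC-` (one gap in three positions) is kept.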

remove_duplicate_sequences(Z::Matrix{Ti}) where Ti<:Integer

Remove duplicate sequences (columns) from the alignment matrix Z. Return the reduced matrix together with the indices of the columns that were kept.


julia> Z = [1 2 3 1;
            1 3 2 1;]
2×4 Array{Int64,2}:
 1  2  3  1
 1  3  2  1

julia> remove_duplicate_sequences(Z)
removing duplicate sequences... done: 4 -> 3
([1 2 3; 1 3 2], [1, 2, 3])
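Keeping the first occurrence of each distinct column reproduces the result in the example above. A minimal sketch follows, using a hypothetical `sketch_remove_duplicates` that returns the same `(matrix, indices)` pair but omits the progress message. It is an illustration, not the package's implementation.

```julia
# Hypothetical sketch, not the package's remove_duplicate_sequences.
function sketch_remove_duplicates(Z::AbstractMatrix{<:Integer})
    # indices of the first occurrence of each distinct column, in order
    keep = unique(m -> Z[:, m], 1:size(Z, 2))
    return Z[:, keep], keep
end
```

`sketch_remove_duplicates([1 2 3 1; 1 3 2 1])` returns `([1 2 3; 1 3 2], [1, 2, 3])`. Hashing each column through `unique` keeps the cost roughly linear in the number of sequences, instead of comparing every pair of columns.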