# CorrDCA

`CorrDCA.compute_weights`

`CorrDCA.covariance_matrix`

`CorrDCA.read_fasta_alignment`

`CorrDCA.remove_duplicate_sequences`

`CorrDCA.compute_weights`

Method: `compute_weights(Z::Matrix{Ti}, theta::Real) where Ti <: Integer`

Compute the normalized counts of the number of sequences at Hamming distance ≤ `theta` from any given sequence in `Z`.
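The reweighting idea can be illustrated with a small, self-contained sketch. This is not the CorrDCA implementation: the helper name `naive_weights` is hypothetical, and the sketch assumes that `theta` is a fraction of the sequence length, that each sequence counts itself as a neighbor, and that "normalized" means the reciprocal counts are rescaled to sum to 1.

```julia
# Hypothetical helper, not the CorrDCA implementation.
# Z is L × M (one sequence per column); theta is assumed to be a fraction of L.
function naive_weights(Z::AbstractMatrix{<:Integer}, theta::Real)
    L, M = size(Z)
    thresh = theta * L                        # assumed: largest Hamming distance counted as "close"
    counts = zeros(Int, M)
    for i in 1:M, j in 1:M
        # Hamming distance between columns i and j
        d = count(k -> Z[k, i] != Z[k, j], 1:L)
        d <= thresh && (counts[i] += 1)       # includes j == i, so counts[i] >= 1
    end
    w = 1 ./ counts                           # sequences with many close neighbors get less weight
    return w ./ sum(w)                        # assumed normalization: weights sum to 1
end

# Three sequences of length 4; the first two differ at a single site.
Z = [1 1 2;
     1 1 3;
     1 2 4;
     1 1 5]
W = naive_weights(Z, 0.25)
```

With these assumptions the two near-identical sequences share their weight, while the isolated third sequence keeps a full share. The quadratic loop is only meant for reading; a real alignment would need a faster distance computation.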

`CorrDCA.compute_weights`

Method: `compute_weights(Z::Matrix{Ti}, theta::Symbol) where Ti<:Integer`

Compute the normalized counts of the number of sequences at Hamming distance ≤ a precomputed optimal threshold. For details, see [Fast and accurate multivariate Gaussian modeling of protein families: Predicting residue contacts and protein-interaction partners](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0092721), in particular section `2 Reweighting Scheme` of the Supplementary Information.

`CorrDCA.covariance_matrix`

Method: `covariance_matrix(Z, W; pc::Real=0)`

Compute the covariance matrix from the numerical alignment `Z` and the weights `W`. The output is a symmetric `N(q-1) × N(q-1)` matrix (color `q` is skipped).

**Keyword arguments:**

`pc` in [0,1]: pseudocount [default = `0`]
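To make the `N(q-1) × N(q-1)` shape concrete, here is a minimal sketch of a weighted covariance over a one-hot encoding for the `pc = 0` case only. It is not the CorrDCA implementation: the name `naive_covariance` is hypothetical, the number of colors `q` is passed explicitly as an assumption, and `W` is assumed to sum to 1. How the pseudocount enters is not reproduced here.

```julia
# Hypothetical sketch, not the CorrDCA implementation (pseudocount omitted).
# Z: N × M integer alignment with entries in 1:q; W: weights assumed to sum to 1.
function naive_covariance(Z::AbstractMatrix{<:Integer}, W::AbstractVector{<:Real}, q::Integer)
    N, M = size(Z)
    s = q - 1                                   # states kept per site (color q is skipped)
    X = zeros(Float64, N * s, M)                # one-hot encoding, one column per sequence
    for m in 1:M, i in 1:N
        a = Z[i, m]
        a < q && (X[(i - 1) * s + a, m] = 1.0)  # color q leaves its block all zero
    end
    mu = X * W                                  # weighted single-site frequencies
    C = X * (W .* X') - mu * mu'                # weighted pair frequencies minus product of means
    return (C + C') / 2                         # symmetrize against rounding error
end

# Two sequences of length N = 2 over q = 3 colors, equally weighted.
Zc = [1 2;
      2 1]
C = naive_covariance(Zc, [0.5, 0.5], 3)
```

Dropping color `q` removes the linear dependence among the one-hot columns of each site, which is why the matrix has `N(q-1)` rows rather than `Nq`.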

`CorrDCA.read_fasta_alignment`

Method: `read_fasta_alignment(filename::AbstractString, max_gap_fraction::Real)`

Return an `L × M` matrix of integers (`L` is the sequence length and `M` is the number of sequences) encoding the multiple sequence alignment contained in the FASTA file `filename`, keeping only the sequences whose fraction of gaps (`-`) is ≤ `max_gap_fraction`.
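The gap filter and the `L × M` layout can be sketched as follows. This is an illustration under stated assumptions, not the CorrDCA reader: `naive_read_fasta`, `letter_to_int`, and `ALPHABET` are hypothetical names, and the integer encoding (20 amino acids in alphabetical order of their one-letter codes mapped to 1 through 20, everything else including `-` mapped to 21) is assumed rather than taken from the package.

```julia
# Hypothetical sketch, not the CorrDCA implementation.
const ALPHABET = "ACDEFGHIKLMNPQRSTVWY"       # assumed ordering of the 20 amino acids

# Assumed encoding: amino acids map to 1:20, anything else (including '-') to 21.
function letter_to_int(c::Char)
    idx = findfirst(==(uppercase(c)), ALPHABET)
    return idx === nothing ? 21 : idx
end

function naive_read_fasta(filename::AbstractString, max_gap_fraction::Real)
    seqs = String[]
    current = IOBuffer()
    started = false
    for line in eachline(filename)
        line = strip(line)
        isempty(line) && continue
        if startswith(line, ">")              # header line: close the previous record
            started && push!(seqs, String(take!(current)))
            started = true
        else
            write(current, line)              # sequence data may span several lines
        end
    end
    started && push!(seqs, String(take!(current)))
    # Keep only sequences whose gap fraction is within the limit.
    kept = filter(s -> count(==('-'), s) <= max_gap_fraction * length(s), seqs)
    isempty(kept) && return Matrix{Int}(undef, 0, 0)
    L = length(first(kept))                   # assumes all aligned sequences share one length
    return [letter_to_int(kept[m][i]) for i in 1:L, m in 1:length(kept)]
end
```

Each column of the result is one sequence, which matches the column-wise convention used by `remove_duplicate_sequences`.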

`CorrDCA.remove_duplicate_sequences`

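Method: `remove_duplicate_sequences(Z::Matrix{Ti}) where Ti<:Integer`

Remove duplicate sequences (columns) from the alignment matrix `Z`. Judging from the example, the return value is a tuple containing the deduplicated matrix and the indices of the retained columns.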

**Examples**

```
julia> Z = [1 2 3 1;
            1 3 2 1;]
2×4 Array{Int64,2}:
 1  2  3  1
 1  3  2  1

julia> remove_duplicate_sequences(Z)
removing duplicate sequences... done: 4 -> 3
([1 2 3; 1 3 2], [1, 2, 3])
```
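The behavior shown in the example can be reproduced with a short first-occurrence filter. This is a sketch with a hypothetical name, `dedup_columns`, and not the package's code; it assumes the first copy of each column is the one retained and it prints no progress message.

```julia
# Hypothetical sketch, not the CorrDCA implementation.
# Returns the matrix without repeated columns and the indices of the columns kept.
function dedup_columns(Z::AbstractMatrix{<:Integer})
    seen = Set{Vector{eltype(Z)}}()
    keep = Int[]
    for j in axes(Z, 2)
        col = Z[:, j]
        if !(col in seen)        # first time this sequence appears
            push!(seen, col)
            push!(keep, j)
        end
    end
    return Z[:, keep], keep
end

Zd = [1 2 3 1;
      1 3 2 1]
Znew, idx = dedup_columns(Zd)
```

Returning the retained indices alongside the matrix makes it possible to filter any per-sequence metadata, such as headers, in the same way.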