Methods

Notations

In these docs, $X$ denotes an $n\times p$ data matrix describing $n$ observations with $p$ features (variables) and possibly $p > n$. For most estimators, the matrix is assumed to have entries in $\mathbb R$.

Note

While in the docs we assume that the rows of $X$ correspond to observations and the columns to features, in the code you can specify this using the dims keyword with dims=1 being the default whereas dims=2 will take rows as features and columns as observations.

We will write $X_c$ the centered matrix i.e. where each column sums up to zero. We will also write $X_s$ the standardised matrix i.e. where each column not only sums up to zero but is scaled to have sample variance one. Finally, we will write $S$ the standard sample covariance estimator (see below) of size $p\times p$ and $D$ the diagonal matrix matching the diagonal of $S$.

Note that the sample variance and covariance can be corrected or uncorrected (and this can be specified in the code). In order to avoid having to specify this everywhere in the document, it is useful to introduce a last symbol: $\kappa$ which is either set to $n$ (uncorrected case) or $(n-1)$ (corrected case).

With these notations we can write:

\[\begin{aligned} S &= \kappa^{-1}X_c^T X_c\\ D_{ij} &= S_{ij}\mathbf 1_{[i=j]}\\ X_s &= X_c D^{-1/2} \end{aligned}\]

Simple estimator

The standard covariance estimator is easily obtained via the equation above. It can be specified with the constructor SimpleCovariance which can take a named argument corrected (either false (default) or true).

Random.seed!(1)
n, p = 5, 7
X = randn(n, p)
# corrected covariance
S = cov(SimpleCovariance(corrected=true), X)
# we can also manually compute it and compare
Xc = (X .- sum(X, dims=1)/n) # centering
κ = n-1 # correction factor
S ≈ (Xc'*Xc)/κ

true

Linear shrinkage estimators

Linear shrinkage estimators correspond to covariance estimators of the form

\[\hat\Sigma = (1-\lambda)S + \lambda F\]

where $F$ is a target matrix of appropriate dimensions, $\lambda\in[0,1]$ is a shrinkage intensity and $S$ is the sample covariance estimator. There are several standard targets that can be used, a simple example being the identity matrix.

The shrinkage intensity $\lambda$ can be specified manually or computed automatically. Depending on the target, different approaches are implemented to compute a good intensity such as, for example, the Ledoit-Wolf optimal intensity (which is the default intensity if you don't specify it).

You can read more on the targets that can be used and the corresponding automatic intensities here.

Here is an example using the identity matrix as a target and automatic shrinkage intensity (Ledoit-Wolfe):

Random.seed!(1)
n, p = 2, 3
X = randn(n, p)
target = DiagonalUnitVariance()
shrinkage = :lw # Ledoit-Wolf optimal shrinkage
method = LinearShrinkage(target, shrinkage)
cov(method, X)

3×3 LinearAlgebra.Symmetric{Float64, Matrix{Float64}}:
  0.090619   0.491257  -0.135059
  0.491257   2.66317   -0.73217
 -0.135059  -0.73217    0.201291

You can also specify the intensity manually:

shrinkage = 0.8
method2 = LinearShrinkage(target, shrinkage)
cov(method2, X)

3×3 LinearAlgebra.Symmetric{Float64, Matrix{Float64}}:
  0.818124    0.0982515  -0.0270117
  0.0982515   1.33263    -0.146434
 -0.0270117  -0.146434    0.840258

Nonlinear shrinkage estimators

Biweight midcovariance

The biweight midcovariance is a covariance estimator that is resilient to outliers. A full description of the technique is included on BiweightMidcovariance.

Random.seed!(1)
n, p = 10, 3
X = rand(TDist(3), (n, p))  # Moderately heavy-tailed data
cov(BiweightMidcovariance(), X)

3×3 Matrix{Float64}:
 0.604299    0.777577    0.0408822
 0.777577    4.2297     -0.0957008
 0.0408822  -0.0957008   2.34132

The two controllable parameters are the keyword arguments c and modify_sample_size, whose purpose is described in the docstring.

Comparing estimators

You may want to look at our simple comparison of covariance estimators which compares the MSE of the various estimators in a range of situations. Long story short, the LinearShrinkageEstimator with DiagonalUnequalVariance target performs well in the case $n<p$ though most other estimators don't fare too badly in comparison. In the case $n>p$, the nonlinear shrinkage method does very well (though it is more expensive to compute).