FaST-LMM full rank

Spectral decomposition of the kernel matrix and data rotaton

We first consider the scenario where $K$ is defined by a square semi-positive definite matrix and $K_{22}$ has been computed by svd_fixed_effects. Following Lippert et al. (2011), we call this the "full rank" scenario, although neither $K$ nor $K_{22}$ actually has to have full rank.

The spectral decomposition of $K_{22}$ can be written as $K_{22}=U \Lambda U^T$, with $U\in\mathbb{R}^{(n-d)\times (n-d)}$ unitary and $\Lambda\in\mathbb{R}^{(n-d)\times (n-d)}$ diagonal with non-negative diagonal elements $\lambda_i=\Lambda_{ii}\geq 0$, which we assume are ordered, $\lambda_1\geq\lambda_2\geq \dots \geq \lambda_{n-d}\geq 0$

The columns of $U$ are the eigenvectors of $K_{22}$, and we denote them as $u_i \in\mathbb{R}^{n-d}$, with $K_{22} u_i=\lambda_i u_i$. The matrix $U$ can be used to rotate the data $y_2$:

\[\tilde{y} = U^T y_2 \in \mathbb{R}^{n-d}\]

with components

\[\tilde{y}_i = \langle u_i,y_2\rangle,\quad i=1,\dots,n-d\]

Likelihood function and variance parameter estimation

These results allow to express the terms in $\mathcal{\ell}_R$

\[\begin{aligned} \frac1{n-d}\log\det(K_{22}+\delta I) &= \frac1{n-d}\sum_{i=1}^{n-d} \log(\lambda_i+\delta)\\ \hat\sigma^2 = \frac1{n-d}\langle y_2,(K_{22}+\delta I)^{-1} y_2\rangle &= \frac1{n-d}\sum_{i=1}^{n-d} \frac1{\lambda_i+\delta} \langle y_2,u_i\rangle^2 = \frac1{n-d}\sum_{i=1}^{n-d} \frac{\tilde{y}_i^2}{\lambda_i+\delta} \end{aligned}\]

Note the crucial difference with the original FaST-LMM method. There the eigenvalue decomposition of the full matrix $K$ is used to facilitate the computation of the negative log-likelihood $\mathcal{L}$ (see Introduction), which still involves the solution of a linear system to compute $\hat\beta$. By working in the restricted space orthogonal to the fixed effects and using the eigenvalue decomposition of the reduced matrix $K_{22}$, we have obtained a restricted negative log-likelihood $\ell_R$ which, given $\lambda$ and $\tilde y$ is trivial to evaluate and differentiate by automatic differentiation.

The values of $\hat\sigma^2$ and $\ell_R$ as a function of the parameter $\delta$ and vectors $\lambda$ and $\tilde y$ are computed by the functions sigma2_reml and neg_log_like

sigma2_reml(δ, λ, yr)

Compute the REML of the variance parameter given the variance ratio δ, the eigenvalues of the kernel matrix, and the rotated response vector.

neg_log_like(δ, λ, yr)

Compute the negative (restircted) log-likelihood of the model, scaled by the number of samples and without constant factors, given the variance ratio δ, the eigenvalues of the kernel matrix, and the rotated response vector.

The optimization of the function $\ell_R$ is done by the function delta_reml using the LBFGS algorithm. Because $\delta$ must be greater than zero, we write $\delta$ as the softplus function of an unconstrained variable $x$, that is $\delta=\log(1+e^x)$, and optimize with respect to $x$

delta_reml(λ, yr)

Compute the REML of the variance ratio δ given the eigenvalues of the kernel matrix and the rotated response vector by solving the non-convex optimization problem formulated in the FaST-LMM paper.


Compute the softplus function, i.e. log(1 + exp(x)), element-wise.

fastlmm_fullrank(y,K; covariates = [], mean = true)

For a linear mixed model / Gaussian process,

\[y \sim N\bigl(X\beta, \sigma^2(K + \delta I) \bigr)\]

where $y$ is the response vector, $X$ is a matrix of covariates, and $K$ is a full-rank kernel matrix, compute the restricted maximum-likelihood estimates (REMLs) of the variance parameter $\sigma^2$ and the variance ratio $\delta$ using FaST-LMM with a full-rank kernel matrix. Compared to the original FaST-LMM algorithm, we first project out the (optional) covariates, incl. an (optional) constant off-set (mean=true), from the response vector and the kernel matrix. This avoids all matrix computations in the variance parameter estimation. Estimates for the fixed effects $\beta$ are not computed.

Fixed effects weights using the rotated basis

The fixed effects weights need to be computed only once, after $\hat\delta$ has been estimated, $\hat\beta(\hat\delta)$ is computed using the formula in Fixed effects weights. Using the eigendecomposition of $K_{22}$ and the rotated data $\tilde y = U^Ty_2$ we obtain

\[\begin{aligned} \hat \beta(\hat\delta) &= W \Gamma^{-1} ( y_1 - K_{12} (K_{22}+\hat\delta I)^{-1} y_2 )\\ &= W \Gamma^{-1} ( y_1 - K_{12} U(\Lambda+\hat\delta I)^{-1} U^Ty_2 )\\ &= W \Gamma^{-1} ( y_1 - K_{12} U (\Lambda+\hat\delta I)^{-1} \tilde y ) \end{aligned}\]

The components of the vector $(\Lambda+\hat\delta I) \tilde y$ are easily computed elementwise as $\tilde y_i/(\lambda_i+\hat\delta)$.

This calculation is implemented in the function beta_mle:

beta_mle(δ, λ, yr, y1, K12, U, γ, Wt)

Compute the MLE of the fixed effects weights given the variance ratio δ, the eigenvalues of the kernel matrix, the rotated response vector, the rotated response vector projected onto the orthogonal complement of the column space of the covariates, the block decomposition of the kernel matrix, the eigenvectors of the kernel matrix, and the eigenvalues of the kernel matrix.