PCA

Principal Component Analysis (PCA) is obtained by an eigenvalue-eigenvector decomposition. It was first conceived by Karl Pearson (1901)🎓 as a way to fit straight lines to a multidimensional cloud of points. It corresponds to the situation $m=1$ (one dataset) and $k=1$ (one observation).

Let $X$ be a $n⋅t$ data matrix, where $n$ is the number of variables and $t$ the number of samples, and let $C$ be its $n⋅n$ covariance matrix. Since $C$ is a positive semi-definite matrix, its eigenvector matrix $U$ diagonalizes $C$ by rotation, as

$U^{H}CU=\Lambda$. $\hspace{1cm}$ [pca.1]

The eigenvalues in the diagonal matrix $\Lambda$ are all non-negative. They are all real and positive if $C$ is positive definite, which is assumed in the remainder of this exposition. The linear transformation $U^{H}X$ yields uncorrelated data with the variance of the $n^{th}$ component equal to the corresponding eigenvalue $\lambda_n$, that is,

$\frac{1}{t}U^{H}XX^{H}U=\Lambda$. $\hspace{1cm}$ [pca.2]
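As a concrete illustration, here is a minimal Julia sketch, using only standard-library packages and hypothetical random data, that verifies Eq. [pca.1] and Eq. [pca.2] numerically:

```julia
using LinearAlgebra, Statistics

# hypothetical data: n=5 variables, t=1000 samples
n, t = 5, 1000
X = randn(n, t) .* (5:-1:1)           # rows with decreasing variance
X .-= mean(X, dims=2)                 # center each variable

C = Symmetric(X * X' / t)             # n⋅n covariance matrix
λ, U = eigen(C, sortby = -)           # eigenvalues in descending order
U' * C * U ≈ Diagonal(λ)              # Eq. [pca.1]: true
(U' * X * X' * U) / t ≈ Diagonal(λ)   # Eq. [pca.2]: true
```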

In Diagonalizations.jl the diagonal elements of diagonalized matrices are always arranged in descending order, that is,

$位_1鈮ldots鈮ノ籣n$. $\hspace{1cm}$ [pca.3]

Then, because of the extremal properties of eigenvalues (Congedo, 2013, p. 66; Schott, 1997, p. 104-128)🎓, the first component (row) of $U^{H}X$ holds the linear combination of $X$ with maximal variance, the second the linear combination with maximal residual variance, and so on, subject to the constraint $U^{H}U=UU^{H}=I$.
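The extremal property can be checked numerically; the following sketch, again with hypothetical data, compares the variance of the first principal component to the variance along an arbitrary unit-norm combination:

```julia
using LinearAlgebra, Statistics

X = randn(5, 1000) .* (5:-1:1)
X .-= mean(X, dims=2)
λ, U = eigen(Symmetric(X * X' / size(X, 2)), sortby = -)

v = normalize(randn(5))                    # an arbitrary unit-norm combination
var(vec(U[:, 1]' * X)) ≥ var(vec(v' * X))  # true: no combination beats the first PC
```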

Let $\sigma_{TOT}=\sum_{i=1}^n\lambda_i=tr(C)$ be the total variance and let $\widetilde{U}=[u_1 \ldots u_p]$ be the matrix holding the first $p<n$ eigenvectors, where $p$ is the subspace dimension, then

$\sigma_p=\frac{\sum_{i=1}^p\lambda_i}{\sigma_{TOT}}=\frac{tr(\widetilde{U}^HC\widetilde{U})}{tr(C)}$ $\hspace{1cm}$ [pca.4]

is named the explained variance and

$\varepsilon_p=1-\sigma_p$ $\hspace{1cm}$ [pca.5]

is named the representation error. These quantities are expressed as proportions, that is, $\sigma_p+\varepsilon_p=1$ holds.
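In Julia, with a hypothetical eigenvalue vector, these two proportions are computed directly from Eq. [pca.4] and Eq. [pca.5]:

```julia
λ = [4.0, 2.0, 1.0, 0.5, 0.5]    # hypothetical eigenvalues, descending order
p = 2
σTOT = sum(λ)                    # total variance, tr(C)
σp = sum(λ[1:p]) / σTOT          # explained variance, Eq. [pca.4]
εp = 1 - σp                      # representation error, Eq. [pca.5]
σp + εp ≈ 1                      # true
```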

The accumulated regularized eigenvalues (arev) are defined as

$\sigma_j=\sum_{i=1}^j{\sigma_i}$, for $j=[1 \ldots n]$, $\hspace{1cm}$ [pca.6]

where $\sigma_i$ is given by Eq. [pca.4].
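In other words, the arev vector is the cumulative sum of the explained-variance proportions; a one-line sketch:

```julia
λ = [4.0, 2.0, 1.0, 0.5, 0.5]   # hypothetical eigenvalues, descending order
arev = cumsum(λ ./ sum(λ))      # Eq. [pca.6]; arev[end] == 1
```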

To set the subspace dimension $p$ manually, set the eVar optional keyword argument of the PCA constructors either to an integer or to a real number; in the latter case $p$ is determined in conjunction with the eVarMeth argument using the arev vector (see subspace dimension). By default, eVar is set to 0.999.
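For instance, a hedged usage sketch (eVar is the keyword cited above; check the pca docstring for the exact signature and defaults):

```julia
using Diagonalizations

X = randn(5, 1000)       # hypothetical data: 5 variables, 1000 samples

p1 = pca(X; eVar=2)      # set the subspace dimension directly: p=2
p2 = pca(X; eVar=0.9)    # choose p from the arev vector to retain ≈90% variance
p3 = pca(X)              # default: eVar=0.999
```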

Solution

The PCA solution is given by the eigenvalue-eigenvector decomposition of $C$:

$\textrm{EVD}(C)=U\Lambda U^{H}$.

It is worth mentioning that

$\widetilde{U}\widetilde{\Lambda}\widetilde{U}^H$,

where $\widetilde{\Lambda}$ is the leading $p⋅p$ block of $\Lambda$, is the best approximant to $C$ with rank $p$ in the least-squares sense (Good, 1969)🎓.
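A minimal sketch of this rank-$p$ approximation, using the same kind of hypothetical covariance matrix as in the earlier snippets:

```julia
using LinearAlgebra, Statistics

X = randn(5, 1000) .* (5:-1:1); X .-= mean(X, dims=2)
C = Symmetric(X * X' / size(X, 2))
λ, U = eigen(C, sortby = -)

p = 2
Up, Λp = U[:, 1:p], Diagonal(λ[1:p])   # first p eigenpairs
Cp = Up * Λp * Up'                     # best rank-p approximant to C
norm(C - Cp)                           # least-squares (Frobenius) error
```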

Constructors

Three constructors are available (see below). The constructed LinearFilter object holding the PCA will have the following fields (a usage sketch follows the list):

.F: matrix $\widetilde{U}$ with orthonormal columns holding the first $p$ eigenvectors in $U$, or just $U$ if $p=n$.

.iF: the (conjugate) transpose of .F.

.D: the leading $p⋅p$ block of $\Lambda$, i.e., the eigenvalues associated with .F in diagonal form.

.eVar: the explained variance [pca.4] for the chosen value of $p$.

.ev: the vector $\textrm{diag}(\Lambda)$ holding all $n$ eigenvalues.

.arev: the accumulated regularized eigenvalues in [pca.6].
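Putting it together, a hedged sketch of how these fields might be used (the constructor call is assumed as above; the field names are those listed):

```julia
using Diagonalizations

X = randn(5, 1000)       # hypothetical data
m = pca(X; eVar=0.9)

Y = m.iF * X             # transformed data Ũᴴ X in the retained subspace
m.D                      # p⋅p diagonal matrix of the retained eigenvalues
m.eVar                   # explained variance attained, Eq. [pca.4]
m.ev                     # all n eigenvalues, diag(Λ)
m.arev                   # accumulated regularized eigenvalues, Eq. [pca.6]
```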
