# Analysis API Reference

## Functions

`ChemometricsTools.COW`

— Method.`COW( A, B; segments = 20, slack = 1, maxslack = Int( floor( length(A) / segments ) ) - 2 )`

COW makes a CorrelationOptimizedWarping object which warp-corrects spectra `A` to reference `B`. The user can select the number of segments, the slack size, and optionally the maximum slack parameter.

Note: Not fully tested.

"Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping" Nielsen, N. P. V.; Carstensen, J. M.; Smedsgaard, J. Journal of Chromatography A. 1998, 805, 17–35.

`ChemometricsTools.COW`

— Method.`(moo::COW)(A)`

Applies a learned warping from a COW object to spectrum A.

`CanonicalCorrelationAnalysis(A, B)`

Returns a CanonicalCorrelationAnalysis object which contains (U, V, r) from arrays `A` and `B`. Currently untested for correctness but should compute....

`ChemometricsTools.Hotelling`

— Method.`Hotelling(X, pca::PCA; Quantile = 0.05, Variance = 1.0)`

Computes the Hotelling `T^2` statistic and upper control limit cutoff of a `pca` object using a specified `Quantile` and cumulative variance explained `Variance` for new or old data `X`. Stores the result in a struct which can be applied to new data.

A review of PCA-based statistical process monitoring methods for time-dependent, high-dimensional data. Bart De Ketelaere https://wis.kuleuven.be/stat/robust/papers/2013/deketelaere-review.pdf
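As a rough illustration of the statistic itself, the sketch below computes a per-sample `T^2` from PCA scores using only Julia's standard libraries. The function name `hotelling_tsq_sketch` is hypothetical and is not part of ChemometricsTools; the sketch centers `X` itself and does not compute the upper control limit, which would need an F-distribution quantile from a statistics package.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper, not the library's implementation:
# Hotelling T^2 of each row of X using the first k principal components.
function hotelling_tsq_sketch(X::AbstractMatrix, k::Int)
    n = size(X, 1)
    Xc = X .- mean(X, dims = 1)              # column-center (assumption of this sketch)
    F = svd(Xc)
    T = F.U[:, 1:k] .* F.S[1:k]'             # scores, one column per component
    score_var = (F.S[1:k] .^ 2) ./ (n - 1)   # variance of each score column
    # T^2_i = sum over components of score^2 / score variance
    return vec(sum((T .^ 2) ./ score_var', dims = 2))
end

X = [2.0 0.0; 0.0 1.0; -2.0 0.0; 0.0 -1.0]
tsq = hotelling_tsq_sketch(X, 2)
```

Samples with a large `T^2` lie far from the center of the model in the score space; comparing each value against a control limit is what turns the statistic into a monitoring rule.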

`ChemometricsTools.Hotelling`

— Method.`Hotelling(X, PLS::PartialLeastSquares; Quantile = 0.05, LVs = 1)`

Computes the Hotelling `T^2` statistic and upper control limit cutoff of a `PartialLeastSquares` object using a specified `Quantile` and number of latent variables `LVs` for new or old data `X`. Stores the result in a struct which can be applied to new data.

Note: The number of latent variables cannot be automatically set by the explained variance in X. It is not computed cumulatively.

Informative PLS score-loading plots for process understanding and monitoring. Rolf Ergon. Journal of Process Control 14 (2004) 889-897 https://pdfs.semanticscholar.org/89b6/677a592dbe05a9b754d377ade416d7a17393.pdf

`ChemometricsTools.Hotelling`

— Method.`(H::Hotelling)(X)`

Retrieves the `T^2` statistic from a saved Hotelling model. Note 1: This does not automatically center or scale `X`. Note 2: If the model used to generate the Hotelling struct changes, so will the Hotelling struct (the model is held by reference).

`ChemometricsTools.LDA`

— Method.`LDA(X, Y; Factors = 1)`

Computes a LinearDiscriminantAnalysis transform from `X` with a user-specified number of latent variables (`Factors`). Returns an LDA object.

`ChemometricsTools.LDA`

— Method.`( model::LDA )( Z; Factors = length(model.Values) )`

Calling an LDA object on new data brings the new data `Z` into the LDA basis.

`ChemometricsTools.PCA`

— Method.`PCA(X; Factors = minimum(size(X)) - 1)`

Computes a PCA from `X` using LinearAlgebra's SVD algorithm with a user-specified number of latent variables (`Factors`). Returns a PCA object.
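For readers who want to see what an SVD-based PCA involves, here is a minimal sketch using only `LinearAlgebra` and `Statistics`. The name `pca_svd_sketch` and its return values are assumptions made for illustration; the library's `PCA` object may center, scale, or store its results differently.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper, not the library's PCA constructor.
# Rows of X are samples, columns are variables.
function pca_svd_sketch(X::AbstractMatrix; factors::Int = minimum(size(X)) - 1)
    Xc = X .- mean(X, dims = 1)                      # column-center (assumption)
    F = svd(Xc)                                      # Xc == F.U * Diagonal(F.S) * F.Vt
    scores = F.U[:, 1:factors] .* F.S[1:factors]'    # samples in the PCA basis
    loadings = F.Vt[1:factors, :]                    # each row is a principal axis
    return scores, loadings, F.S[1:factors]
end

X = [2.0 0.0; 0.0 1.0; -2.0 0.0; 0.0 -1.0]
scores, loadings, singvals = pca_svd_sketch(X; factors = 1)
reconstruction = scores * loadings                   # rank-1 approximation of centered X
```

Multiplying the scores by the loadings maps data back out of the PCA basis, which is the kind of round trip the `inverse` keyword on the callable PCA object refers to.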

`ChemometricsTools.PCA`

— Method.`(T::PCA)(Z::Array; Factors = length(T.Values), inverse = false)`

Calling a PCA object on new data brings the new data `Z` into or out of (`inverse = true`) the PCA basis.

`ChemometricsTools.Q`

— Method.`Q(X, pca::PCA; Quantile = 0.95, Variance = 1.0)`

Computes the Q-statistic and upper control limit cutoff of a `pca` object using a specified `Quantile` and cumulative variance explained `Variance` for new or old data `X`.

A review of PCA-based statistical process monitoring methods for time-dependent, high-dimensional data. Bart De Ketelaere https://wis.kuleuven.be/stat/robust/papers/2013/deketelaere-review.pdf
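The Q-statistic is commonly defined as the squared residual left after projecting a sample onto the retained components. The sketch below shows that definition with standard-library calls only; `q_residual_sketch` is a hypothetical name, the centering step is an assumption, and the control limit is not computed.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper, not the library's Q implementation:
# squared reconstruction error of each row of X for a k-component PCA.
function q_residual_sketch(X::AbstractMatrix, k::Int)
    Xc = X .- mean(X, dims = 1)          # column-center (assumption)
    F = svd(Xc)
    P = F.V[:, 1:k]                      # loadings, variables by components
    E = Xc - (Xc * P) * P'               # part of Xc the model does not explain
    return vec(sum(E .^ 2, dims = 2))    # one Q value per sample
end

X = [2.0 0.0; 0.0 1.0; -2.0 0.0; 0.0 -1.0]
q1 = q_residual_sketch(X, 1)             # one component: second variable is residual
q2 = q_residual_sketch(X, 2)             # full rank: nothing is left over
```

Where `T^2` measures distance inside the model plane, Q measures distance from it, so the two are usually monitored together.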

`ChemometricsTools.Q`

— Method.`Q(X, pls::PartialLeastSquares; Quantile = 0.95, Variance = 1.0)`

Computes the Q-statistic and upper control limit cutoff of a `pls` object using a specified `Quantile` and cumulative variance explained `Variance` for new or old data `X`.

Note: The number of latent variables cannot be automatically set by the explained variance in X. It is not computed cumulatively.

A review of PCA-based statistical process monitoring methods for time-dependent, high-dimensional data. Bart De Ketelaere https://wis.kuleuven.be/stat/robust/papers/2013/deketelaere-review.pdf

`ChemometricsTools.AssessHealth`

— Method.`AssessHealth( X )`

Returns a somewhat detailed Dict containing information about the 'health' of a dataset. The following entries are included:

- PercentMissing: percent of missing entries (includes nothing, inf / nan) in the dataset
- EmptyColumns: the columns which have only 1 value
- RankEstimate: an estimate of the rank of X
- (optional) Duplicates: the rows of duplicate observations

`ChemometricsTools.DynamicTimeWarping`

— Method.`DynamicTimeWarping( A, B )`

Applies Dynamic Time Warping to spectrum `A` so it maps to reference `B`.

Returns the DTW distance, shortest path, and the DTW cost matrix. This does not automatically rescale `A` to `B`; how to do that is left to the user for now.

Sakoe, Hiroaki; Chiba, Seibi (1978). "Dynamic programming algorithm optimization for spoken word recognition". IEEE Transactions on Acoustics, Speech, and Signal Processing. 26 (1): 43–49. doi:10.1109/tassp.1978.1163055.
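The three return values can be illustrated with a textbook dynamic-programming version of DTW. This is a sketch under stated assumptions: `dtw_sketch` is a hypothetical name, the local cost is the absolute difference, and no window constraint is applied, none of which is confirmed to match the library's implementation.

```julia
# Hypothetical helper, not the library's DynamicTimeWarping.
# Returns (distance, warping path, accumulated cost matrix).
function dtw_sketch(a::AbstractVector, b::AbstractVector)
    n, m = length(a), length(b)
    D = fill(Inf, n + 1, m + 1)          # padded with an Inf border
    D[1, 1] = 0.0
    for i in 1:n, j in 1:m
        cost = abs(a[i] - b[j])          # local cost (assumption)
        D[i + 1, j + 1] = cost + min(D[i, j + 1], D[i + 1, j], D[i, j])
    end
    # Backtrack from the end to recover the cheapest path.
    path = Tuple{Int,Int}[]
    i, j = n, m
    while i > 0 && j > 0
        pushfirst!(path, (i, j))
        step = argmin([D[i, j], D[i, j + 1], D[i + 1, j]])  # diagonal, up, left
        if step == 1
            i -= 1
            j -= 1
        elseif step == 2
            i -= 1
        else
            j -= 1
        end
    end
    return D[n + 1, m + 1], path, D[2:end, 2:end]
end

dist, path, cost = dtw_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
```

In this example the second point of the first series is matched to both repeated values in the second series, so the two align with zero total cost.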

`ChemometricsTools.ExplainedVariance`

— Method.`ExplainedVarianceX(X,Y, pls::PartialLeastSquares)`

Calculates the explained variance in `X` & `Y` of each latent variable in a PartialLeastSquares object.

`ChemometricsTools.ExplainedVariance`

— Method.`ExplainedVariance(lda::LDA)`

Calculates the explained variance of each singular value in an LDA object.

`ChemometricsTools.ExplainedVariance`

— Method.`ExplainedVariance(PCA::PCA)`

Calculates the explained variance of each singular value in a pca object.
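One common convention, assumed here rather than confirmed from the library source, is that the explained variance of a component is its squared singular value divided by the sum of all squared singular values. The helper name `explained_variance_sketch` is hypothetical.

```julia
# Hypothetical helper: fraction of total variance carried by each singular value.
explained_variance_sketch(S::AbstractVector) = (S .^ 2) ./ sum(S .^ 2)

ev = explained_variance_sketch([sqrt(8.0), sqrt(2.0)])
cumulative = cumsum(ev)    # running total, useful for choosing a number of factors
```

The cumulative form is the kind of quantity a `Variance` keyword, such as the one on `Hotelling` and `Q`, could threshold against.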

`ChemometricsTools.ExplainedVarianceX`

— Method.`ExplainedVarianceX(X,Y, pls::PartialLeastSquares)`

Calculates the explained variance in `X` of each latent variable in a PartialLeastSquares object.

`ChemometricsTools.ExplainedVarianceY`

— Method.`ExplainedVarianceY(Y, pls::PartialLeastSquares)`

Calculates the explained variance in `Y` of each latent variable in a PartialLeastSquares object.

`ChemometricsTools.HLDA`

— Method.`HLDA(X, YHOT; K = 1, Factors = 1)`

Computes a Hierarchical LinearDiscriminantAnalysis transform from `X` with a user-specified number of latent variables (`Factors`). The adjacency matrices are created from `K` nearest neighbors.

Returns an LDA object. *Note: this can be used with any other LDA functions such as Gaussian discriminants or explained variance*.

Lu D, Ding C, Xu J, Wang S. Hierarchical Discriminant Analysis. Sensors (Basel). 2018 Jan 18;18(1). pii: E279. doi: 10.3390/s18010279.

`ChemometricsTools.Leverage`

— Method.`Leverage(X::Array)`

Calculates the leverage of samples in `X` from the perspective of a linearly additive model.
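For a linear model, the leverage of a sample is usually taken as the matching diagonal entry of the hat matrix `X * inv(X' * X) * X'`. The sketch below computes those diagonals without forming the full matrix; `leverage_sketch` is a hypothetical name, and the use of a pseudo-inverse is an assumption made so that rank-deficient inputs do not fail.

```julia
using LinearAlgebra

# Hypothetical helper: diagonal of the hat matrix, one value per row of X.
function leverage_sketch(X::AbstractMatrix)
    G = pinv(X' * X)                     # pseudo-inverse of the Gram matrix (assumption)
    return vec(sum((X * G) .* X, dims = 2))
end

X = [1.0 0.0; 1.0 1.0; 1.0 2.0]
h = leverage_sketch(X)
```

The leverages sum to the rank of `X`, so with three samples and two columns the end points of this design carry most of the weight.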

`ChemometricsTools.Leverage`

— Method.`Leverage(pca::PCA)`

Calculates the leverage of samples in a `pca` object.

`ChemometricsTools.Leverage`

— Method.`Leverage(pls::PartialLeastSquares)`

Calculates the leverage of samples in a `pls` object.

`ChemometricsTools.PCA_NIPALS`

— Method.`PCA_NIPALS(X; Factors = minimum(size(X)) - 1, tolerance = 1e-7, maxiters = 200)`

Computes a PCA from `X` using the NIPALS algorithm with a user-specified number of latent variables (`Factors`). The tolerance is the minimum change in the F norm before ceasing execution. Returns a PCA object.
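The NIPALS iteration can be sketched as alternating projections followed by deflation. This is an illustration only: `nipals_sketch` is a hypothetical name, the data are centered inside the function, and convergence is judged by the change in the score vector, which may differ from the F-norm criterion the library uses.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper, not the library's PCA_NIPALS.
# Returns scores T (samples by factors) and loadings P (variables by factors).
function nipals_sketch(X::AbstractMatrix; factors::Int = 1,
                       tolerance::Float64 = 1e-7, maxiters::Int = 200)
    E = X .- mean(X, dims = 1)                       # centered working copy
    n, p = size(E)
    T = zeros(n, factors)
    P = zeros(p, factors)
    for k in 1:factors
        t = E[:, argmax(vec(var(E, dims = 1)))]      # start from the highest-variance column
        p_k = zeros(p)
        for _ in 1:maxiters
            p_k = E' * t
            p_k ./= norm(p_k)                        # unit-length loading
            t_new = E * p_k                          # updated score
            delta = norm(t_new - t)
            t = t_new
            delta < tolerance && break               # converged (assumed criterion)
        end
        T[:, k] = t
        P[:, k] = p_k
        E = E - t * p_k'                             # deflate before the next component
    end
    return T, P
end

X = [2.0 0.0; 0.0 1.0; -2.0 0.0; 0.0 -1.0]
T, P = nipals_sketch(X; factors = 1)
```

Because each component is extracted and removed in turn, NIPALS only does the work needed for the requested number of factors, which is its main appeal over a full SVD on wide data.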

`ChemometricsTools.RAFFT`

— Method.`RAFFT(raw, reference; maxlags::Int = 500, lookahead::Int = 1, minlength::Int = 20, mincorr::Float64 = 0.05)`

RAFFT corrects shifts in the `raw` spectral bands to be similar to those in a given `reference` spectrum through the use of "recursive alignment by FFT". It returns an array of corrected spectra/chromatograms. The maximum number of lags can be specified. The `lookahead` parameter ensures that additional recursive executions are performed so the first solution found is not preemptively accepted. The minimum segment length (`minlength`) can also be specified if FWHM are estimable, and the minimum cross-correlation (`mincorr`) for a match dictates whether peaks are considered aligned or not.

*Note*: This method works best with flat baselines because it repeats last known values when padding aligned spectra. It is highly efficient, and in my tests does a good job, but other methods definitely exist. Let me know if other peak alignment methods are important for your workflow, and I'll see if I can implement them.

Application of Fast Fourier Transform Cross-Correlation for the Alignment of Large Chromatographic and Spectral Datasets. Jason W. H. Wong, Caterina Durante, and Hugh M. Cartwright. Analytical Chemistry 2005 77 (17), 5655-5661
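The core step in this family of methods is finding the lag that maximizes the cross-correlation between a signal and its reference. The sketch below does that by direct summation for a single global shift; it deliberately swaps in a plain loop for the FFT and omits the recursive segmentation, so it is not RAFFT itself. The name `best_lag_sketch` is hypothetical.

```julia
# Hypothetical helper: lag (in points) by which `raw` is shifted relative to
# `reference`, chosen to maximize the direct cross-correlation.
function best_lag_sketch(raw::AbstractVector, reference::AbstractVector; maxlags::Int = 500)
    n = length(raw)
    best_lag, best_score = 0, -Inf
    for lag in -maxlags:maxlags
        score = 0.0
        for i in 1:n
            j = i + lag
            if 1 <= j <= n
                score += reference[i] * raw[j]   # overlap of reference with shifted raw
            end
        end
        if score > best_score
            best_lag, best_score = lag, score
        end
    end
    return best_lag
end

reference = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
raw       = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]      # same peak, two points later
lag = best_lag_sketch(raw, reference; maxlags = 3)
```

An FFT computes the same correlation for all lags at once, which is where the efficiency of the published method comes from on long spectra.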

`ChemometricsTools.findpeaks`

— Method.`findpeaks( vY; m = 3)`

Finds the indices of peaks in a vector `vY` with a window span of `2m`. Original R function by Stas_G (https://stats.stackexchange.com/questions/22974/how-to-find-local-peaks-valleys-in-a-series-of-data). This version is based on a C++ variant by me.
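To show how the window parameter affects the result, here is a simplified peak finder. It is an assumption-laden sketch, not the library's algorithm: `findpeaks_sketch` is a hypothetical name, and a point counts as a peak when it is strictly higher than both neighbors and is the maximum within `m` points on either side.

```julia
# Hypothetical helper: indices of local maxima that dominate a window of
# m points on each side (window span 2m).
function findpeaks_sketch(y::AbstractVector; m::Int = 3)
    peaks = Int[]
    n = length(y)
    for i in 2:(n - 1)
        lo, hi = max(1, i - m), min(n, i + m)    # clamp the window at the edges
        if y[i] > y[i - 1] && y[i] > y[i + 1] && y[i] == maximum(y[lo:hi])
            push!(peaks, i)
        end
    end
    return peaks
end

y = [0, 1, 0, 2, 0, 0, 0, 5, 0]
narrow = findpeaks_sketch(y; m = 1)   # every local maximum survives
wide   = findpeaks_sketch(y; m = 3)   # the small peak at index 2 is suppressed
```

A larger `m` suppresses minor peaks that sit close to a taller one, which is useful on noisy spectra.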