Analysis API Reference

Functions

COW( A, B; segments = 20, slack = 1,
           maxslack = Int( floor( length(A) / segments ) ) - 2 )

COW builds a CorrelationOptimizedWarping object that warp-corrects spectra A to the reference B. The user can set the number of segments, the slack size, and optionally the maximum slack.

Note: Not fully tested.

"Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping" Nielsen, N. P. V.; Carstensen, J. M.; Smedsgaard, J. Journal of Chromatography A. 1998, 805, 17–35.

(moo::COW)(A)

Applies a learned warping from a COW object to spectrum A.

CanonicalCorrelationAnalysis(A, B)

Returns a CanonicalCorrelationAnalysis object which contains (U, V, r) from Arrays A and B. Currently untested for correctness but should compute....

Hotelling(X, pca::PCA; Quantile = 0.05, Variance = 1.0)

Computes the Hotelling T^2 statistic and its upper control limit cutoff for a PCA object, using a specified Quantile and cumulative explained variance (Variance), for new or old data X. The result is stored in a struct which can be used for new data.

A review of PCA-based statistical process monitoring methods for time-dependent, high-dimensional data. Bart De Ketelaere https://wis.kuleuven.be/stat/robust/papers/2013/deketelaere-review.pdf
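As a rough illustration of the statistic itself, the sketch below computes T^2 from PCA scores using only Julia's standard libraries. It does not call this package: the variable names, the manual centering, and the use of `n - 1` in the score variances are assumptions, and the control limit is left out because its exact form here is not documented.

```julia
using LinearAlgebra, Statistics

# Illustrative data; all names below are hypothetical, not package internals.
X = [2.0 0.0 1.0;
     0.0 1.0 3.0;
     1.0 4.0 0.0;
     3.0 2.0 2.0]
n = size(X, 1)
Xc = X .- mean(X, dims = 1)            # centering is done by hand here
F = svd(Xc)
k = 2                                  # components retained (illustrative)
scores = Xc * F.V[:, 1:k]              # n×k score matrix
lambda = (F.S[1:k] .^ 2) ./ (n - 1)    # variance of each score column (assumed n - 1)
# T^2 per sample: squared scores scaled by their variances, summed over components
tsq = vec(sum((scores .^ 2) ./ lambda', dims = 2))
```

On the training data the T^2 values sum to `k * (n - 1)`, which is a convenient sanity check for a sketch like this.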

Hotelling(X, PLS::PartialLeastSquares; Quantile = 0.05, LVs = 1)

Computes the Hotelling T^2 statistic and its upper control limit cutoff for a PartialLeastSquares object, using a specified Quantile and number of latent variables (LVs), for new or old data X. The result is stored in a struct which can be used for new data.

Note: The number of latent variables cannot be automatically set by the explained variance in X. It is not computed cumulatively.

Informative PLS score-loading plots for process understanding and monitoring. Rolf Ergon. Journal of Process Control 14 (2004) 889-897 https://pdfs.semanticscholar.org/89b6/677a592dbe05a9b754d377ade416d7a17393.pdf

(H::Hotelling)(X)

Retrieves the T^2 statistic from a saved Hotelling model. Note 1: This does not automatically center or scale X. Note 2: If the model used to generate the Hotelling struct changes, so will the Hotelling struct (pass by reference).

LDA(X, Y; Factors = 1)

Computes a LinearDiscriminantAnalysis transform from X with a user-specified number of latent variables (Factors). Returns an LDA object.

( model::LDA )( Z; Factors = length(model.Values) )

Calling an LDA object on new data brings the new data Z into the LDA basis.

PCA(X; Factors = minimum(size(X)) - 1)

Computes a PCA from X using LinearAlgebra's SVD algorithm with a user-specified number of latent variables (Factors). Returns a PCA object.

(T::PCA)(Z::Array; Factors = length(T.Values), inverse = false)

Calling a PCA object on new data brings the new data Z into or out of (inverse = true) the PCA basis.
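The idea of moving data into and out of a PCA basis can be sketched with the standard library alone. This is not the package's implementation: the manual centering and every variable name are assumptions made for illustration.

```julia
using LinearAlgebra, Statistics

# Hypothetical stand-alone sketch; it does not use the PCA type documented above.
X = [2.0 0.0 1.0;
     0.0 1.0 3.0;
     1.0 4.0 0.0;
     3.0 2.0 2.0]
Xc = X .- mean(X, dims = 1)     # center by hand (an assumption of this sketch)
F = svd(Xc)                     # Xc == F.U * Diagonal(F.S) * F.Vt
k = 2                           # number of factors kept
loadings = F.Vt[1:k, :]         # k × variables
scores = Xc * loadings'         # into the PCA basis
Xhat = scores * loadings        # back out of the basis (rank-k, still centered)
```

With fewer factors than the rank of the data, `Xhat` is only an approximation of `Xc`; the Frobenius norm of the difference equals the root sum of squares of the discarded singular values.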

Q(X, pca::PCA; Quantile = 0.95, Variance = 1.0)

Computes the Q-statistic and its upper control limit cutoff for a PCA object, using a specified Quantile and cumulative explained variance (Variance), for new or old data X.

A review of PCA-based statistical process monitoring methods for time-dependent, high-dimensional data. Bart De Ketelaere https://wis.kuleuven.be/stat/robust/papers/2013/deketelaere-review.pdf
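For orientation, the Q-statistic is commonly defined as the squared reconstruction residual of each sample after projection onto the retained components. The sketch below follows that common definition with standard-library calls only; the names and the manual centering are assumptions, and the control limit is not computed.

```julia
using LinearAlgebra, Statistics

# Hypothetical sketch of per-sample Q residuals; not the package's code.
X = [2.0 0.0 1.0;
     0.0 1.0 3.0;
     1.0 4.0 0.0;
     3.0 2.0 2.0]
Xc = X .- mean(X, dims = 1)          # centering done by hand
F = svd(Xc)
k = 2                                # components retained (illustrative)
P = F.V[:, 1:k]                      # loadings, variables × k
E = Xc - Xc * P * P'                 # part of the data the model does not explain
q = vec(sum(abs2, E, dims = 2))      # squared residual norm per sample
```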

Q(X, pls::PartialLeastSquares; Quantile = 0.95, Variance = 1.0)

Computes the Q-statistic and its upper control limit cutoff for a PartialLeastSquares object, using a specified Quantile and cumulative explained variance (Variance), for new or old data X.

Note: The number of latent variables cannot be automatically set by the explained variance in X. It is not computed cumulatively.

A review of PCA-based statistical process monitoring methods for time-dependent, high-dimensional data. Bart De Ketelaere https://wis.kuleuven.be/stat/robust/papers/2013/deketelaere-review.pdf

AssessHealth( X )

Returns a somewhat detailed Dict containing information about the 'health' of a dataset. The following entries are included:
- PercentMissing: percent of missing entries (includes nothing, inf / nan) in the dataset
- EmptyColumns: the columns which have only 1 value
- RankEstimate: an estimate of the rank of X
- (optional) Duplicates: the rows of duplicate observations

DynamicTimeWarping( A, B )

Applies Dynamic Time Warping of spectrum A so it maps to reference B.

Returns the DTW distance, the shortest path, and the DTW cost matrix. This does not automatically rescale A to B; how to do that is left to the user for now.

Sakoe, Hiroaki; Chiba, Seibi (1978). "Dynamic programming algorithm optimization for spoken word recognition". IEEE Transactions on Acoustics, Speech, and Signal Processing. 26 (1): 43–49. doi:10.1109/tassp.1978.1163055.
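The three return values can be illustrated with a textbook dynamic-programming sketch. The function name `dtw_sketch`, the absolute-difference local cost, and the unconstrained step pattern are assumptions chosen for brevity; the package function may differ in all three.

```julia
# Hypothetical minimal DTW: returns (distance, warping path, accumulated cost matrix).
function dtw_sketch(a::AbstractVector, b::AbstractVector)
    n, m = length(a), length(b)
    D = fill(Inf, n + 1, m + 1)      # padded so that D[1, 1] is the empty alignment
    D[1, 1] = 0.0
    for i in 1:n, j in 1:m
        cost = abs(a[i] - b[j])      # local cost (assumed: absolute difference)
        D[i + 1, j + 1] = cost + min(D[i, j + 1], D[i + 1, j], D[i, j])
    end
    # Backtrack from the end to (1, 1), always moving to the cheapest predecessor.
    path = Tuple{Int,Int}[]
    i, j = n, m
    while true
        pushfirst!(path, (i, j))
        (i == 1 && j == 1) && break
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        costs = [D[s[1] + 1, s[2] + 1] for s in steps]
        i, j = steps[argmin(costs)]
    end
    return D[n + 1, m + 1], path, D[2:end, 2:end]
end

dist, path, C = dtw_sketch([1, 2, 3], [1, 2, 2, 3])
```

In this small example the second vector repeats one value, so the path pairs a single point of the first vector with two points of the second.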

ExplainedVarianceX(X,Y, pls::PartialLeastSquares)

Calculates the explained variance in X & Y of each latent variable in a PartialLeastSquares object.

ExplainedVariance(lda::LDA)

Calculates the explained variance of each singular value in an LDA object.

ExplainedVariance(PCA::PCA)

Calculates the explained variance of each singular value in a pca object.
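A common convention, assumed here rather than taken from the package source, is to report each squared singular value as a fraction of their total. A minimal sketch with hypothetical names:

```julia
using LinearAlgebra, Statistics

# Hypothetical sketch: explained variance ratios from singular values.
X = [2.0 0.0 1.0;
     0.0 1.0 3.0;
     1.0 4.0 0.0;
     3.0 2.0 2.0]
Xc = X .- mean(X, dims = 1)          # centered by hand
s = svdvals(Xc)                      # singular values, largest first
explained = (s .^ 2) ./ sum(s .^ 2)  # fraction of total variance per component
cumulative = cumsum(explained)       # running total, reaches 1 at the last component
```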

ExplainedVarianceX(X,Y, pls::PartialLeastSquares)

Calculates the explained variance in X of each latent variable in a PartialLeastSquares object.

ExplainedVarianceY(Y, pls::PartialLeastSquares)

Calculates the explained variance in Y of each latent variable in a PartialLeastSquares object.

HLDA(X, YHOT; K = 1, Factors = 1)

Computes a Hierarchical LinearDiscriminantAnalysis transform from X with a user-specified number of latent variables (Factors). The adjacency matrices are created from K nearest neighbors.

Returns an LDA object. Note: this can be used with any other LDA functions such as Gaussian discriminants or explained variance.

Lu D, Ding C, Xu J, Wang S. Hierarchical Discriminant Analysis. Sensors (Basel). 2018 Jan 18;18(1). pii: E279. doi: 10.3390/s18010279.

Leverage(X::Array)

Calculates the leverage of samples in X from the perspective of a linearly additive model.
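For a linear model, leverage is conventionally the diagonal of the hat matrix X (X'X)^-1 X'. The sketch below assumes that definition and a full-column-rank X; it is an illustration, not the package's code.

```julia
using LinearAlgebra

# Hypothetical sketch: leverage as the diagonal of the hat matrix.
X = [1.0 2.0;
     1.0 3.0;
     1.0 5.0;
     1.0 9.0]
H = X * inv(X' * X) * X'     # hat matrix (assumes X has full column rank)
h = diag(H)                  # one leverage value per sample
```

The leverages sum to the number of columns of X, and the sample farthest from the others (the last row here) receives the largest value.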

Leverage(pca::PCA)

Calculates the leverage of samples in a pca object.

Leverage(pls::PartialLeastSquares)

Calculates the leverage of samples in a pls object.

PCA_NIPALS(X; Factors = minimum(size(X)) - 1, tolerance = 1e-7, maxiters = 200)

Computes a PCA from X using the NIPALS algorithm with a user-specified number of latent variables (Factors). The tolerance is the minimum change in the Frobenius norm before execution stops. Returns a PCA object.
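The NIPALS iteration can be sketched as follows. The function name `nipals_sketch`, the starting vector, and the convergence test on the change in the score vector are assumptions; the package's stopping rule may be defined differently.

```julia
using LinearAlgebra

# Hypothetical NIPALS sketch: extracts one component at a time, then deflates.
function nipals_sketch(X; factors = 2, tolerance = 1e-7, maxiters = 200)
    E = float.(X)                        # working copy that gets deflated
    n, p = size(E)
    T = zeros(n, factors)                # scores
    P = zeros(p, factors)                # loadings
    for a in 1:factors
        # Start from the column with the largest sum of squares.
        t = E[:, argmax(vec(sum(abs2, E, dims = 1)))]
        pvec = zeros(p)
        for _ in 1:maxiters
            pvec = E' * t
            pvec ./= norm(pvec)          # unit-length loading
            tnew = E * pvec
            converged = norm(tnew - t) < tolerance
            t = tnew
            converged && break
        end
        T[:, a] = t
        P[:, a] = pvec
        E -= t * pvec'                   # remove the extracted component
    end
    return T, P
end

X = [3.0 1.0; 1.0 3.0; 2.0 -2.0; 0.0 1.0]
T, P = nipals_sketch(X; factors = 2)
```

Because this X has two columns and both components are extracted, `T * P'` reproduces X.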

RAFFT(raw, reference; maxlags::Int = 500, lookahead::Int = 1, minlength::Int = 20, mincorr::Float64 = 0.05)

RAFFT corrects shifts in the raw spectral bands so that they match those in a given reference spectrum through the use of "recursive alignment by FFT". It returns an array of corrected spectra/chromatograms. The maximum number of lags (maxlags) can be specified. The lookahead parameter ensures that additional recursive executions are performed so that the first solution found is not accepted prematurely. The minimum segment length (minlength) can also be specified if FWHM are estimable, and the minimum cross correlation (mincorr) determines whether peaks are considered to align.

Note: This method works best with flat baselines because it repeats the last known values when padding aligned spectra. It is highly efficient and does a good job in my tests, but other methods exist. Let me know if other peak alignment methods are important for your workflow, and I'll see if I can implement them.

Application of Fast Fourier Transform Cross-Correlation for the Alignment of Large Chromatographic and Spectral Datasets. Jason W. H. Wong, Caterina Durante, and Hugh M. Cartwright. Analytical Chemistry 2005 77 (17), 5655-5661
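The core step behind this kind of alignment is finding the lag that maximizes the cross-correlation between a spectrum and the reference. The sketch below shows only that step, for one global lag, using a direct sum instead of an FFT and no recursive segmentation; `best_lag` is a hypothetical helper and not part of the package.

```julia
# Hypothetical sketch: lag in -maxlags:maxlags that maximizes the cross-correlation.
function best_lag(raw::AbstractVector, reference::AbstractVector; maxlags::Int = 5)
    n = length(raw)
    best, bestscore = 0, -Inf
    for lag in -maxlags:maxlags
        score = 0.0
        for i in 1:n
            j = i + lag
            # Only overlapping positions contribute to the correlation.
            1 <= j <= n && (score += reference[i] * raw[j])
        end
        if score > bestscore
            best, bestscore = lag, score
        end
    end
    return best
end

reference = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
raw       = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0]   # same peak, two points later
lag = best_lag(raw, reference; maxlags = 3)
```

A positive lag means the peak in `raw` sits that many points to the right of the peak in `reference`.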

findpeaks( vY; m = 3)

Finds the indices of peaks in a vector vY with a window span of 2m. Original R function by Stas_G (https://stats.stackexchange.com/questions/22974/how-to-find-local-peaks-valleys-in-a-series-of-data). This version is based on a C++ variant by me.
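One simple way to read "window span of 2m" is that a point counts as a peak when it is strictly larger than every other point within m positions on either side. The sketch below implements that reading under a hypothetical name; the package function may handle ties and edges differently.

```julia
# Hypothetical sketch: index i is a peak if v[i] strictly exceeds its 2m neighbours.
function findpeaks_sketch(v::AbstractVector; m::Int = 3)
    peaks = Int[]
    n = length(v)
    for i in 2:(n - 1)                          # endpoints are never reported
        lo, hi = max(1, i - m), min(n, i + m)   # clip the window at the edges
        if all(v[i] > v[j] for j in lo:hi if j != i)
            push!(peaks, i)
        end
    end
    return peaks
end

v = [0, 1, 0, 2, 5, 2, 0, 1, 3, 1, 0]
peaks = findpeaks_sketch(v; m = 2)
```

With m = 2 the small bump at index 2 is not reported, because the larger value at index 4 falls inside its window.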