# Utils

`AutoEncoderToolkit.jl` offers a series of utility functions for different tasks.

## Training Utilities

`AutoEncoderToolkit.utils.step_scheduler`

— Function `step_scheduler(epoch, epoch_change, learning_rates)`

Simple function to define different learning rates at specified epochs.

**Arguments**

`epoch::Int`

: Epoch for which to return the learning rate.

`epoch_change::Vector{<:Int}`

: Epochs at which the learning rate changes. It must include an entry for the initial learning rate!

`learning_rates::Vector{<:AbstractFloat}`

: Learning rate value for each epoch range. Must be the same length as `epoch_change`.

**Returns**

`η::AbstractFloat`

: Learning rate for the current epoch.
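A step schedule is a piecewise-constant lookup from epoch to learning rate. The following is a minimal sketch under an assumed convention, where `epoch_change[i]` is the first epoch at which `learning_rates[i]` applies; the helper name `step_schedule_sketch` is hypothetical, and the library's own boundary convention may differ.

```julia
# Hypothetical sketch, not the library's implementation. Assumes
# epoch_change[i] is the first epoch at which learning_rates[i] applies.
function step_schedule_sketch(
    epoch::Int,
    epoch_change::AbstractVector{<:Integer},
    learning_rates::AbstractVector{<:AbstractFloat}
)
    length(epoch_change) == length(learning_rates) ||
        throw(ArgumentError("epoch_change and learning_rates must have the same length"))
    # Sort the change points (and their rates) into epoch order
    order = sortperm(epoch_change)
    changes = epoch_change[order]
    rates = learning_rates[order]
    # Index of the last change point that is ≤ epoch
    i = searchsortedlast(changes, epoch)
    i == 0 && throw(ArgumentError("epoch precedes the first change point"))
    return rates[i]
end

# Rate 1f-3 from epoch 1, 5f-4 from epoch 10, 1f-4 from epoch 20
η = step_schedule_sketch(15, [1, 10, 20], [1f-3, 5f-4, 1f-4])  # 5.0f-4
```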

`AutoEncoderToolkit.utils.cycle_anneal`

— Function

```
cycle_anneal(
epoch::Int,
n_epoch::Int,
n_cycles::Int;
frac::AbstractFloat=0.5f0,
βmax::Number=1.0f0,
βmin::Number=0.0f0,
T::Type=Float32
)
```

Compute the value of the annealing parameter β for a variational autoencoder as a function of the epoch number, following the cyclical annealing schedule.

**Arguments**

`epoch::Int`

: Epoch on which to evaluate the value of the annealing parameter.

`n_epoch::Int`

: Number of epochs that will be run to train the VAE.

`n_cycles::Int`

: Number of annealing cycles to be fit within the number of epochs.

**Optional Arguments**

`frac::AbstractFloat=0.5f0`

: Fraction of the cycle in which the annealing parameter β will increase from the minimum to the maximum value.

`βmax::Number=1.0f0`

: Maximum value that the annealing parameter can reach.

`βmin::Number=0.0f0`

: Minimum value that the annealing parameter can reach.

`T::Type=Float32`

: The type of the output. The function will convert the output to this type.

**Returns**

`β::T`

: Value of the annealing parameter.

**Citation**

Fu, H. et al. Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing. Preprint at http://arxiv.org/abs/1903.10145 (2019).
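The cyclical schedule can be sketched as follows: each cycle spans `n_epoch / n_cycles` epochs, β rises linearly from `βmin` to `βmax` over the first `frac` of the cycle, and then stays at `βmax` until the cycle restarts. This is a minimal illustration, not the library's code; the name `cycle_anneal_sketch` and counting epochs from 0 are assumptions.

```julia
# Hypothetical sketch of a cyclical β schedule (Fu et al., 2019), not the
# library's implementation. Epochs are assumed to be counted from 0.
function cycle_anneal_sketch(
    epoch::Int, n_epoch::Int, n_cycles::Int;
    frac::Real=0.5, βmax::Real=1.0, βmin::Real=0.0, T::Type=Float32
)
    0 < frac ≤ 1 || throw(ArgumentError("frac must lie in (0, 1]"))
    period = n_epoch / n_cycles        # epochs per cycle (may be fractional)
    τ = mod(epoch, period) / period    # position within the current cycle, in [0, 1)
    # Linear ramp from βmin to βmax over the first `frac` of the cycle, then hold βmax
    β = τ < frac ? βmin + (βmax - βmin) * τ / frac : βmax
    return convert(T, β)
end

# 100 epochs split into 4 cycles of 25 epochs each
βs = [cycle_anneal_sketch(e, 100, 4) for e in 0:99]
```

With the default `frac = 0.5`, β ramps during epochs 0 to 12 of each 25-epoch cycle and is held at `βmax` for the rest of the cycle.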

`AutoEncoderToolkit.utils.locality_sampler`

— Function `locality_sampler(data, dist_tree, n_primary, n_secondary, k_neighbors; index=false)`

Algorithm to generate mini-batches based on spatial locality as determined by a pre-constructed nearest neighbors tree.

**Arguments**

`data::AbstractArray`

: An array containing the data points. The data points can be of any dimension.

`dist_tree::NearestNeighbors.NNTree`

: `NearestNeighbors.jl` tree used to determine the distance between data points.

`n_primary::Int`

: Number of primary points to sample.

`n_secondary::Int`

: Number of secondary points to sample from the neighbors of each primary point.

`k_neighbors::Int`

: Number of nearest neighbors from which to potentially sample the secondary points.

**Optional Keyword Arguments**

`index::Bool`

: If `true`, returns the indices of the selected samples. If `false`, returns the `data` corresponding to those indices. Defaults to `false`.

**Returns**

- If `index` is `true`, returns `sample_idx::Vector{Int64}`: Indices of data points to include in the mini-batch.
- If `index` is `false`, returns `sample_data::AbstractArray`: The data points to include in the mini-batch.

**Description**

This sampling algorithm consists of three steps:

- For each data point, determine the `k_neighbors` nearest neighbors using the `dist_tree`.
- Uniformly sample `n_primary` points without replacement from all data points.
- For each primary point, sample `n_secondary` points without replacement from its `k_neighbors` nearest neighbors.
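The three steps above can be sketched without the `NearestNeighbors.jl` dependency by computing neighbors with brute-force Euclidean distances. This is an illustrative stand-in rather than the library's implementation: the name `locality_sampler_sketch`, the `rng` keyword, and the choice to return the de-duplicated union of the secondary samples are assumptions.

```julia
using LinearAlgebra, Random

# Hypothetical, dependency-free sketch of the three-step locality sampler;
# brute-force distances stand in for the NearestNeighbors.jl tree.
# Columns of `data` are data points.
function locality_sampler_sketch(
    data::AbstractMatrix, n_primary::Int, n_secondary::Int, k_neighbors::Int;
    rng::AbstractRNG=Random.default_rng()
)
    n = size(data, 2)
    n_primary ≤ n && n_secondary ≤ k_neighbors ≤ n ||
        throw(ArgumentError("need n_primary ≤ n and n_secondary ≤ k_neighbors ≤ n"))
    # Step 1: indices of the k nearest neighbors of every point
    # (for distinct points this includes the point itself, at distance 0)
    neighbors = map(1:n) do i
        d = [norm(data[:, i] - data[:, j]) for j in 1:n]
        partialsortperm(d, 1:k_neighbors)
    end
    # Step 2: primary points, sampled uniformly without replacement
    primary = randperm(rng, n)[1:n_primary]
    # Step 3: for each primary point, n_secondary neighbors without replacement
    secondary = [neighbors[p][randperm(rng, k_neighbors)[1:n_secondary]] for p in primary]
    # Indices shared by overlapping neighborhoods are kept once
    return unique(reduce(vcat, secondary))
end
```

Indexing `data[:, idx]` with the returned indices gives the mini-batch itself, which mirrors the `index` option. Brute-force neighbor search costs O(n²) distance evaluations, which is the cost a pre-built tree is meant to avoid.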

**Examples**

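```
using AutoEncoderToolkit, NearestNeighbors, Distances
# Columns of `data` are data points; pre-construct the NearestNeighbors.jl tree
data = rand(Float32, 2, 1_000)
dist_tree = KDTree(data, Euclidean())
sample_indices = AutoEncoderToolkit.utils.locality_sampler(data, dist_tree, 10, 5, 50; index=true)
```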

**Citation**

Skafte, N., Jørgensen, M. & Hauberg, S. Reliable training and estimation of variance networks. in Advances in Neural Information Processing Systems vol. 32 (Curran Associates, Inc., 2019).

## Centroid Finding Utilities

Some VAE models, such as the `RHVAE`, require clustering of the data. Specifically, `RHVAE` can take a fixed subset of the training data as a reference for the computation of the metric tensor. The following functions can be used to define this reference subset to be used as centroids for the metric tensor computation.

`AutoEncoderToolkit.utils.centroids_kmeans`

— Function

```
centroids_kmeans(
x::AbstractMatrix,
n_centroids::Int;
assign::Bool=false
)
```

Perform k-means clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).

**Arguments**

`x::AbstractMatrix`

: The input data. Columns represent individual samples.

`n_centroids::Int`

: The number of centroids to compute.

**Optional Keyword Arguments**

`assign::Bool=false`

: If true, also return the assignments of each point to a centroid.

**Returns**

- If `assign` is false, returns a matrix where each column is a centroid.
- If `assign` is true, returns a tuple where the first element is the matrix of centroids and the second element is a vector of assignments.

**Examples**

```
data = rand(100, 10)
centroids = centroids_kmeans(data, 5)
```
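k-means is usually computed with Lloyd's algorithm, which alternates between assigning each sample to its nearest center and moving each center to the mean of its members. A minimal sketch, assuming columns are samples; `kmeans_sketch` is a hypothetical name, and this is not necessarily how `centroids_kmeans` computes its result (initialization and stopping rules in particular may differ).

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical sketch of Lloyd's k-means, not the library's implementation.
# Columns of `x` are samples.
function kmeans_sketch(x::AbstractMatrix, k::Int; iters::Int=100,
                       rng::AbstractRNG=Random.default_rng())
    n = size(x, 2)
    centers = float(x[:, randperm(rng, n)[1:k]])   # initialize with k distinct samples
    assign = zeros(Int, n)
    for _ in 1:iters
        # Assignment step: index of the nearest center for every sample
        new_assign = [argmin([norm(x[:, j] - centers[:, c]) for c in 1:k]) for j in 1:n]
        new_assign == assign && break              # assignments stopped changing
        assign = new_assign
        # Update step: move each non-empty cluster's center to its members' mean
        for c in 1:k
            members = findall(==(c), assign)
            isempty(members) || (centers[:, c] = vec(mean(x[:, members]; dims=2)))
        end
    end
    return centers, assign
end
```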

```
centroids_kmeans(
x::AbstractArray,
n_centroids::Int;
reshape_centroids::Bool=true,
assign::Bool=false
)
```

Perform k-means clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).

Each sample is flattened into a vector, so the input becomes a matrix with one column per sample before k-means clustering is performed. This is necessary because k-means operates on points in a vector space and cannot handle multi-dimensional samples directly.

By default, the output centroids are reshaped back to the original input shape, so that each centroid has the same shape as an individual sample. This is controlled by the `reshape_centroids` argument.

**Arguments**

`x::AbstractArray`

: The input data. It can be a multi-dimensional array where the last dimension represents individual samples.

`n_centroids::Int`

: The number of centroids to compute.

**Optional Keyword Arguments**

`reshape_centroids::Bool=true`

: If true, reshape the output centroids back to the original input shape.

`assign::Bool=false`

: If true, also return the assignments of each point to a centroid.

**Returns**

- If `assign` is false, returns the centroids: an array whose last dimension indexes the centroids when `reshape_centroids` is true, or a matrix where each column is a centroid otherwise.
- If `assign` is true, returns a tuple where the first element is the centroids and the second element is a vector of assignments.

**Examples**

```
data = rand(28, 28, 100)  # 100 samples, each a 28×28 array
centroids = centroids_kmeans(data, 5)
```
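The flatten-then-reshape step for multi-dimensional inputs can be illustrated with `reshape` alone. In this sketch the helper names are hypothetical, and the centroid matrix is a stand-in (the first five flattened samples) rather than an actual k-means result; it only shows how shapes move between the two representations.

```julia
# Hypothetical helpers showing the flatten/reshape round trip; the last
# dimension of `x` indexes samples.
flatten_samples(x::AbstractArray) = reshape(x, :, size(x, ndims(x)))
reshape_like_samples(c::AbstractMatrix, x::AbstractArray) =
    reshape(c, size(x)[1:end-1]..., size(c, 2))

x = rand(28, 28, 100)                    # 100 samples, each 28×28
X = flatten_samples(x)                   # 784×100, one column per sample
C = X[:, 1:5]                            # stand-in for 5 centroid columns
centroids = reshape_like_samples(C, x)   # 28×28×5, one slice per centroid
```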

`AutoEncoderToolkit.utils.centroids_kmedoids`

— Function

```
centroids_kmedoids(
x::AbstractMatrix, n_centroids::Int; assign::Bool=false
)
```

Perform k-medoids clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).

**Arguments**

`x::AbstractMatrix`

: The input data. Columns represent individual samples.

`n_centroids::Int`

: The number of centroids to compute.

`dist::Distances.PreMetric=Distances.Euclidean()`

: The distance metric to use when computing the pairwise distance matrix.

**Optional Keyword Arguments**

`assign::Bool=false`

: If true, also return the assignments of each point to a centroid.

**Returns**

- If `assign` is false, returns a matrix where each column is a centroid.
- If `assign` is true, returns a tuple where the first element is the matrix of centroids and the second element is a vector of assignments.

**Examples**

```
data = rand(100, 10)
centroids = centroids_kmedoids(data, 5)
```

```
centroids_kmedoids(
x::AbstractArray,
n_centroids::Int,
dist::Distances.PreMetric=Distances.Euclidean();
assign::Bool=false
)
```

Perform k-medoids clustering on the input and return the centers. This function can be used to down-sample the number of points used when computing the metric tensor in training a Riemannian Hamiltonian Variational Autoencoder (RHVAE).

**Arguments**

`x::AbstractArray`

: The input data. The last dimension of `x` should contain each of the samples that should be clustered.

`n_centroids::Int`

: The number of centroids to compute.

`dist::Distances.PreMetric=Distances.Euclidean()`

: The distance metric to use for the clustering. Defaults to Euclidean distance.

**Optional Keyword Arguments**

`assign::Bool=false`

: If true, also return the assignments of each point to a centroid.

**Returns**

- If `assign` is false, returns an array where each column is a centroid.
- If `assign` is true, returns a tuple where the first element is the array of centroids and the second element is a vector of assignments.

**Examples**

```
data = rand(10, 100)
centroids = centroids_kmedoids(data, 5)
```
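k-medoids differs from k-means in that every center is an actual sample, chosen to minimize the total distance to the other members of its cluster. A minimal alternating sketch on a precomputed Euclidean distance matrix follows; `kmedoids_sketch` is a hypothetical name, and the library's algorithm and initialization may differ.

```julia
using LinearAlgebra, Random

# Hypothetical sketch of alternating k-medoids on a precomputed Euclidean
# distance matrix, not the library's implementation. Columns of `x` are samples.
function kmedoids_sketch(x::AbstractMatrix, k::Int; iters::Int=100,
                         rng::AbstractRNG=Random.default_rng())
    n = size(x, 2)
    D = [norm(x[:, i] - x[:, j]) for i in 1:n, j in 1:n]   # pairwise distance matrix
    medoids = randperm(rng, n)[1:k]
    nearest(m) = [argmin(D[j, m]) for j in 1:n]            # cluster index per sample
    for _ in 1:iters
        assign = nearest(medoids)
        # In each cluster, pick the member with the smallest total distance to the rest
        new_medoids = map(1:k) do c
            members = findall(==(c), assign)
            members[argmin(vec(sum(D[members, members]; dims=2)))]
        end
        new_medoids == medoids && break
        medoids = new_medoids
    end
    # Medoids are actual samples, so the centers are columns of `x`
    return x[:, medoids], nearest(medoids)
end
```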

## Other Utilities

`AutoEncoderToolkit.utils.storage_type`

— Function `storage_type(A::AbstractArray)`

Determine the storage type of an array.

This function recursively checks the parent of the array until it finds the base storage type. This is useful for determining whether an array or its subarrays are stored on the CPU or GPU.

**Arguments**

`A::AbstractArray`

: The array whose storage type is to be determined.

**Returns**

The type of the array that is the base storage of `A`.
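The recursive parent lookup can be sketched with `Base.parent`, which returns the wrapped array for views and wrappers such as `SubArray` or `Transpose`, and the array itself otherwise. The name `storage_type_sketch` is hypothetical; the recursion stops at whichever array type has no further parent.

```julia
# Hypothetical sketch of the recursive lookup: Base.parent returns the
# wrapped array for views and wrappers, and the array itself otherwise.
storage_type_sketch(A::AbstractArray) =
    parent(A) === A ? typeof(A) : storage_type_sketch(parent(A))

M = rand(Float32, 4, 4)
storage_type_sketch(view(M, :, 1))                # Matrix{Float32}
storage_type_sketch(transpose(view(M, 1:2, :)))   # Matrix{Float32}
```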

`AutoEncoderToolkit.utils.vec_to_ltri`

— Function `vec_to_ltri(diag::AbstractVecOrMat, lower::AbstractVecOrMat)`

Convert two vectors into a lower triangular matrix, or two matrices (column by column) into a 3D tensor of lower triangular matrices.

**Arguments**

`diag::AbstractVecOrMat`

: The input vector or matrix to be converted into the diagonal of the matrix. If it's a matrix, each column is considered as a separate vector.

`lower::AbstractVecOrMat`

: The input vector or matrix to be converted into the lower triangular part of the matrix. The length of this vector or the number of rows in this matrix should be a triangular number (i.e., the sum of the first `n` natural numbers for some `n`); for a `d×d` output it is `d(d-1)/2`. If it's a matrix, each column is considered the lower part of a separate lower triangular matrix.

**Returns**

- A lower triangular matrix or a 3D tensor where each slice is a lower triangular matrix constructed from `diag` and `lower`.

**Description**

This function constructs a lower triangular matrix or a 3D tensor from two input vectors or matrices, `diag` and `lower`. The `diag` vector or matrix provides the diagonal elements of the matrix, while the `lower` vector or matrix provides the elements below the diagonal. The function uses a comprehension to construct the matrix or tensor, with the `lower_index` function calculating the appropriate index in the `lower` vector or matrix for each element below the diagonal.

**GPU Support**

The function supports both CPU and GPU arrays. For GPU arrays, the data is first transferred to the CPU, the lower triangular matrix or tensor is constructed, and then it is transferred back to the GPU.
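A minimal CPU-only sketch of the construction, using a comprehension as described above. The ordering of `lower` is an assumption here (strictly-lower entries filled row by row); the library's `lower_index` may use a different ordering, and `vec_to_ltri_sketch` is a hypothetical name.

```julia
# Hypothetical sketch: build an n×n lower-triangular matrix from a length-n
# diagonal and a length n(n-1)/2 vector of strictly-lower entries, filled row by row.
function vec_to_ltri_sketch(diag::AbstractVector{T}, lower::AbstractVector{T}) where {T}
    n = length(diag)
    length(lower) == n * (n - 1) ÷ 2 ||
        throw(DimensionMismatch("lower must have n(n-1)/2 entries"))
    # Row-major position of entry (i, j), j < i, within `lower`
    idx(i, j) = (i - 1) * (i - 2) ÷ 2 + j
    return [i == j ? diag[i] : (i > j ? lower[idx(i, j)] : zero(T)) for i in 1:n, j in 1:n]
end

# Matrix inputs: one lower-triangular slice per column pair, stacked along dims=3
vec_to_ltri_sketch(diag::AbstractMatrix, lower::AbstractMatrix) =
    cat((vec_to_ltri_sketch(d, l) for (d, l) in zip(eachcol(diag), eachcol(lower)))...; dims=3)

L = vec_to_ltri_sketch([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
# [1.0 0.0 0.0; 4.0 2.0 0.0; 5.0 6.0 3.0]
```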

`AutoEncoderToolkit.utils.vec_mat_vec_batched`

— Function

```
vec_mat_vec_batched(
v::AbstractVector,
M::AbstractMatrix,
w::AbstractVector
)
```

Compute the product of a vector, a matrix, and another vector in the form v̲ᵀ M̲̲ w̲.

This function takes two vectors `v` and `w`, and a matrix `M`, and computes the product v̲ᵀ M̲̲ w̲. It is provided so that the single-vector case shares an interface with the batched methods under multiple dispatch.

**Arguments**

`v::AbstractVector`

: A `d`-dimensional vector.

`M::AbstractMatrix`

: A `d×d` matrix.

`w::AbstractVector`

: A `d`-dimensional vector.

**Returns**

A scalar which is the result of the product v̲ᵀ M̲̲ w̲.

**Notes**

This function uses `LinearAlgebra.dot` to compute the result, which is equivalent to first forming the matrix-vector product M̲̲ w̲ and then taking its dot product with `v`.

```
vec_mat_vec_batched(
v::AbstractMatrix,
M::AbstractArray,
w::AbstractMatrix
)
```

Compute the batched product of vectors and matrices in the form v̲ᵀ M̲̲ w̲.

This function takes two matrices `v` and `w`, and a 3D array `M`, and computes the batched product v̲ᵀ M̲̲ w̲ for each column of `v` and `w` and the corresponding slice of `M`. The computation is performed in a broadcasted manner using the `Flux.batched_vec` function.

**Arguments**

`v::AbstractMatrix`

: A `d×n` matrix, where `d` is the dimension of the vectors and `n` is the number of vectors.

`M::AbstractArray`

: A `d×d×n` array, where `d` is the dimension of the matrices and `n` is the number of matrices.

`w::AbstractMatrix`

: A `d×n` matrix, where `d` is the dimension of the vectors and `n` is the number of vectors.

**Returns**

An array of length `n` where each element is the result of the product v̲ᵀ M̲̲ w̲ for the corresponding vectors and matrix.

**Notes**

This function uses the `Flux.batched_vec` function to perform the batched multiplication of the matrices in `M` with the vectors in `w`. The resulting vectors are then element-wise multiplied with the vectors in `v` and summed over the vector dimension `d` to obtain the final result.
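Both CPU methods above can be sketched with the three-argument `LinearAlgebra.dot(x, A, y)`, which computes xᵀ A y for real inputs without storing the intermediate `A*y`. This loop-based batched version is an illustration, not the `Flux.batched_vec` implementation described above, and it does not cover the `TaylorDiff` methods; `vmv_sketch` is a hypothetical name.

```julia
using LinearAlgebra

# Hypothetical sketch: single product vᵀ M w via three-argument dot
vmv_sketch(v::AbstractVector, M::AbstractMatrix, w::AbstractVector) = dot(v, M, w)

# Batched form: one product per column of v and w and per slice of M
vmv_sketch(v::AbstractMatrix, M::AbstractArray{<:Any,3}, w::AbstractMatrix) =
    [dot(view(v, :, k), view(M, :, :, k), view(w, :, k)) for k in axes(M, 3)]

v = [1.0, 2.0]; M = [1.0 2.0; 3.0 4.0]; w = [1.0, 1.0]
vmv_sketch(v, M, w)   # 17.0, since M*w = [3, 7] and 1*3 + 2*7 = 17
```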

```
vec_mat_vec_batched(
v::AbstractVector{T},
M::AbstractMatrix{S},
w::AbstractVector{T}
) where {T<:TaylorDiff.TaylorScalar{Float32,2},S<:Number}
```

Compute the product of a vector, a matrix, and another vector in the form v̲ᵀ M̲̲ w̲ for vectors with `TaylorDiff` scalar elements.

This function takes two vectors `v` and `w` with elements of type `TaylorDiff.TaylorScalar{Float32,2}`, and a matrix `M` with elements of type `Number`, and computes the product v̲ᵀ M̲̲ w̲. The computation first performs the matrix-vector multiplication M̲̲ w̲ and then computes the dot product of the resulting vector with `v`.

**Arguments**

`v::AbstractVector{T}`

: A `d`-dimensional vector. `T` is a subtype of `TaylorDiff.TaylorScalar{Float32,2}`.

`M::AbstractMatrix{S}`

: A `d×d` matrix. `S` is a subtype of `Number`.

`w::AbstractVector{T}`

: A `d`-dimensional vector. `T` is a subtype of `TaylorDiff.TaylorScalar{Float32,2}`.

**Returns**

A scalar which is the result of the product v̲ᵀ M̲̲ w̲.

**Notes**

This function uses the `dot` function to compute the final dot product.

```
vec_mat_vec_batched(
v::AbstractMatrix{T},
M::AbstractArray{S,3},
w::AbstractMatrix{T}
) where {T<:TaylorDiff.TaylorScalar{Float32,2},S<:Number}
```

Compute the batched product of vectors and matrices in the form v̲ᵀ M̲̲ w̲ for vectors with `TaylorDiff` scalar elements.

This function takes two matrices `v` and `w` with elements of type `TaylorDiff.TaylorScalar{Float32,2}`, and a 3D array `M` with elements of type `Number`, and computes the batched product v̲ᵀ M̲̲ w̲. The computation first extracts each slice of `M` and each column of `w`, then performs the matrix-vector multiplication for each slice-column pair, and finally multiplies the resulting matrix element-wise with `v` and sums over the vector dimension `d`.

**Arguments**

`v::AbstractMatrix{T}`

: A `d×n` matrix, where `d` is the dimension of the vectors and `n` is the number of vectors. `T` is a subtype of `TaylorDiff.TaylorScalar{Float32,2}`.

`M::AbstractArray{S,3}`

: A `d×d×n` array, where `d` is the dimension of the matrices and `n` is the number of matrices. `S` is a subtype of `Number`.

`w::AbstractMatrix{T}`

: A `d×n` matrix, where `d` is the dimension of the vectors and `n` is the number of vectors. `T` is a subtype of `TaylorDiff.TaylorScalar{Float32,2}`.

**Returns**

An array of length `n` where each element is the result of the product v̲ᵀ M̲̲ w̲ for the corresponding vectors and matrix.

**Notes**

This function uses the `eachslice` and `eachcol` functions to extract the slices of `M` and the columns of `w`, respectively. It then uses a comprehension to perform the matrix-vector multiplication for each slice-column pair, multiplies the resulting matrix element-wise with `v`, and sums over the vector dimension to obtain the final result.

`AutoEncoderToolkit.utils.slogdet`

— Function `slogdet(A::AbstractArray{T}; check::Bool=false) where {T<:Number}`

Compute the log determinant of a positive-definite matrix `A` or a 3D array of such matrices.

**Arguments**

`A::AbstractArray{T}`

: A positive-definite matrix or a 3D array of positive-definite matrices whose log determinant is to be computed.

`check::Bool=false`

: A flag that determines whether to check if the input matrix `A` is positive-definite. Defaults to `false` because numerical round-off can cause a matrix that should be positive-definite to fail the check.

**Returns**

- The log determinant of `A`. If `A` is a 3D array, returns a 1D array of log determinants, one for each slice along the third dimension of `A`.

**Description**

This function computes the log determinant of a positive-definite matrix `A` or a 3D array of such matrices. It first computes the Cholesky decomposition of `A`, and then calculates the log determinant as twice the sum of the log of the diagonal elements of the lower triangular matrix from the Cholesky decomposition.

**Conditions**

The input matrix `A` must be a positive-definite matrix, i.e., it must be symmetric and all its eigenvalues must be positive. If `check` is set to `true`, the function will throw an error if `A` is not positive-definite.

**GPU Support**

The function supports both CPU and GPU arrays.
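The Cholesky route to the log determinant can be sketched in a few lines for CPU arrays: if A = L Lᵀ, then det A = (∏ᵢ Lᵢᵢ)², so log det A = 2 Σᵢ log Lᵢᵢ. The name `slogdet_sketch` is hypothetical.

```julia
using LinearAlgebra

# Hypothetical sketch: log det A = 2 Σᵢ log Lᵢᵢ, where A = L Lᵀ is the Cholesky factorization
slogdet_sketch(A::AbstractMatrix; check::Bool=false) =
    2 * sum(log, diag(cholesky(Symmetric(A); check=check).L))

# 3D input: one log determinant per slice along the third dimension
slogdet_sketch(A::AbstractArray{<:Number,3}; check::Bool=false) =
    [slogdet_sketch(A[:, :, k]; check=check) for k in axes(A, 3)]

A = [4.0 2.0; 2.0 3.0]   # positive-definite, det(A) = 8
slogdet_sketch(A)        # ≈ log(8)
```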

`AutoEncoderToolkit.utils.sample_MvNormalCanon`

— Function `sample_MvNormalCanon(Σ⁻¹::AbstractArray{T}) where {T<:Number}`

Draw a random sample from a multivariate normal distribution in canonical form.

**Arguments**

`Σ⁻¹::AbstractArray{T}`

: The precision matrix (inverse of the covariance matrix) of the multivariate normal distribution. This can be a 2D array (matrix) or a 3D array.

**Returns**

- A random sample drawn from the multivariate normal distribution specified by the input precision matrix. If `Σ⁻¹` is a 3D array, returns a 2D array of samples, one for each slice along the third dimension of `Σ⁻¹`.

**Description**

This function draws a random sample from a multivariate normal distribution specified by a precision matrix `Σ⁻¹`. The precision matrix can be a 2D array (matrix) or a 3D array. If `Σ⁻¹` is a 3D array, the function draws a sample for each slice along the third dimension of `Σ⁻¹`.

The function first inverts the precision matrix to obtain the covariance matrix, then performs a Cholesky decomposition of the covariance matrix. It then draws a sample from a standard normal distribution and multiplies it by the lower triangular matrix from the Cholesky decomposition to obtain the final sample.

**GPU Support**

The function supports both CPU and GPU arrays.
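The sampling steps above can be sketched for CPU arrays as follows. The zero mean is an assumption (only a precision matrix is passed in), and `sample_canon_sketch` is a hypothetical name.

```julia
using LinearAlgebra, Random

# Hypothetical sketch of the steps described above, assuming a zero-mean
# distribution: invert Σ⁻¹, factor Σ = L Lᵀ, and return L z with z ~ N(0, I).
function sample_canon_sketch(Σ⁻¹::AbstractMatrix; rng::AbstractRNG=Random.default_rng())
    Σ = inv(Matrix(Σ⁻¹))                 # covariance matrix
    L = cholesky(Symmetric(Σ)).L         # lower-triangular Cholesky factor
    return L * randn(rng, size(Σ, 1))    # Cov(L z) = L Lᵀ = Σ
end

# 3D input: one sample per slice, returned as the columns of a matrix
sample_canon_sketch(Σ⁻¹::AbstractArray{<:Number,3}; rng::AbstractRNG=Random.default_rng()) =
    reduce(hcat, [sample_canon_sketch(Σ⁻¹[:, :, k]; rng=rng) for k in axes(Σ⁻¹, 3)])
```

Factoring the precision matrix directly, Σ⁻¹ = L Lᵀ, and solving Lᵀ x = z also yields a sample with covariance Σ without forming the inverse explicitly; that is an alternative technique, not the one described above.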

`AutoEncoderToolkit.utils.unit_vector`

— Function `unit_vector(x::AbstractVector, i::Int)`

Create a unit vector of the same length as `x` with the `i`-th element set to 1.

**Arguments**

`x::AbstractVector`

: The vector whose length is used to determine the dimension of the unit vector.

`i::Int`

: The index of the element to be set to 1.

**Returns**

- A unit vector with element type `eltype(x)` and the same length as `x`, with the `i`-th element set to 1.

**Description**

This function creates a unit vector of the same length as `x` with the `i`-th element set to 1. All other elements are set to 0.

**Note**

This function is marked with the `@ignore_derivatives` macro from the `ChainRulesCore` package, so automatic differentiation backends that use ChainRules rules (such as Zygote) treat the call as a constant and do not differentiate through it.

`unit_vector(x::AbstractMatrix, i::Int)`

Create a unit vector of the same length as the number of rows in `x` with the `i`-th element set to 1.

**Arguments**

`x::AbstractMatrix`

: The matrix whose number of rows is used to determine the dimension of the unit vector.

`i::Int`

: The index of the element to be set to 1.

**Returns**

- A unit vector with element type `eltype(x)` and length equal to the number of rows in `x`, with the `i`-th element set to 1.

**Description**

This function creates a unit vector of the same length as the number of rows in `x` with the `i`-th element set to 1. All other elements are set to 0.
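Both `unit_vector` methods can be sketched with one comprehension, since `axes(x, 1)` spans the length of a vector or the rows of a matrix. The name `unit_vector_sketch` is hypothetical, and the sketch omits the `@ignore_derivatives` marking described above.

```julia
# Hypothetical sketch covering both methods: axes(x, 1) is the length of a
# vector or the number of rows of a matrix.
unit_vector_sketch(x::AbstractVecOrMat, i::Int) =
    [j == i ? one(eltype(x)) : zero(eltype(x)) for j in axes(x, 1)]

unit_vector_sketch(zeros(Float32, 4), 2)   # Float32[0.0, 1.0, 0.0, 0.0]
unit_vector_sketch(rand(3, 5), 3)          # [0.0, 0.0, 1.0]
```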

`AutoEncoderToolkit.utils.finite_difference_gradient`

— Function

```
finite_difference_gradient(
f::Function,
x::AbstractVecOrMat;
fdtype::Symbol=:central
)
```

Compute the finite difference gradient of a function `f` at a point `x`.

**Arguments**

`f::Function`

: The function for which the gradient is to be computed. This function must return a scalar value.

`x::AbstractVecOrMat`

: The point at which the gradient is to be computed. Can be a vector or a matrix. If a matrix, each column represents a point where the function `f` is to be evaluated and the derivative computed.

**Optional Keyword Arguments**

`fdtype::Symbol=:central`

: The finite difference type. It can be either `:forward` or `:central`. Defaults to `:central`.

**Returns**

- A vector or a matrix representing the gradient of `f` at `x`, depending on the input type of `x`.

**Description**

This function computes the finite difference gradient of a function `f` at a point `x`. For a vector `x`, the gradient is a vector whose `i`-th element is the partial derivative of `f` with respect to the `i`-th element of `x`; for a matrix `x`, each column of the result is the gradient at the corresponding column of `x`.

The partial derivatives are computed using the forward or central difference formula, depending on the `fdtype` argument:

- Forward difference formula: ∂f/∂xᵢ ≈ [f(x + ε * eᵢ) - f(x)] / ε
- Central difference formula: ∂f/∂xᵢ ≈ [f(x + ε * eᵢ) - f(x - ε * eᵢ)] / (2ε)

where ε is the step size and eᵢ is the `i`-th unit vector.

**GPU Support**

This function supports both CPU and GPU arrays.
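The two difference formulas can be sketched directly. The step size ε is not specified above, so this sketch assumes the common heuristics ε = √eps for forward and ε = ∛eps for central differences (absolute, not scaled by `x`); `fd_gradient_sketch` is a hypothetical name.

```julia
# Hypothetical sketch of the forward/central formulas; ε is an assumed heuristic.
function fd_gradient_sketch(f, x::AbstractVector{T}; fdtype::Symbol=:central) where {T<:AbstractFloat}
    fdtype in (:forward, :central) ||
        throw(ArgumentError("fdtype must be :forward or :central"))
    ε = fdtype == :central ? cbrt(eps(T)) : sqrt(eps(T))
    g = similar(x)
    for i in eachindex(x)
        e = zero(x)
        e[i] = ε                                  # ε * eᵢ
        g[i] = if fdtype == :forward
            (f(x + e) - f(x)) / ε                 # forward difference
        else
            (f(x + e) - f(x - e)) / (2ε)          # central difference
        end
    end
    return g
end

# Matrix input: each column is a separate evaluation point
fd_gradient_sketch(f, x::AbstractMatrix; kwargs...) =
    reduce(hcat, [fd_gradient_sketch(f, collect(c); kwargs...) for c in eachcol(x)])

f(x) = sum(abs2, x)                      # ∇f(x) = 2x
fd_gradient_sketch(f, [1.0, 2.0, 3.0])   # ≈ [2.0, 4.0, 6.0]
```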

`AutoEncoderToolkit.utils.taylordiff_gradient`

— Function

```
taylordiff_gradient(
f::Function,
x::AbstractVecOrMat
)
```

Compute the gradient of a function `f` at a point `x` using Taylor-mode automatic differentiation (`TaylorDiff.jl`).

**Arguments**

`f::Function`

: The function for which the gradient is to be computed. This must be a scalar function.

`x::AbstractVecOrMat`

: The point at which the gradient is to be computed. Can be a vector or a matrix. If a matrix, each column represents a point where the function `f` is to be evaluated and the derivative computed.

**Returns**

- A vector or a matrix representing the gradient of `f` at `x`, depending on the input type of `x`.

**Description**

This function computes the gradient of a function `f` at a point `x` using Taylor-mode automatic differentiation. For a vector `x`, the gradient is a vector whose `i`-th element is the partial derivative of `f` with respect to the `i`-th element of `x`; for a matrix `x`, each column of the result is the gradient at the corresponding column of `x`.

The partial derivatives are computed using the `TaylorDiff.derivative` function.

**GPU Support**

This function currently only supports CPU arrays.