GenerativeTopographicMapping

Documentation for GenerativeTopographicMapping.

GenerativeTopographicMapping.GTM - Type
GTM

A model type for constructing a generative topographic mapping (GTM), based on GenerativeTopographicMapping.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

GTM = @load GTM pkg=GenerativeTopographicMapping

Do model = GTM() to construct an instance with default hyper-parameters.

GenerativeTopographicMapping implements the Generative Topographic Mapping described in Bishop, C. (1998), "GTM: The Generative Topographic Mapping", Neural Computation.

Training data

In MLJ or MLJBase, bind an instance model to data with mach = machine(model, X) where

  • X: an AbstractMatrix or Table of input features whose columns are of scitype Continuous.

Train the machine with fit!(mach, rows=...).

Hyper-parameters

  • k=16: Number of nodes along one side of the GTM latent grid. There are k² total nodes.
  • m=4: Square root of the number of RBF functions in the latent transformation. There are m² total RBFs.
  • σ=0.3: Standard deviation for the RBF functions in the latent transformation.
  • α=0.1: Model weight regularization parameter (0.0 for no regularization).
  • tol=0.0001: Tolerance used for determining convergence during expectation-maximization fitting.
  • niter=200: Maximum number of iterations to use.
  • nrepeats=4: Number of repeated steps at or below tol required before the GTM is considered converged.
  • representation=:means: Method to apply to the fitted responsibility matrix. One of (:means, :modes).
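
Because k and m are side lengths, the node and RBF counts are their squares. The following is a minimal sketch of that relationship only. The helper `latent_grid` and the choice of a regular grid on [-1, 1]² are assumptions made for illustration and are not taken from the package.

```julia
# Hypothetical helper (not part of GenerativeTopographicMapping.jl):
# build a side-by-side regular grid on [-1, 1]^2, one row per grid point.
function latent_grid(side::Integer)
    ticks = range(-1.0, 1.0; length=side)
    pts = [(x, y) for y in ticks for x in ticks]   # side^2 coordinate pairs
    return [first.(pts) last.(pts)]                # (side^2, 2) matrix
end

k, m = 16, 4                  # default hyper-parameters listed above
nodes = latent_grid(k)        # k^2 latent nodes
centers = latent_grid(m)      # m^2 RBF centers
println(size(nodes), " ", size(centers))
```

With the defaults this gives 256 latent nodes and 16 RBF centers, each described by two latent coordinates.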

Operations

  • transform(mach, X): returns the latent coordinates of each data point, computed from either the mean or the mode of its latent node responsibilities (as selected by representation). This can be used as a two-dimensional representation of the original dataset X.
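
The difference between the two representations can be sketched with a small hand-written responsibility matrix. The function names below are hypothetical, and the (n_nodes, n_datapoints) layout with columns summing to 1 is assumed from the description of the responsibility matrix elsewhere in this page.

```julia
# R: (n_nodes, n_datapoints) responsibilities; nodes: (n_nodes, 2) latent coordinates.
# All three helpers are illustrative, not package functions.
latent_means(R, nodes) = R' * nodes                           # responsibility-weighted average
latent_mode_indices(R) = [argmax(col) for col in eachcol(R)]  # most responsible node per point
latent_modes(R, nodes) = nodes[latent_mode_indices(R), :]     # coordinates of that node

nodes = [-1.0 -1.0; 1.0 -1.0; -1.0 1.0; 1.0 1.0]   # 2-by-2 latent grid
R = [0.7 0.1; 0.1 0.1; 0.1 0.2; 0.1 0.6]           # two data points

means = latent_means(R, nodes)
idx = latent_mode_indices(R)
modes = latent_modes(R, nodes)
```

The mean representation varies continuously across the latent square, while the mode representation snaps each point to a grid node; the mode indices correspond to the class labels described under Report.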

Fitted parameters

The fields of fitted_params(mach) are:

  • gtm: The GenerativeTopographicMap object fit by the GTM model. Contains node coordinates, RBF means, RBF variance, weights, etc.

Report

The fields of report(mach) are:

  • classes: the index of the node with the highest responsibility (the mode) for each data point in X, interpreted as a class label.

Examples

using MLJ
gtm = @load GTM pkg=GenerativeTopographicMapping
model = gtm()
X, y = make_blobs(100, 10; centers=5) # synthetic data
mach = machine(model, X) |> fit!
X̃ = transform(mach, X)

rpt = report(mach)
classes = rpt.classes

GenerativeTopographicMapping.GenerativeTopographicMap - Method
GTM(k, m, σ, Dataset; α=0.0, tol=0.0001, verbose=false)

Initialize hyperparameters for a GTM model.

  • k: square root of the number of latent nodes
  • m: square root of the number of RBF centers in latent space
  • σ: standard deviation for latent space RBF functions
  • Dataset: dataset to fit GTM model to. Assumed shape is (n_datapoints, n_features)
  • α: Weight regularization parameter (0.0 means no regularization)
  • tol: absolute tolerance used during fitting.
  • verbose: Set to true for extra print statements.

GenerativeTopographicMapping.Posterior - Method
Posterior(gtm::GenerativeTopographicMap)

Compute a matrix of contributions to posterior probabilities. This is an intermediate result that facilitates computing the true posterior probabilities given by the responsibility matrix R. The returned size is (n_nodes, n_datapoints). The exp-normalize trick is used for numerical stability.
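
The exp-normalize trick mentioned above can be sketched as follows. This is a generic illustration and not the package's implementation: the function name `exp_normalize` and the input `logP` (a matrix of unnormalized log-probabilities) are assumptions.

```julia
# Generic exp-normalize sketch: subtract each column's maximum before
# exponentiating so that no column underflows to all zeros.
function exp_normalize(logP::AbstractMatrix)
    shifted = logP .- maximum(logP; dims=1)   # largest entry in each column becomes 0
    E = exp.(shifted)
    return E ./ sum(E; dims=1)                # each column now sums to 1
end

# Column 1 would give 0/0 = NaN if exponentiated directly, since exp(-1000.0) == 0.0.
logP = [-1000.0 -2.0; -1001.0 -1.0]
R = exp_normalize(logP)
```

Shifting by the column maximum leaves the normalized result unchanged mathematically, because the common factor cancels in the ratio.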

GenerativeTopographicMapping.Responsabilities - Method
Responsabilities(gtm::GenerativeTopographicMap, Dataset)

Compute the matrix of responsibilities of each latent node in X for the data points in Dataset. The returned matrix is of size (n_nodes, n_datapoints).

GenerativeTopographicMapping.getDMatrix - Method
getDMatrix(gtm::GenerativeTopographicMap, Dataset)

Compute pairwise distances between the projected Gaussian centers Y and the data points in Dataset. The resulting size is (n_nodes, n_datapoints).
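
A pairwise distance matrix of this shape can be sketched as below. The helper `pairwise_sqdist` is hypothetical, and it returns squared Euclidean distances, which is an assumption about what the package stores; take the elementwise square root if plain distances are wanted.

```julia
# Y: (n_nodes, n_features) projected centers; Dataset: (n_datapoints, n_features).
# Uses the expansion |y - x|^2 = |y|^2 + |x|^2 - 2 y.x to avoid explicit loops.
function pairwise_sqdist(Y::AbstractMatrix, Dataset::AbstractMatrix)
    y2 = sum(abs2, Y; dims=2)            # (n_nodes, 1)
    x2 = sum(abs2, Dataset; dims=2)'     # (1, n_datapoints)
    D = y2 .+ x2 .- 2 .* (Y * Dataset')
    return max.(D, 0.0)                  # clamp tiny negatives caused by round-off
end

Y = [0.0 0.0; 3.0 4.0]
Dataset = [0.0 0.0; 3.0 4.0; 6.0 8.0]
D = pairwise_sqdist(Y, Dataset)          # (2, 3): one row per node, one column per data point
```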

GenerativeTopographicMapping.getUMatrix - Method
getUMatrix(Dataset)

Perform PCA on the Dataset and return a matrix U containing the first two principal components (the two leading eigenvectors of the data covariance matrix), together with the variance of the third principal component. The size of the returned matrix U is (n_features, 2).
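
The PCA step can be sketched with an eigendecomposition of the covariance matrix. This is an illustration under stated assumptions rather than the package's code: the name `pca_init` is hypothetical, rows of Dataset are treated as observations, and at least three features are required.

```julia
using LinearAlgebra, Statistics

# Hypothetical sketch: two leading principal directions plus the variance
# along the third one.
function pca_init(Dataset::AbstractMatrix)
    C = cov(Dataset)                          # (n_features, n_features); rows are observations
    F = eigen(Symmetric(C))                   # eigenvalues come back in ascending order
    order = sortperm(F.values; rev=true)      # indices from largest to smallest variance
    U = F.vectors[:, order[1:2]]              # (n_features, 2)
    third_var = F.values[order[3]]            # variance of the third principal component
    return U, third_var
end

t = range(0, 2π; length=200)
Dataset = [3 .* cos.(t) 2 .* sin.(t) 0.1 .* cos.(5 .* t) 0.05 .* sin.(7 .* t)]
U, third_var = pca_init(Dataset)
```

In this synthetic example most of the variance lies along the first two features, so the columns of U are close to the first two coordinate axes.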

GenerativeTopographicMapping.getΦMatrix - Method
getΦMatrix(X, M, σ²)

Given a matrix of latent node coordinates X, RBF mean coordinates M, and variance σ², return a matrix Φ of dimension (n_nodes, n_rbf_centers+1). The final column is set to 1.0 to include a bias offset in addition to the RBFs.
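
The shape of Φ, including the trailing bias column, can be sketched as follows. The function `rbf_design_matrix` is hypothetical, and the Gaussian form exp(-d² / (2σ²)) is an assumption, since the exact RBF expression is not stated here.

```julia
# X: (n_nodes, 2) latent node coordinates; M: (n_rbf_centers, 2) RBF means.
function rbf_design_matrix(X::AbstractMatrix, M::AbstractMatrix, σ²::Real)
    n_nodes, n_rbf = size(X, 1), size(M, 1)
    Φ = ones(n_nodes, n_rbf + 1)              # final column stays 1.0 as the bias term
    for j in 1:n_rbf, i in 1:n_nodes
        d² = sum(abs2, X[i, :] .- M[j, :])    # squared distance from node i to center j
        Φ[i, j] = exp(-d² / (2 * σ²))         # assumed Gaussian RBF
    end
    return Φ
end

X = [0.0 0.0; 1.0 0.0]     # two latent nodes
M = [0.0 0.0]              # one RBF center (a 1-by-2 matrix)
Φ = rbf_design_matrix(X, M, 0.5)
```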

GenerativeTopographicMapping.initβ⁻¹ - Method
initβ⁻¹(β⁻¹, Y)

Initialize β⁻¹ using our first guess for β⁻¹ (the variance of the third principal component) and the mean distance between projected RBF centers in data space.