UMAP.UMAP_ - Method
UMAP_(X::AbstractMatrix[, n_components=2]; <kwargs>) -> UMAP_ object

Create a model representing the embedding of data X into n_components-dimensional space. The returned model has the following fields:

  • graph: the graph representing the fuzzy simplicial set of the manifold of X.
  • embedding: the n_components-dimensional embedding of the data X.
  • data: a reference to the input data X.
  • knns: a matrix of indices of X representing each point's nearest neighbors according to metric. knns[j, i] is the index of point i's jth nearest neighbor.
  • dists: the distances to the above neighbors. dists[j, i] is the distance from point i to its jth nearest neighbor.

Keyword Arguments

  • n_neighbors::Integer = 15: the number of neighbors to consider as locally connected. Larger values capture more global structure in the data, while small values capture more local structure.
  • metric::{SemiMetric, Symbol} = Euclidean(): the metric to calculate distance in the input space. It is also possible to pass metric = :precomputed to treat X like a precomputed distance matrix.
  • n_epochs::Integer = 300: the number of training epochs for embedding optimization
  • learning_rate::Real = 1: the initial learning rate during optimization
  • init::Symbol = :spectral: how to initialize the output embedding; valid options are :spectral and :random
  • min_dist::Real = 0.1: the minimum spacing of points in the output embedding
  • spread::Real = 1: the effective scale of embedded points. Determines how clustered embedded points are in combination with min_dist.
  • set_operation_ratio::Real = 1: interpolates between fuzzy set union and fuzzy set intersection when constructing the UMAP graph (global fuzzy simplicial set). The value of this parameter should be between 0.0 and 1.0: 1.0 indicates pure fuzzy union, while 0.0 indicates pure fuzzy intersection.
  • local_connectivity::Integer = 1: the number of nearest neighbors that should be assumed to be locally connected. The higher this value, the more connected the manifold becomes. This should not be set higher than the intrinsic dimension of the manifold.
  • repulsion_strength::Real = 1: the weighting of negative samples during the optimization process.
  • neg_sample_rate::Integer = 5: the number of negative samples to select for each positive sample. Higher values increase computational cost but yield a slightly more accurate embedding.
  • a = nothing: the parameter a of the low-dimensional membership curve 1 / (1 + a * d^(2b)), which shapes how tightly points cluster in the embedding. By default, it is determined automatically from min_dist and spread.
  • b = nothing: the parameter b of the same membership curve. By default, it is determined automatically from min_dist and spread.
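
As a usage illustration, the following minimal sketch builds a model with the constructor documented above and reads back some of its fields. The data is random and the keyword values are arbitrary choices for the example, not recommendations; points are assumed to be stored as columns of X, consistent with the knns[j, i] indexing described above.

```julia
using UMAP

# 200 points with 10 features each; each column is one point (assumed layout).
X = rand(10, 200)

# Positional n_components = 2, keyword arguments as documented above.
model = UMAP_(X, 2; n_neighbors=10, min_dist=0.1, n_epochs=100)

embedding = model.embedding   # one embedded column per input point
neighbors = model.knns        # neighbors[j, i]: index of point i's jth neighbor
```

Keeping the model (rather than only the embedding) is useful when new points need to be embedded later with transform.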
UMAP.compute_membership_strengths - Method
compute_membership_strengths(knns, dists, σs, ρs) -> rows, cols, vals

Compute the membership strengths for the 1-skeleton of each fuzzy simplicial set.
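
The computation can be sketched as follows. This is an illustration under stated assumptions, not the package's source: it assumes the usual UMAP weight exp(-max(d - ρ, 0) / σ) for the edge from point i to each neighbor and a weight of 0 for self-edges. The function name membership_strengths_sketch is hypothetical.

```julia
# Hypothetical sketch: edge weights for the 1-skeleton in COO form (rows, cols, vals).
# Assumes weight = exp(-max(d - ρ, 0) / σ), with self-edges given weight 0.
function membership_strengths_sketch(knns, dists, σs, ρs)
    rows, cols, vals = Int[], Int[], Float64[]
    for i in axes(knns, 2), j in axes(knns, 1)
        w = knns[j, i] == i ? 0.0 : exp(-max(dists[j, i] - ρs[i], 0.0) / σs[i])
        push!(rows, knns[j, i])   # neighbor index
        push!(cols, i)            # point index
        push!(vals, w)            # membership strength of the edge
    end
    return rows, cols, vals
end

knns  = [2 1 1; 3 3 2]                 # knns[j, i]: jth neighbor of point i
dists = [1.0 1.0 2.0; 2.0 1.5 2.5]     # matching distances
rows, cols, vals = membership_strengths_sketch(knns, dists, ones(3), [1.0, 1.0, 2.0])
```

The three returned vectors can be passed to sparse(rows, cols, vals, n_points, size(knns, 2)) from SparseArrays to obtain a weighted adjacency matrix.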

UMAP.fit_ab - Method
fit_ab(min_dist, spread, _a, _b) -> a, b

Find a smooth approximation to the membership function of points embedded in ℜᵈ. This fits a smooth curve that approximates an exponential decay offset by min_dist, returning the parameters (a, b).
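
To make the fit concrete, here is a rough sketch. It assumes the target is 1 for distances up to min_dist and exp(-(x - min_dist) / spread) beyond it, and that the fitted family is 1 / (1 + a * x^(2b)). A coarse grid search stands in for a proper nonlinear least-squares solver, and the names target, curve, and fit_ab_sketch are hypothetical.

```julia
# Target membership: flat at 1 up to min_dist, then exponential decay (assumed form).
target(x, min_dist, spread) = x <= min_dist ? 1.0 : exp(-(x - min_dist) / spread)

# Smooth family fitted to the target (assumed form).
curve(x, a, b) = 1 / (1 + a * x^(2b))

# Coarse grid search over (a, b); a stand-in for nonlinear least squares.
function fit_ab_sketch(min_dist, spread)
    xs = range(0, 3 * spread; length=300)
    ys = target.(xs, min_dist, spread)
    best = (Inf, NaN, NaN)
    for a in 0.5:0.01:3.0, b in 0.5:0.01:1.5
        loss = sum(abs2, curve.(xs, a, b) .- ys)
        loss < best[1] && (best = (loss, a, b))
    end
    return best[2], best[3]
end

a, b = fit_ab_sketch(0.1, 1.0)
```

The grid bounds limit which (min_dist, spread) pairs this sketch can handle; a real solver has no such restriction.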

UMAP.fuzzy_simplicial_set - Function
fuzzy_simplicial_set(knns, dists, n_neighbors, n_points, local_connectivity, set_op_ratio, apply_fuzzy_combine=true) -> membership_graph::SparseMatrixCSC

Construct the local fuzzy simplicial set of each point from the distances to its n_neighbors nearest neighbors (stored in knns and dists) by normalizing the distances on the manifolds and converting the metric space to a simplicial set. n_points is the total number of points in the original data, while knns contains indices of some subset of those points (i.e. some subset of 1:n_points). If knns holds the neighbors of a set of points within that same set, then knns should have n_points columns; otherwise the two values may differ. If apply_fuzzy_combine is true (the default), use intersections and unions to combine the fuzzy sets of neighbors.

The returned graph will have size (n_points, size(knns, 2)).
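
The combination step controlled by set_op_ratio can be sketched as below. This assumes the standard probabilistic fuzzy union (A + Aᵀ - A∘Aᵀ) and intersection (A∘Aᵀ) and only applies to a square graph; the function name combine_fuzzy_sets_sketch is hypothetical.

```julia
using SparseArrays

# Hypothetical sketch: blend fuzzy union and fuzzy intersection of a square
# membership graph with its transpose. ratio = 1 is pure union, 0 is pure intersection.
function combine_fuzzy_sets_sketch(A::SparseMatrixCSC, set_op_ratio::Real)
    At = sparse(transpose(A))
    fuzzy_inter = A .* At                  # elementwise product
    fuzzy_union = A + At - fuzzy_inter     # probabilistic t-conorm
    return set_op_ratio * fuzzy_union + (1 - set_op_ratio) * fuzzy_inter
end

A = sparse([0.0 0.5; 0.2 0.0])
G_union = combine_fuzzy_sets_sketch(A, 1.0)
G_inter = combine_fuzzy_sets_sketch(A, 0.0)
```

Both results are symmetric, so an edge is kept if either endpoint (union) or both endpoints (intersection) consider the other a neighbor.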

UMAP.initialize_embedding - Method
initialize_embedding(graph::AbstractMatrix{<:Real}, ref_embedding::AbstractMatrix{T<:AbstractFloat}) -> embedding

Initialize an embedding of points corresponding to the columns of the graph, by taking weighted means of the columns of ref_embedding, where weights are values from the rows of the graph.

The resulting embedding will have shape (size(ref_embedding, 1), size(graph, 2)), where size(ref_embedding, 1) is the number of components (dimensions) of the reference embedding, and size(graph, 2) is the number of samples in the resulting embedding. Its elements will have type T.
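
A minimal sketch of the weighted-mean initialization described above, assuming graph has one row per reference point and one column per new point. The function name weighted_mean_init_sketch is hypothetical, and columns of graph that sum to zero are not handled here.

```julia
# Hypothetical sketch: each new point starts at the weighted mean of the
# reference points, with weights taken from its column of the graph.
function weighted_mean_init_sketch(graph::AbstractMatrix{<:Real},
                                   ref_embedding::AbstractMatrix{T}) where {T<:AbstractFloat}
    weights = graph ./ sum(graph; dims=1)   # normalize each column to sum to 1
    return T.(ref_embedding * weights)      # (n_components, n_new_points)
end

ref_embedding = [0.0 2.0; 0.0 4.0]   # two reference points: (0, 0) and (2, 4)
graph = [1.0 1.0; 0.0 3.0]           # column i holds the weights for new point i
init = weighted_mean_init_sketch(graph, ref_embedding)
```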

UMAP.knn_search - Method
knn_search(X, Q, k, metric, knns, dists) -> knns, dists

Given a matrix X and a matrix Q, use the given metric to find, for each query (column of Q), its k nearest neighbors among the columns of X. If the matrices are large, reconstruct the approximate nearest neighbors graph of X from the given knns and dists (the indices and distances of pairwise neighbors of X) and use it to search for approximate nearest neighbors of Q. If the matrices are small, search for exact nearest neighbors of Q by computing all pairwise distances with X.

metric may be of type:

  • ::Symbol - knn_search is dispatched to one of the following based on the evaluation of metric:
  • ::Val(:precomputed) - computes neighbors from X treated as a precomputed distance matrix.
  • ::SemiMetric - computes neighbors from X treated as samples, using the given metric.

Returns

  • knns: knns[j, i] is the index of node i's jth nearest neighbor.
  • dists: dists[j, i] is the distance of node i's jth nearest neighbor.
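
The exact (small-matrix) case can be illustrated with a brute-force search. This sketch hard-codes the Euclidean distance instead of accepting a metric, and the function name brute_knn_sketch is hypothetical; it only demonstrates the knns[j, i] and dists[j, i] layout described above.

```julia
# Hypothetical sketch: exact k nearest neighbors of each column of Q among
# the columns of X, using Euclidean distance.
function brute_knn_sketch(X::AbstractMatrix, Q::AbstractMatrix, k::Integer)
    n_q = size(Q, 2)
    knns = Matrix{Int}(undef, k, n_q)
    dists = Matrix{Float64}(undef, k, n_q)
    for i in 1:n_q
        d = [sqrt(sum(abs2, X[:, p] .- Q[:, i])) for p in axes(X, 2)]
        nearest = partialsortperm(d, 1:k)   # indices of the k smallest distances
        knns[:, i] = nearest
        dists[:, i] = d[nearest]
    end
    return knns, dists
end

X = [0.0 1.0 5.0; 0.0 0.0 0.0]   # three reference points on a line
Q = reshape([0.9, 0.0], 2, 1)    # one query point
knns, dists = brute_knn_sketch(X, Q, 2)
```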
UMAP.knn_search - Method
knn_search(X, k, metric) -> knns, dists

Find the k nearest neighbors of each point.

metric may be of type:

  • ::Symbol - knn_search is dispatched to one of the following based on the evaluation of metric:
  • ::Val(:precomputed) - computes neighbors from X treated as a precomputed distance matrix.
  • ::SemiMetric - computes neighbors from X treated as samples, using the given metric.

Returns

  • knns: knns[j, i] is the index of node i's jth nearest neighbor.
  • dists: dists[j, i] is the distance of node i's jth nearest neighbor.
UMAP.optimize_embedding - Function
optimize_embedding(graph, query_embedding, ref_embedding, n_epochs, initial_alpha, min_dist, spread, gamma, neg_sample_rate, _a=nothing, _b=nothing; move_ref=false) -> embedding

Optimize an embedding by minimizing the fuzzy set cross entropy between the high and low dimensional simplicial sets using stochastic gradient descent. Optimize "query" samples with respect to "reference" samples.

Arguments

  • graph: a sparse matrix of shape (n_samples, n_samples)
  • query_embedding: a vector of n_samples vectors representing the embedded data points to be optimized ("query" samples)
  • ref_embedding: a vector of n_samples vectors representing the embedded data points to optimize against ("reference" samples)
  • n_epochs: the number of training epochs for optimization
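  • initial_alpha: the initial learning rate
  • min_dist: the minimum spacing of points in the output embedding
  • spread: the effective scale of embedded points
  • gamma: the repulsive strength of negative samples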
  • neg_sample_rate: the number of negative samples per positive sample
  • _a: the parameter a of the low-dimensional membership curve. If nothing is passed, it is determined automatically from min_dist and spread.
  • _b: the parameter b of the low-dimensional membership curve. If nothing is passed, it is determined automatically from min_dist and spread.

Keyword Arguments

  • move_ref::Bool = false: if true, also improve the embeddings in ref_embedding, else fix them and only improve embeddings in query_embedding.
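
For intuition, a single stochastic gradient update can be sketched as an attractive step along a graph edge and a repulsive step against a negative sample. The gradient coefficients and the clipping to [-4, 4] follow the reference UMAP formulation and are an assumption about this package; attract!, repel!, and the default a and b values (roughly those for min_dist = 0.1, spread = 1) are hypothetical.

```julia
clip(x) = clamp(x, -4.0, 4.0)

# Pull y_i toward its graph neighbor y_j (attractive term of the cross entropy).
function attract!(y_i, y_j, alpha; a=1.577, b=0.895)
    d2 = sum(abs2, y_i .- y_j)            # squared distance
    d2 > 0 || return y_i
    coeff = (-2 * a * b * d2^(b - 1)) / (1 + a * d2^b)
    y_i .+= alpha .* clip.(coeff .* (y_i .- y_j))
    return y_i
end

# Push y_i away from a negative sample y_k, weighted by gamma.
function repel!(y_i, y_k, alpha, gamma; a=1.577, b=0.895)
    d2 = sum(abs2, y_i .- y_k)
    coeff = (2 * gamma * b) / ((0.001 + d2) * (1 + a * d2^b))
    y_i .+= alpha .* clip.(coeff .* (y_i .- y_k))
    return y_i
end

y = attract!([0.0, 0.0], [1.0, 0.0], 0.1)      # moves toward (1, 0)
z = repel!([0.0, 0.0], [1.0, 0.0], 0.1, 1.0)   # moves away from (1, 0)
```

In a full optimizer these steps would be repeated for each edge over n_epochs, with neg_sample_rate repulsive steps per attractive step and a learning rate that decays from initial_alpha.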
UMAP.smooth_knn_dists - Method
smooth_knn_dists(dists, k, local_connectivity; <kwargs>) -> knn_dists, nn_dists

Compute the distances to the nearest neighbors for a continuous value k. Returns the approximated distances to the kth nearest neighbor (knn_dists) and the nearest neighbor (nn_dists) from each point.
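
The idea can be sketched for a single point as a binary search. This assumes the standard UMAP criterion, in which σ is chosen so that the sum of exp(-max(d - ρ, 0) / σ) over the neighbors equals log2(k), with ρ the distance to the nearest neighbor (local_connectivity = 1). The function name smooth_knn_dist_sketch is hypothetical, and d is assumed to exclude the point itself.

```julia
# Hypothetical sketch for one point: returns (σ, ρ), where ρ is the distance to
# the nearest neighbor and σ normalizes the remaining distances.
function smooth_knn_dist_sketch(d::AbstractVector{<:Real}, k::Real; n_iter=64, tol=1e-5)
    target = log2(k)
    nonzero = filter(>(0), d)
    ρ = isempty(nonzero) ? 0.0 : minimum(nonzero)
    lo, hi, σ = 0.0, Inf, 1.0
    for _ in 1:n_iter
        s = sum(exp(-max(x - ρ, 0.0) / σ) for x in d)
        abs(s - target) < tol && break
        if s > target            # too much total membership: shrink σ
            hi = σ
            σ = (lo + hi) / 2
        else                     # too little: grow σ
            lo = σ
            σ = isinf(hi) ? 2σ : (lo + hi) / 2
        end
    end
    return σ, ρ
end

d = [1.0, 2.0, 3.0, 4.0]
σ, ρ = smooth_knn_dist_sketch(d, 4)
```

Because the target depends only on log2(k), k does not have to be an integer, which is what "a continuous value k" refers to.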

UMAP.spectral_layout - Method
spectral_layout(graph, embed_dim) -> embedding

Initialize the graph layout with spectral embedding.
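
A spectral embedding of this kind can be sketched with a dense eigendecomposition of the symmetric normalized Laplacian. This is an assumption about the general technique rather than the package's exact procedure (which may use a sparse iterative solver); the function name spectral_layout_sketch is hypothetical, and the graph is assumed symmetric with no isolated nodes.

```julia
using LinearAlgebra

# Hypothetical sketch: use the eigenvectors of L = I - D^(-1/2) A D^(-1/2) with
# the smallest nonzero eigenvalues as coordinates.
function spectral_layout_sketch(graph::AbstractMatrix{<:Real}, embed_dim::Integer)
    A = Matrix(graph)
    degrees = vec(sum(A; dims=1))
    Dinv = Diagonal(1 ./ sqrt.(degrees))
    L = Symmetric(I - Dinv * A * Dinv)
    vecs = eigen(L).vectors                       # eigenvalues in ascending order
    return permutedims(vecs[:, 2:embed_dim + 1])  # skip the trivial first eigenvector
end

# A ring of six nodes as a small symmetric test graph.
n = 6
A = zeros(n, n)
for i in 1:n
    j = mod1(i + 1, n)
    A[i, j] = A[j, i] = 1.0
end
layout = spectral_layout_sketch(A, 2)   # one 2-dimensional column per node
```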

UMAP.transform - Method
transform(model::UMAP_, Q::AbstractMatrix; <kwargs>) -> embedding

Use the given model to embed new points into an existing embedding. Q is a matrix of some number of points (columns) in the same space as model.data. The returned embedding is the embedding of these points in n-dimensional space, where n is the dimensionality of model.embedding. This embedding is created by finding the neighbors of Q in model.data and optimizing cross entropy according to the membership strengths of these neighbors.

Keyword Arguments

  • n_neighbors::Integer = 15: the number of neighbors to consider as locally connected. Larger values capture more global structure in the data, while small values capture more local structure.
  • metric::{SemiMetric, Symbol} = Euclidean(): the metric to calculate distance in the input space. It is also possible to pass metric = :precomputed to treat X like a precomputed distance matrix.
  • n_epochs::Integer = 300: the number of training epochs for embedding optimization
  • learning_rate::Real = 1: the initial learning rate during optimization
  • init::Symbol = :spectral: how to initialize the output embedding; valid options are :spectral and :random
  • min_dist::Real = 0.1: the minimum spacing of points in the output embedding
  • spread::Real = 1: the effective scale of embedded points. Determines how clustered embedded points are in combination with min_dist.
  • set_operation_ratio::Real = 1: interpolates between fuzzy set union and fuzzy set intersection when constructing the UMAP graph (global fuzzy simplicial set). The value of this parameter should be between 0.0 and 1.0: 1.0 indicates pure fuzzy union, while 0.0 indicates pure fuzzy intersection.
  • local_connectivity::Integer = 1: the number of nearest neighbors that should be assumed to be locally connected. The higher this value, the more connected the manifold becomes. This should not be set higher than the intrinsic dimension of the manifold.
  • repulsion_strength::Real = 1: the weighting of negative samples during the optimization process.
  • neg_sample_rate::Integer = 5: the number of negative samples to select for each positive sample. Higher values increase computational cost but yield a slightly more accurate embedding.
  • a = nothing: the parameter a of the low-dimensional membership curve 1 / (1 + a * d^(2b)), which shapes how tightly points cluster in the embedding. By default, it is determined automatically from min_dist and spread.
  • b = nothing: the parameter b of the same membership curve. By default, it is determined automatically from min_dist and spread.
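
A usage sketch for embedding new points, following the signatures documented here. The data is random and the keyword values are arbitrary; Q is assumed to have the same number of rows (features) as the data the model was built from.

```julia
using UMAP

X = rand(10, 200)   # reference data: 200 points as columns
Q = rand(10, 20)    # new points in the same 10-dimensional space

model = UMAP_(X, 2; n_neighbors=10, n_epochs=100)

# Embed Q into the existing 2-dimensional embedding of X.
Q_embedding = UMAP.transform(model, Q; n_neighbors=10, n_epochs=50)
```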
UMAP.umap - Method
umap(X::AbstractMatrix[, n_components=2]; <kwargs>) -> embedding

Embed the data X into an n_components-dimensional space. n_neighbors controls how many neighbors to consider as locally connected.

See UMAP_ for a description of keyword arguments.