Clusterpath.AVLTreeType
mutable struct AVLTree{T}()

Creates an empty AVL tree. tree[i] returns ith smallest node.

Supported operations

Base.haskey(tree::AVLTree{K}, d::K)

Returns a bool which indicates presence of a node with key d

Base.insert!(tree::AVLTree{K}, d::K)

Inserts a node with key d of type K (Base.push! is recommanded)

Base.push!(tree::AVLTree{K}, key0)

Inserts a node with key d of arbitrary type that can be converted into type K

Base.delete!(tree::AVLTree{K}, d::K)

Deletes a node with key d of type K

sorted_rank(tree::AVLTree{K}, key::K)

Returns the rank of key present in the tree, if it present. A KeyError is thrown if key is not present.

Base.getindex(tree::AVLTree{K}, ind::Integer)

Returns the indth node

Clusterpath._prepare!Method
_prepare(solution_i; for_path::Bool)

prepare the return of cast_solution() for plot_path(), or plot_cluster().

  • solution_i: single solution path element of cast_solution().
  • for_path: if true then prepare for plot_path(), false then prepare for plot_cluster().
Clusterpath.assign_clusterMethod
assign_cluster(x; α=0.1, verbose=false)

assign cluster to each of the observations in x. returns an array of length=size(x, 1) of cluster indices.

  • x: data
  • α: threshold for BMT-clusterpath
Clusterpath.cast_solutionMethod
cast_solution(x; α=0.1, silence_warning=false)

A generator function that casts a dataframe of solution path using clusterpath(). The function yields the solution path of each observation one at an iteration. This return will be used in plot_path() and plot_cluster().

  • x: the data for which we want to cluster.
  • α: threshold for Big Merge Tracker procedure. α shoulbe be ≥ 0. The smaller the value of α, the more detailed the solution path will be. The larger the α, the faster the algorithm will run. (default: 0.1)
Clusterpath.clusterpathMethod
clusterpath!(y; α=0.1, jitter=1e-6, return_split=false, silence_warning=false)

Perform convex clustering via α-thresholded clusterpath algorithm to y. To avoid ties, add a small perturbation (jitter) to each data, and then sort it. Returns a dictionary of

  • (1) idx: cluster indices assigned to each elements in y
  • (2) lambda: λ values in each merge
  • (3) alpha: solution corresponding to each observation where merge happened
  • (4) n_cluster: number of clusters wrt merging algorithm
  • (5) n_splits: number of total splits, in corresponding top-down algorithm
  • (6) splits: sample split point (only get returned when return_split=true)
Clusterpath.clusterpath_popMethod
clusterpath_pop(a2::T, μ2::T; a1=1-a2, μ1=-μ1, a3=0., μ3=0, σ=1) where T <: Real

Inputs: a2: weight on 2nd normal distribution μ2: mean of 2nd normal distribution Returns: A dictionary of weights (a1, a2), means (μ1, μ2) and description of truncation (L*, R*) and split point (s).

Clusterpath.cond_mean_on_LRMethod
cond_mean_on_LR(L, R, a1, a2, a3, μ1, μ2, μ3)

Conditional mean on (L, R), defined as μ{L,R} = (∫L^R f(x) dx)^(-1) / (∫_L^R x f(x) dx)

Clusterpath.find_splitMethod
find_split(d, a1, a2, μ1, μ2, L, R;
           a3=0., μ3=0., find_deltas=false, find_split=!find_deltas)

Split the cluster (L, R) into subclusters that maximizes G{L,R}. Returns centered conditional mean (deltas) of the left and right subcluster if `finddeltas` is true. Otherwise return the split point.

Clusterpath.find_truncationMethod
find_truncation(a1, a2, μ1, μ2,
                R_ub=(μ2 + 8), L_lb=(μ1 - 8),
                tuning_par1=0.01, tuning_par2=0.005,
                a3=0, μ3=0, σ=1)

Find population right-truncation points for bimodal Normal mixtures. Inputs: a1, a2, a3: weights for Normal distributions. a1 + a2 + a3 should be == 1 μ1, μ2, μ3: means for normals μ1 = -μ2

Possible returns: 0: negligible result (not a symmetric problem) -1: no truncation

Clusterpath.generate_mixture_normalMethod
generate_mixture_normal(n, m, p)

generate n observations from mixture of univariate normals each with standard deviation 1 and mean parameters m and proportion p.

Clusterpath.inorder_traverseMethod
inorder_traverse(tree::AVLTree)

Traverses tree in inorder way. Returns an Array{Any, 2} with elements of _node type.

Clusterpath.left_rotateMethod
left_rotate(node_x::AVLTreeNode)

Performs a left-rotation on node_x, updates height of the nodes, and returns the rotated node.

Clusterpath.minimum_nodeMethod

minimum_node(tree::AVLTree, node::AVLTreeNode) Returns the AVLTreeNode with minimum value in subtree of node.

Clusterpath.plot_clusterMethod
plot_cluster(x; α=0.1, n_node=1, show::Bool, savefig=false, fname="plot_clst", verbose=false)

Plots the scatter plot of the data x colored according to the cluster assigned by clusterpath algorithm. If the dimension of x is greater than 2, perform PCA and plot two PCs.

***Gaston.jl and gnuplot should be installed and on the PATH of your system. Install gnuplot here. ***

  • x: data
  • α: threshold for BMT-clusterpath
  • n_node: if greater than 1, will assign clusters from previous merge status. (default: 1)
  • show: whether to show the figure.
  • savefig: whether to save the figure as a png file. (default: false)
  • fname: file name to save if savefig is true. (default: "plot_clst")
  • verbose: print out current iteration. (default: false)
Clusterpath.plot_pathMethod
plot_path(x; α=0.1, gt=ones(Int64, size(x,1)), force2dim=true, show::Bool, savefig=false, fname="path_plot", verbose=false)

plot clusterpath with the data(x). If the dimension of x is greater than 2, only plot a combination of first two dimensions.

***Gaston.jl and gnuplot should be installed and on the PATH of your system. Install gnuplot here.***

  • x: the data to cluster.
  • α: threshold for BMT-clusterpath
  • gt: the ground truth labels.
  • force2dim: force dim>2 data to 2-dim data (default: true)
  • show: whether to show the plot in the notebook. Highly recommended not to show if the number of samples is large.
  • savefig: whether to save the figure as a PNG file. (default: false)
  • fname: image file name to be used when savefig is true. (default: "path_plot")
Clusterpath.right_rotateMethod
right_rotate(node_x::AVLTreeNode)

Performs a right-rotation on node_x, updates height of the nodes, and returns the rotated node.

Clusterpath.sorted_rankMethod
sorted_rank(tree::AVLTree, key)

Returns the rank of key present in the tree, if it present. A KeyError is thrown if key is not present.