Docstrings · Clusterpath.jl

Clusterpath.AVLTree — Type

mutable struct AVLTree{T}()

Creates an empty AVL tree. tree[i] returns ith smallest node.

Supported operations

Base.haskey(tree::AVLTree{K}, d::K)

Returns a bool which indicates presence of a node with key d

Base.insert!(tree::AVLTree{K}, d::K)

Inserts a node with key d of type K (Base.push! is recommanded)

Base.push!(tree::AVLTree{K}, key0)

Inserts a node with key d of arbitrary type that can be converted into type K

Base.delete!(tree::AVLTree{K}, d::K)

Deletes a node with key d of type K

sorted_rank(tree::AVLTree{K}, key::K)

Returns the rank of key present in the tree, if it present. A KeyError is thrown if key is not present.

Base.getindex(tree::AVLTree{K}, ind::Integer)

Returns the indth node

Clusterpath._prepare! — Method

_prepare(solution_i; for_path::Bool)

prepare the return of cast_solution() for plot_path(), or plot_cluster().

solution_i: single solution path element of cast_solution().
for_path: if true then prepare for plot_path(), false then prepare for plot_cluster().

Clusterpath.assign_cluster — Method

assign_cluster(x; α=0.1, verbose=false)

assign cluster to each of the observations in x. returns an array of length=size(x, 1) of cluster indices.

x: data
α: threshold for BMT-clusterpath

Clusterpath.cast_solution — Method

cast_solution(x; α=0.1, silence_warning=false)

A generator function that casts a dataframe of solution path using clusterpath(). The function yields the solution path of each observation one at an iteration. This return will be used in plot_path() and plot_cluster().

x: the data for which we want to cluster.
α: threshold for Big Merge Tracker procedure. α shoulbe be ≥ 0. The smaller the value of α, the more detailed the solution path will be. The larger the α, the faster the algorithm will run. (default: 0.1)

Clusterpath.clusterpath — Method

clusterpath!(y; α=0.1, jitter=1e-6, return_split=false, silence_warning=false)

Perform convex clustering via α-thresholded clusterpath algorithm to y. To avoid ties, add a small perturbation (jitter) to each data, and then sort it. Returns a dictionary of

(1) idx: cluster indices assigned to each elements in y
(2) lambda: λ values in each merge
(3) alpha: solution corresponding to each observation where merge happened
(4) n_cluster: number of clusters wrt merging algorithm
(5) n_splits: number of total splits, in corresponding top-down algorithm
(6) splits: sample split point (only get returned when return_split=true)

Clusterpath.clusterpath_pop — Method

clusterpath_pop(a2::T, μ2::T; a1=1-a2, μ1=-μ1, a3=0., μ3=0, σ=1) where T <: Real

Inputs: a2: weight on 2nd normal distribution μ2: mean of 2nd normal distribution Returns: A dictionary of weights (a1, a2), means (μ1, μ2) and description of truncation (L*, R*) and split point (s).

Clusterpath.cond_mean_on_LR — Method

cond_mean_on_LR(L, R, a1, a2, a3, μ1, μ2, μ3)

Conditional mean on (L, R), defined as μ{L,R} = (∫L^R f(x) dx)^(-1) / (∫_L^R x f(x) dx)

Clusterpath.dnorm — Method

dnorm(x)

Julia equivalent of R's dnorm

Clusterpath.find_split — Method

find_split(d, a1, a2, μ1, μ2, L, R;
           a3=0., μ3=0., find_deltas=false, find_split=!find_deltas)

Split the cluster (L, R) into subclusters that maximizes G{L,R}. Returns centered conditional mean (deltas) of the left and right subcluster if `finddeltas` is true. Otherwise return the split point.

Clusterpath.find_truncation — Method

find_truncation(a1, a2, μ1, μ2,
                R_ub=(μ2 + 8), L_lb=(μ1 - 8),
                tuning_par1=0.01, tuning_par2=0.005,
                a3=0, μ3=0, σ=1)

Find population right-truncation points for bimodal Normal mixtures. Inputs: a1, a2, a3: weights for Normal distributions. a1 + a2 + a3 should be == 1 μ1, μ2, μ3: means for normals μ1 = -μ2

Possible returns: 0: negligible result (not a symmetric problem) -1: no truncation

Clusterpath.generate_mixture_normal — Method

generate_mixture_normal(n, m, p)

generate n observations from mixture of univariate normals each with standard deviation 1 and mean parameters m and proportion p.

Clusterpath.inorder_traverse — Method

inorder_traverse(tree::AVLTree)

Traverses tree in inorder way. Returns an Array{Any, 2} with elements of _node type.

Clusterpath.left_rotate — Method

left_rotate(node_x::AVLTreeNode)

Performs a left-rotation on node_x, updates height of the nodes, and returns the rotated node.

Clusterpath.merge_dfs — Method

merge_dfs(df1, df2)

merge two solution path dataframes with outerjoin()

Clusterpath.minimum_node — Method

minimum_node(tree::AVLTree, node::AVLTreeNode) Returns the AVLTreeNode with minimum value in subtree of node.

Clusterpath.plot_cluster — Method

plot_cluster(x; α=0.1, n_node=1, show::Bool, savefig=false, fname="plot_clst", verbose=false)

Plots the scatter plot of the data x colored according to the cluster assigned by clusterpath algorithm. If the dimension of x is greater than 2, perform PCA and plot two PCs.

***Gaston.jl and gnuplot should be installed and on the PATH of your system. Install gnuplot here. ***

x: data
α: threshold for BMT-clusterpath
n_node: if greater than 1, will assign clusters from previous merge status. (default: 1)
show: whether to show the figure.
savefig: whether to save the figure as a png file. (default: false)
fname: file name to save if savefig is true. (default: "plot_clst")
verbose: print out current iteration. (default: false)

Clusterpath.plot_path — Method

plot_path(x; α=0.1, gt=ones(Int64, size(x,1)), force2dim=true, show::Bool, savefig=false, fname="path_plot", verbose=false)

plot clusterpath with the data(x). If the dimension of x is greater than 2, only plot a combination of first two dimensions.

***Gaston.jl and gnuplot should be installed and on the PATH of your system. Install gnuplot here.***

x: the data to cluster.
α: threshold for BMT-clusterpath
gt: the ground truth labels.
force2dim: force dim>2 data to 2-dim data (default: true)
show: whether to show the plot in the notebook. Highly recommended not to show if the number of samples is large.
savefig: whether to save the figure as a PNG file. (default: false)
fname: image file name to be used when savefig is true. (default: "path_plot")

Clusterpath.pnorm — Method

pnorm(x)

Julia equivalent of R's pnorm

Clusterpath.rep — Method

rep(x, times)

Julia equivalent of R's rep.

Clusterpath.right_rotate — Method

right_rotate(node_x::AVLTreeNode)

Performs a right-rotation on node_x, updates height of the nodes, and returns the rotated node.

Clusterpath.sorted_rank — Method

sorted_rank(tree::AVLTree, key)

Returns the rank of key present in the tree, if it present. A KeyError is thrown if key is not present.