Clusterpath.AVLTree
— Typemutable struct AVLTree{T}()
Creates an empty AVL tree. tree[i]
returns i
th smallest node.
Supported operations
Base.haskey(tree::AVLTree{K}, d::K)
Returns a bool which indicates presence of a node with key d
Base.insert!(tree::AVLTree{K}, d::K)
Inserts a node with key d
of type K
(Base.push!
is recommanded)
Base.push!(tree::AVLTree{K}, key0)
Inserts a node with key d
of arbitrary type that can be converted into type K
Base.delete!(tree::AVLTree{K}, d::K)
Deletes a node with key d
of type K
sorted_rank(tree::AVLTree{K}, key::K)
Returns the rank of key
present in the tree
, if it present. A KeyError is thrown if key
is not present.
Base.getindex(tree::AVLTree{K}, ind::Integer)
Returns the ind
th node
Clusterpath._prepare!
— Method_prepare(solution_i; for_path::Bool)
prepare the return of cast_solution()
for plot_path()
, or plot_cluster()
.
solution_i
: single solution path element ofcast_solution()
.for_path
: iftrue
then prepare forplot_path()
,false
then prepare forplot_cluster()
.
Clusterpath.assign_cluster
— Methodassign_cluster(x; α=0.1, verbose=false)
assign cluster to each of the observations in x
. returns an array of length=size(x, 1) of cluster indices.
x
: dataα
: threshold for BMT-clusterpath
Clusterpath.cast_solution
— Methodcast_solution(x; α=0.1, silence_warning=false)
A generator function that casts a dataframe of solution path using clusterpath()
. The function yields the solution path of each observation one at an iteration. This return will be used in plot_path()
and plot_cluster()
.
x
: the data for which we want to cluster.α
: threshold for Big Merge Tracker procedure.α
shoulbe be ≥ 0. The smaller the value ofα
, the more detailed the solution path will be. The larger theα
, the faster the algorithm will run. (default:0.1
)
Clusterpath.clusterpath
— Methodclusterpath!(y; α=0.1, jitter=1e-6, return_split=false, silence_warning=false)
Perform convex clustering via α-thresholded clusterpath algorithm to y
. To avoid ties, add a small perturbation (jitter
) to each data, and then sort it. Returns a dictionary of
- (1) idx: cluster indices assigned to each elements in
y
- (2) lambda: λ values in each merge
- (3) alpha: solution corresponding to each observation where merge happened
- (4) n_cluster: number of clusters wrt merging algorithm
- (5) n_splits: number of total splits, in corresponding top-down algorithm
- (6) splits: sample split point (only get returned when
return_split=true
)
Clusterpath.clusterpath_pop
— Methodclusterpath_pop(a2::T, μ2::T; a1=1-a2, μ1=-μ1, a3=0., μ3=0, σ=1) where T <: Real
Inputs: a2: weight on 2nd normal distribution μ2: mean of 2nd normal distribution Returns: A dictionary of weights (a1
, a2
), means (μ1
, μ2
) and description of truncation (L*
, R*
) and split point (s
).
Clusterpath.cond_mean_on_LR
— Methodcond_mean_on_LR(L, R, a1, a2, a3, μ1, μ2, μ3)
Conditional mean on (L, R), defined as μ{L,R} = (∫L^R f(x) dx)^(-1) / (∫_L^R x f(x) dx)
Clusterpath.dnorm
— Methoddnorm(x)
Julia equivalent of R's dnorm
Clusterpath.find_split
— Methodfind_split(d, a1, a2, μ1, μ2, L, R;
a3=0., μ3=0., find_deltas=false, find_split=!find_deltas)
Split the cluster (L, R) into subclusters that maximizes G{L,R}. Returns centered conditional mean (deltas) of the left and right subcluster if `finddeltas` is true. Otherwise return the split point.
Clusterpath.find_truncation
— Methodfind_truncation(a1, a2, μ1, μ2,
R_ub=(μ2 + 8), L_lb=(μ1 - 8),
tuning_par1=0.01, tuning_par2=0.005,
a3=0, μ3=0, σ=1)
Find population right-truncation points for bimodal Normal mixtures. Inputs: a1, a2, a3: weights for Normal distributions. a1 + a2 + a3 should be == 1 μ1, μ2, μ3: means for normals μ1 = -μ2
Possible returns: 0: negligible result (not a symmetric problem) -1: no truncation
Clusterpath.generate_mixture_normal
— Methodgenerate_mixture_normal(n, m, p)
generate n
observations from mixture of univariate normals each with standard deviation 1 and mean parameters m
and proportion p
.
Clusterpath.inorder_traverse
— Methodinorder_traverse(tree::AVLTree)
Traverses tree in inorder way. Returns an Array{Any, 2} with elements of _node type.
Clusterpath.left_rotate
— Methodleft_rotate(node_x::AVLTreeNode)
Performs a left-rotation on node_x
, updates height of the nodes, and returns the rotated node.
Clusterpath.merge_dfs
— Methodmerge_dfs(df1, df2)
merge two solution path dataframes with outerjoin()
Clusterpath.minimum_node
— Methodminimum_node(tree::AVLTree, node::AVLTreeNode) Returns the AVLTreeNode with minimum value in subtree of node
.
Clusterpath.plot_cluster
— Methodplot_cluster(x; α=0.1, n_node=1, show::Bool, savefig=false, fname="plot_clst", verbose=false)
Plots the scatter plot of the data x
colored according to the cluster assigned by clusterpath algorithm. If the dimension of x
is greater than 2, perform PCA and plot two PCs.
***Gaston.jl
and gnuplot
should be installed and on the PATH of your system. Install gnuplot here. ***
x
: dataα
: threshold for BMT-clusterpathn_node
: if greater than1
, will assign clusters from previous merge status. (default:1
)show
: whether to show the figure.savefig
: whether to save the figure as a png file. (default:false
)fname
: file name to save ifsavefig
is true. (default:"plot_clst"
)verbose
: print out current iteration. (default:false
)
Clusterpath.plot_path
— Methodplot_path(x; α=0.1, gt=ones(Int64, size(x,1)), force2dim=true, show::Bool, savefig=false, fname="path_plot", verbose=false)
plot clusterpath with the data(x
). If the dimension of x
is greater than 2, only plot a combination of first two dimensions.
***Gaston.jl
and gnuplot
should be installed and on the PATH of your system. Install gnuplot here.***
x
: the data to cluster.α
: threshold for BMT-clusterpathgt
: the ground truth labels.force2dim
: force dim>2 data to 2-dim data (default:true
)show
: whether to show the plot in the notebook. Highly recommended not to show if the number of samples is large.savefig
: whether to save the figure as a PNG file. (default:false
)fname
: image file name to be used whensavefig
istrue
. (default:"path_plot"
)
Clusterpath.pnorm
— Methodpnorm(x)
Julia equivalent of R's pnorm
Clusterpath.rep
— Methodrep(x, times)
Julia equivalent of R's rep
.
Clusterpath.right_rotate
— Methodright_rotate(node_x::AVLTreeNode)
Performs a right-rotation on node_x
, updates height of the nodes, and returns the rotated node.
Clusterpath.sorted_rank
— Methodsorted_rank(tree::AVLTree, key)
Returns the rank of key
present in the tree
, if it present. A KeyError is thrown if key
is not present.