ClusterEnsembles.jl

License: MIT

A Julia package for cluster ensembles. A cluster ensemble combines base labels obtained from multiple clustering algorithms (or multiple runs of one algorithm) into a single consensus clustering label. The consensus label is typically more stable than any individual base label while retaining high clustering performance.

Installation

julia> using Pkg; Pkg.add("ClusterEnsembles")

Usage

Call cluster_ensembles on a matrix that holds one base label per column:

julia> using ClusterEnsembles

julia> label1 = [1 1 1 2 2 3 3];

julia> label2 = [2 2 2 3 3 1 1];

julia> label3 = [1 1 2 2 3 3 3];

julia> label4 = [1 2 missing 1 2 missing missing];

julia> labels = [label1' label2' label3' label4']
7×4 Matrix{Union{Missing, Int64}}:
 1  2  1  1
 1  2  1  2
 1  2  2   missing
 2  3  2  1
 2  3  3  2
 3  1  3   missing
 3  1  3   missing

julia> label_ce = cluster_ensembles(labels)
7-element Vector{Int64}:
 1
 1
 1
 3
 3
 2
 2
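
Cluster IDs are arbitrary, so the consensus label above describes the same grouping as label1 even though the IDs 2 and 3 are swapped. The following is a minimal sketch in plain Julia for checking this; `same_partition` is a hypothetical helper written for illustration and is not part of ClusterEnsembles.jl. It assumes label vectors without missing entries.

```julia
# Two label vectors describe the same partition when their cluster IDs
# can be mapped one-to-one onto each other.
# NOTE: hypothetical helper for illustration; not part of ClusterEnsembles.jl.
function same_partition(a::AbstractVector, b::AbstractVector)
    length(a) == length(b) || return false
    forward = Dict{Any,Any}()   # cluster ID in `a` -> cluster ID in `b`
    backward = Dict{Any,Any}()  # cluster ID in `b` -> cluster ID in `a`
    for (x, y) in zip(a, b)
        # get! stores the mapping on first sight and returns the stored value,
        # so a mismatch means the mapping is not one-to-one.
        get!(forward, x, y) == y || return false
        get!(backward, y, x) == x || return false
    end
    return true
end

label1   = [1, 1, 1, 2, 2, 3, 3]
label3   = [1, 1, 2, 2, 3, 3, 3]
label_ce = [1, 1, 1, 3, 3, 2, 2]  # consensus label from the example above

same_partition(label1, label_ce)  # true: only the IDs 2 and 3 are swapped
same_partition(label3, label_ce)  # false: label3 groups the samples differently
```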

Parameters

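  • labels: Matrix of base labels, one column per base clustering and one row per sample, generated by base clustering algorithms such as K-Means. Entries may be missing, as in label4 above.

    Note: All labels must have the same length.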

  • nclass: Number of classes in the consensus clustering label (default=nothing). If nclass=nothing, it is set to the largest number of classes found in any single base label, ignoring missing values. In the example above, this gives nclass=3.

  • alg: {:cspa, :hgpa, :mcla, :hbgf, :nmf, :all} (default=:hbgf)

    :cspa: Cluster-based Similarity Partitioning Algorithm [1].

    :hgpa: HyperGraph Partitioning Algorithm [1].

    :mcla: Meta-CLustering Algorithm [1].

    :hbgf: Hybrid Bipartite Graph Formulation [2].

    :nmf: NMF-based consensus clustering [3].

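    :all: Runs all of the solvers above and returns the consensus clustering label with the largest objective function value [1] (defined in [1] as the average normalized mutual information between the consensus label and the base labels).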

    Note: Please use :hbgf for large-scale labels.

  • random_state: Random seed used by :mcla and :nmf (default=nothing). Pass a nonnegative integer for reproducible results.
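
The default for nclass described above can be reproduced in plain Julia. This is a minimal sketch of one reading of that rule (count the distinct non-missing classes in each column, then take the maximum); `default_nclass` is a hypothetical helper for illustration, not a function exported by ClusterEnsembles.jl.

```julia
# Largest number of distinct non-missing classes found in any single base label.
# NOTE: hypothetical helper; an interpretation of the nclass=nothing rule.
function default_nclass(labels::AbstractMatrix)
    return maximum(length(unique(skipmissing(col))) for col in eachcol(labels))
end

# Same 7×4 matrix as in the Usage section (one base label per column).
labels = [1 2 1 1;
          1 2 1 2;
          1 2 2 missing;
          2 3 2 1;
          2 3 3 2;
          3 1 3 missing;
          3 1 3 missing]

default_nclass(labels)  # 3: the first three labels have 3 classes, label4 has 2
```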

Return

  • label_ce: A consensus clustering label generated by cluster ensembles.
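
To judge how well a returned label agrees with the base labels, one option is the objective from [1] that the :all setting refers to: the average normalized mutual information (ANMI). The sketch below is plain Julia and does not call the package. The names `nmi` and `anmi` are hypothetical, and skipping positions where a base label is missing is an assumption of this sketch, not a statement about how ClusterEnsembles.jl computes its objective.

```julia
# Normalized mutual information I(a; b) / sqrt(H(a) * H(b)), as defined in [1].
# ASSUMPTION: positions where either label is missing are skipped.
function nmi(a::AbstractVector, b::AbstractVector)
    kept = [(x, y) for (x, y) in zip(a, b) if !ismissing(x) && !ismissing(y)]
    n = length(kept)
    n == 0 && return 0.0

    # Count occurrences of f(pair) over the kept pairs.
    function count_by(f)
        counts = Dict{Any,Int}()
        for p in kept
            k = f(p)
            counts[k] = get(counts, k, 0) + 1
        end
        return counts
    end
    ca, cb, cab = count_by(first), count_by(last), count_by(identity)

    entropy(c) = -sum(v / n * log(v / n) for v in values(c))
    mi = sum(v / n * log(v * n / (ca[k[1]] * cb[k[2]])) for (k, v) in cab)
    ha, hb = entropy(ca), entropy(cb)
    return (ha == 0 || hb == 0) ? 0.0 : mi / sqrt(ha * hb)
end

# Average NMI between a candidate consensus label and every base label (column).
anmi(labels::AbstractMatrix, candidate::AbstractVector) =
    sum(nmi(col, candidate) for col in eachcol(labels)) / size(labels, 2)

base = [1 2 1 1;
        1 2 1 2;
        1 2 2 missing;
        2 3 2 1;
        2 3 3 2;
        3 1 3 missing;
        3 1 3 missing]
label_ce = [1, 1, 1, 3, 3, 2, 2]

score = anmi(base, label_ce)  # between 0 and 1; higher means closer agreement
```

Because NMI ignores how cluster IDs are named, a candidate that matches a base label up to relabeling contributes a value of 1 for that column.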

References

[1] A. Strehl and J. Ghosh, "Cluster ensembles -- a knowledge reuse framework for combining multiple partitions," Journal of Machine Learning Research, vol. 3, pp. 583-617, 2002.

[2] X. Z. Fern and C. E. Brodley, "Solving cluster ensemble problems by bipartite graph partitioning," in Proceedings of the Twenty-First International Conference on Machine Learning, p. 36, 2004.

[3] T. Li, C. Ding, and M. I. Jordan, "Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization," in Proceedings of the Seventh IEEE International Conference on Data Mining, pp. 577-582, 2007.

[4] J. Ghosh and A. Acharya, "Cluster ensembles," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, no. 4, pp. 305-315, 2011.