CompressiveLearning.jl

A Julia package for compressive (a.k.a. sketched) learning, i.e., performing large-scale learning tasks efficiently by first compressing the whole dataset into a single small vector of generalized random moments. Currently supported learning tasks are clustering, PCA, and Gaussian mixture modeling with diagonal covariances.

Supported features

  • Sketching with:
    - random/quantized/squared Fourier features;
    - Nyström features with uniform, approximate leverage score (ALS), and DPP sampling.
  • Generic CLOMPR function to solve the inverse problem (learn from the sketch) [5].
  • Practical implementation of CLOMPR for clustering (working well [6]) and GMM fitting (not extensively tested).
  • CL-AMP algorithm (for clustering with random features only) [3].
  • Various optimization routines for PCA.
  • Fast Transforms for high-dimensional settings [2] (at least for k-means/PCA).
  • Differentially-private sketching operators [4].

Installation

To install the package, run ] add CompressiveLearning from the Julia REPL.
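Equivalently, the package can be added programmatically through the standard Pkg API:

```julia
using Pkg

# Equivalent to typing "] add CompressiveLearning" in the REPL.
Pkg.add("CompressiveLearning")
```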

Some functionalities of the package rely on additional dependencies:

  • Nyström features with leverage score sampling rely on the BLESS python library (https://github.com/LCSL/bless). PyCall must be loaded manually, and the python dependencies sklearn and numpy must be installed for the python version used by PyCall (the BLESS code itself is already included).
  • Nyström features with DPP sampling require PyCall and the python dependencies dppy, sklearn, and falkon (to be installed from https://github.com/FalkonML/falkon).

Usage

The main wrappers (see the respective docstrings) are CKM, CPCA, and CGMM. All of these methods support the keyword arguments of skops_pair, which control the use of differential privacy and fast transforms, the choice of kernel, etc. The most critical hyperparameter is kernel_variance, which should ideally be of the order of the squared minimum inter-cluster distance. The main algorithms to learn from the sketch are CLOMPR (continuous orthogonal matching pursuit, more stable) and CLAMP (approximate message passing, clustering only; works with smaller sketch sizes but is less stable).
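As an illustration, a minimal compressive clustering call might look as follows. This is only a sketch: apart from kernel_variance, the exact positional and keyword arguments (and the data layout) are assumptions, so the CKM docstring remains the authoritative reference.

```julia
using CompressiveLearning

# Toy dataset (assumed layout: one sample per row).
X = randn(10_000, 2)
k = 5  # number of clusters

# kernel_variance should be of the order of the squared minimum
# inter-cluster distance (see above); 1.0 is only a placeholder here.
centroids = CKM(X, k; kernel_variance = 1.0)
```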

Documentation

Go to docs/ and run make.jl. The documentation will be generated in a build folder, which can then be rendered, e.g., using LiveServer: using LiveServer; serve(dir="build").

Tests

Unit tests are located in the test/ folder. The main file is runtests.jl; tests relying on python dependencies are located in test_with_deps.jl.
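The test suite can be run through the standard Pkg mechanism, which executes test/runtests.jl:

```julia
using Pkg

# Runs test/runtests.jl in a dedicated test environment.
Pkg.test("CompressiveLearning")
```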

Support

Please use the gitlab repository directly for support and pull requests.

Relevant publications

[1] Efficient and Privacy-Preserving Compressive Learning
    2020, PhD Thesis, Université de Rennes 1
    Antoine Chatalic

[2] Large-Scale High-Dimensional Clustering with Fast Sketching
    2018, ICASSP
    A. Chatalic, R. Gribonval, N. Keriven

[3] Sketched Clustering via Hybrid Approximate Message Passing
    2019, IEEE Transactions on Signal Processing
    E. Byrne, A. Chatalic, R. Gribonval, P. Schniter

[4] Differentially Private Compressive K-Means
    2019, ICASSP
    V. Schellekens, A. Chatalic, F. Houssiau, Y.-A. De Montjoye, L. Jacques, R. Gribonval

[5] Sketching for Large-Scale Learning of Mixture Models
    2017, Information and Inference: A Journal of the IMA
    N. Keriven, A. Bourrier, R. Gribonval, P. Pérez

[6] Compressive K-Means
    2017, ICASSP
    N. Keriven, N. Tremblay, Y. Traonmilin, R. Gribonval

TODOs

  • Set up documentation online