# CompressiveLearning.jl
A Julia package for compressive (a.k.a. sketched) learning, i.e., efficiently performing large-scale learning tasks by first compressing the whole dataset into a single small vector of generalized random moments (the sketch). Currently supported learning tasks are clustering, PCA, and Gaussian mixture modeling with diagonal covariances.
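To fix ideas, here is a minimal, self-contained illustration of the underlying idea using random Fourier features; this is conceptual only and does not use the package's API (all names and sizes below are made up):

```julia
# Illustrative computation of a sketch of random Fourier moments,
# s = (1/n) Σᵢ exp(i Ω xᵢ), for a Gaussian kernel (NOT the package's API).
d, n, m = 10, 10_000, 200        # data dimension, number of samples, sketch size
X = randn(d, n)                  # toy dataset, one sample per column
σ = 1.0                          # kernel scale (square root of the kernel variance)
Ω = randn(m, d) ./ σ             # random frequencies drawn for a Gaussian kernel
s = vec(sum(cis.(Ω * X), dims=2)) ./ n   # the m-dimensional complex sketch
```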
## Supported features
- Sketching with:
  - random, quantized, or squared Fourier features;
  - Nyström features with uniform, approximate leverage score (ALS), and DPP sampling.
- Generic `CLOMPR` function to solve the inverse problem (learning from the sketch) [5].
- Practical implementations of CLOMPR for clustering (working well [6]) and GMM fitting (not extensively tested).
- CL-AMP algorithm (for clustering with random features only) [3].
- Various optimization routines for PCA.
- Fast transforms for high-dimensional settings [2] (at least for k-means/PCA).
- Differentially-private sketching operators [4].
## Installation
To install the package, run `] add CompressiveLearning` from the Julia REPL.
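Equivalently, from within a Julia session:

```julia
using Pkg
Pkg.add("CompressiveLearning")   # same as `] add CompressiveLearning` in the REPL
```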
Some functionalities of the package rely on additional dependencies:
- Nyström features with leverage score sampling rely on the BLESS python library (https://github.com/LCSL/bless). `PyCall` must be manually loaded, and the python dependencies `sklearn` and `numpy` installed for the python version used by PyCall (BLESS itself is already included); see the sketch after this list.
- Nyström features with DPP sampling require `PyCall` and the python dependencies `dppy`, `sklearn` and `falkon` (to install from https://github.com/FalkonML/falkon).
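For example, a session using these optional features might start as follows (a minimal sketch; it assumes the python packages are already installed for the python version used by PyCall):

```julia
using PyCall              # must be loaded manually before using these features
pyimport("numpy")         # quick check that the python dependencies are visible
pyimport("sklearn")
using CompressiveLearning
```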
## Usage
Main wrappers (see the respective docstrings) are `CKM`, `CPCA` and `CGMM`. All of these methods support the keyword arguments of `skops_pair`, which control the usage of differential privacy, fast transforms, the choice of the kernel, etc.

The most critical hyperparameter is the `kernel_variance`, which should ideally be chosen on the order of the squared minimum inter-cluster distance.
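For instance, compressive k-means might be invoked as follows; this is a hedged sketch, and the exact positional arguments, data layout, and defaults of `CKM` should be checked against its docstring:

```julia
using CompressiveLearning

X = randn(5, 10_000)   # toy dataset (one sample per column is assumed here)
k = 10                 # number of clusters

# Assumed call pattern; `kernel_variance` is the hyperparameter discussed above.
centroids = CKM(X, k; kernel_variance = 1.0)
```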
Main algorithms to learn from the sketch: `CLOMPR` (continuous orthogonal matching pursuit, more stable) and `CLAMP` (approximate message passing; for clustering only, works with smaller sketch sizes but is less stable).
## Documentation
Go to `docs/` and run `make.jl`. The documentation will be generated in a `build` folder, which can e.g. be rendered using `LiveServer` as follows: `using LiveServer; serve(dir="build")`.
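Putting both steps together, and assuming `make.jl` is a standard Documenter.jl build script that can be included directly:

```julia
# Run from within the docs/ folder:
include("make.jl")     # builds the documentation into docs/build

using LiveServer
serve(dir="build")     # serve the generated pages locally
```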
## Tests
Unit tests are contained in the `test/` folder. The main file is `runtests.jl`, and tests relying on python dependencies are located in the `test_with_deps.jl` file.
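They can be run with the standard Julia package mechanism:

```julia
using Pkg
Pkg.test("CompressiveLearning")   # executes test/runtests.jl
```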
## Support
Please use the GitLab repository directly for support and pull requests.
## Relevant publications
[1] A. Chatalic, *Efficient and Privacy-Preserving Compressive Learning*, PhD Thesis, Université de Rennes 1, 2020.

[2] A. Chatalic, R. Gribonval, N. Keriven, *Large-Scale High-Dimensional Clustering with Fast Sketching*, ICASSP, 2018.

[3] E. Byrne, A. Chatalic, R. Gribonval, P. Schniter, *Sketched Clustering via Hybrid Approximate Message Passing*, IEEE Transactions on Signal Processing, 2019.

[4] V. Schellekens, A. Chatalic, F. Houssiau, Y.-A. De Montjoye, L. Jacques, R. Gribonval, *Differentially Private Compressive K-Means*, ICASSP, 2019.

[5] N. Keriven, A. Bourrier, R. Gribonval, P. Pérez, *Sketching for Large-Scale Learning of Mixture Models*, Information and Inference: A Journal of the IMA, 2017.

[6] N. Keriven, N. Tremblay, Y. Traonmilin, R. Gribonval, *Compressive K-Means*, ICASSP, 2017.
## TODOs
- Set up documentation online