# CompressiveLearning.jl

A Julia package for compressive (a.k.a. sketched) learning, i.e. for performing large-scale learning tasks efficiently by compressing the whole dataset into a single small vector of generalized random moments. Currently supported learning tasks are clustering, PCA, and Gaussian mixture modeling with diagonal covariances.

## Supported features

- Sketching with:
  - random/quantized/squared Fourier features;
  - Nyström features with uniform, approximate leverage score (ALS), and DPP sampling.
- Generic `CLOMPR` function to solve the inverse problem (learn from the sketch) [5].
- Practical implementations of CLOMPR for clustering (working well [6]) and GMM fitting (not extensively tested).
- CL-AMP algorithm (for clustering with random features only) [3].
- Various optimization routines for PCA.
- Fast Transforms for high-dimensional settings [2] (at least for k-means/PCA).
- Differentially-private sketching operators [4].

## Installation

To install the package, run `] add CompressiveLearning` from the Julia REPL.

Some functionalities of the package rely on additional dependencies:

- Nyström features with leverage score sampling rely on the BLESS python library (https://github.com/LCSL/bless): `PyCall` must be manually loaded, and the python dependencies `sklearn` and `numpy` installed for the python version used by PyCall. (BLESS itself is already included.)
- Nyström features with DPP sampling require `PyCall` and the python dependencies `dppy`, `sklearn` and `falkon` (to be installed from https://github.com/FalkonML/falkon).

## Usage

The main wrappers (see the respective docstrings) are `CKM`, `CPCA` and `CGMM`. All of these methods support the keyword arguments of `skops_pair`, which control the use of differential privacy, fast transforms, the choice of the kernel, etc.
The most critical hyperparameter is the `kernel_variance`, which should ideally be chosen of the order of the squared minimum inter-cluster distance.
The main algorithms to learn from the sketch are `CLOMPR` (continuous orthogonal matching pursuit, more stable) and `CLAMP` (approximate message passing, for clustering only; works with smaller sketch sizes but is less stable).
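As a minimal sketch of the intended workflow (the exact positional arguments of `CKM` are an assumption here; refer to the docstrings for the actual signatures):

```julia
using CompressiveLearning

# Toy dataset: d × n matrix of n = 10_000 samples in dimension d = 10.
X = randn(10, 10_000)

# Compressive k-means: sketch the dataset once, then recover k = 5 centroids.
# `kernel_variance` is the most critical hyperparameter; it should be of the
# order of the squared minimum inter-cluster distance.
C = CKM(X, 5; kernel_variance = 1.0)
```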

## Documentation

Go to `docs/` and run `make.jl`. The documentation will be generated in a `build` folder, which can e.g. be rendered using `LiveServer` as follows: `using LiveServer; serve(dir="build")`.
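The build steps above amount to the following (assuming `make.jl` is run with `julia` and that the `LiveServer` package is installed):

```shell
cd docs
julia make.jl                                    # generates the docs in docs/build
julia -e 'using LiveServer; serve(dir="build")'  # serve them locally in the browser
```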

## Tests

Unit tests are located in the `test/` folder: the main file is `runtests.jl`, and tests relying on python dependencies are located in the `test_with_deps.jl` file.

## Support

Please use the GitLab repository directly for support and pull requests.

## Relevant publications

```
[1] Efficient and Privacy-Preserving Compressive Learning
2020, PhD Thesis, Université de Rennes 1
Antoine Chatalic
[2] Large-Scale High-Dimensional Clustering with Fast Sketching,
2018, ICASSP
A. Chatalic, R. Gribonval, N. Keriven
[3] Sketched Clustering via Hybrid Approximate Message Passing
2019, IEEE Transactions on Signal Processing
E. Byrne, A. Chatalic, R. Gribonval, P. Schniter
[4] Differentially Private Compressive K-Means
2019, ICASSP
V. Schellekens, A. Chatalic, F. Houssiau, Y.-A. De Montjoye, L. Jacques, R. Gribonval
[5] Sketching for Large-Scale Learning of Mixture Models
2017, Information and Inference: A Journal of the IMA
N. Keriven, A. Bourrier, R. Gribonval, P. Pérez
[6] Compressive K-Means
2017, ICASSP
N. Keriven, N. Tremblay, Y. Traonmilin, R. Gribonval
```

## TODOs

- Set up documentation online