This type represents an L-ensemble, a (broad) special case of Determinantal Point Process.

The type parameter corresponds to the type of the entries in the matrix given as input (most likely, double precision floats).


EllEnsemble(X::Matrix{T},k :: Kernel)

Construct a full-rank ensemble from a set of points and a kernel function.

X is vector of column vectors (ColVecs) or a vector of row vectors (RowVecs) k is a kernel (see doc for package KernelFunctions.jl)

Example: points in 2d along the circle, and an exponential kernel

t = LinRange(-pi,pi,10)'
X = vcat(cos.(t),sin.(t))
using KernelFunctions
EllEnsemble(L :: AbstractMatrix{T})

Construct an L-ensemble from a (symmetric, non-negative definite) matrix L.

Z = randn(5, 2)
EllEnsemble(Z * Z') #not very useful, presumably

Note that eigen(L) will be called at construction, which may be computationally costly.

An L-Ensemble can also be constructed based on lazy matrix types, i.e. types that leverage a low-rank representation. In the example above we could also use the LowRank type:

Z = randn(5, 2)
EllEnsemble(LowRank(Z)) #more efficient than Z*Z'

Lazy representation of a distance matrix, meaning that the object can be indexed like a matrix, but the values are computed on-the-fly. This is useful whenever an algorithm only requires some of the entries, or when the dataset is very large.

x = randn(2, 100)
D = LazyDist(x) #Euclidean dist. by default
D[3, 2] #the getindex method is defined
D = LazyDist(x, (a, b) -> sum(abs.(a - b))) #L1 distance
D[3, 2]


Construct a DPP from a matrix defining the marginal kernel. Here the matrix must be square and its eigenvalues must be between 0 and 1.



Construct a projection ensemble from a matrix of features. Here we assume $\mathbf{L} = \mathbf{V}\mathbf{V}^t$, so that V must be n \times r, where n is the number of items and r is the rank. V needs not be orthogonal. If orth is set to true (default), then a QR decomposition is performed. If V is orthogonal already, then this computation may be skipped, and you can set orth to false.


The size of the set sampled by a DPP is a random variable. This function returns its mean and standard deviation. See also: rescale!, which changes the mean set size.


Select a random subset of size m from x based on a greedy distance criterion. The initial point is selected uniformly. Then, if sampling == :farthest, at each step, the point selected is one that is farthest from all currently selected points. If sampling == :d2, the algorithm is D²-sampling [vassilvitskii2006k](@vassilvitskii2006k), which is a relaxed stochastic version of farthest-point sampling (selecting points with prob. proportional to squared distance).
x = rand(2, 200);
ind = distance_sampling(ColVecs(x), 40, :farthest)
scatter(x[1, :], x[2, :]; marker_z=map((v) -> v ∈ ind, 1:size(x, 2)), legend=:none)

Compute the elementary symmetric polynomials of the L-ensemble L, e₁(L) ... eₙ(L). e₁(L) is the trace and eₙ(L) is the determinant. The ESPs determine the distribution of the sample size of a DPP:

$p(|X| = k) = \frac{e_k}{\sum_{i=1}^n e_i}$

The default algorithm uses the Newton equations, but may be unstable numerically for n large. If approx=true, a stable saddle-point approximation (as in Barthelmé et al. (2019)) is used instead for all eₖ with k>5.


Compute the Gaussian kernel matrix for points X and parameter σ, ie. a matrix with entry i,j equal to $\exp(-\frac{(x_i-x_j)^2}{2σ^2})$

If σ is missing, it is set using the median heuristic. If the number of points is very large, the median is estimated on a random subset.

x = randn(2, 6)
gaussker(ColVecs(x), 0.1)

See also: rff, KernelMatrix:kernelmatrix

greedy_subset(L :: AbstractLEnsemble,k)

Perform greedy subset selection: find a subset $\mathcal{X}$ of size $k$, such that $\det{L}_{\mathcal{X}}$ is high, by building the set item-by-item and picking at each step the item that maximises the determinant. You can view this as finding an (approximate) mode of the DPP with L-ensemble L.

If $k$ is too large relative to the (numerical) rank of L, the problem is not well-defined as so the algorithm will stop prematurely.

The implementation runs in $\O(nk^2)$ but is not particularly optimised. If the end result looks screwy, it is probably due to numerical instability: try improving the conditioning of $\bL$.


x = randn(2,100) #10 points in dim 2
#same thing but faster:

First-order inclusion probabilities in a k-DPP with L-ensemble L. Uses a (typically very accurate) saddlepoint approximation from Barthelmé, Amblard, Tremblay (2019).



Compute first-order inclusion probabilities, i.e. the probability that each item in 1..n is included in the DPP.

See also: marginal_kernel


Estimate the KL divergence between two (k-)DPPs. The KL divergence is estimated by sampling from the k-DPP with L-ensemble L1 and computing the mean log-ratio of the probabilities. nsamples controls how many samples are taken.


Compute and return the marginal kernel of a DPP, K. The marginal kernel of a DPP is a (n x n) matrix which can be used to find the inclusion probabilities. For any fixed set of indices ind, the probability that ind is included in a sample from the DPP equals det(K[ind,ind]).

nystrom_approx(x :: AbstractVector,ker :: Kernel,ind)

Compute a low-rank approximation of a kernel matrix (with kernel "ker") using the rows and columns indexed by "ind".

using KernelFunctions
x = rand(2, 100)
K = kernelmatrix(SqExponentialKernel(), ColVecs(x))
#build a rank 30 approx. to K
V = nystrom_approx(ColVecs(x), SqExponentialKernel(), 1:30)
norm(K - V * V') #should be small

Compute monomial features up to a certain degree. For instance, if the input consists in vectors in ℝ², then the monomials of degree ≤ 2 are 1,x₁,x₂,x₁x₂,x₁²,x₂² Note that the number of monomials of degree r in dimension d equals ${ d+r \choose r}$

If the input points are in d > 1, indicate whether the input x is d * n or n * d using ColVecs or RowVecs.

The output has points along rows and monomials along columns.


x = randn(2,10) #10 points in dim 2
polyfeatures(ColVecs(x),2) #Output has 10 rows and 6 column
polyfeatures(RowVecs(x),2) #Output has 2 rows and 66 columns



Rescale the L-ensemble such that the expected number of samples equals k. The expected number of samples of a DPP equals $\Tr \mathbf{L}\left( \mathbf{L} + \mathbf{I} \right)$. The function rescales $\mathbf{L}$ to $\alpha \mathbf{L}$ such that $\Tr \alpha \mathbf{L}\left( \alpha \mathbf{L} + \mathbf{I} \right) = k$


Compute Random Fourier Features rahimi2007random for the Gaussian kernel matrix with input points X and parameter σ. Returns a random matrix M such that, in expectation $\mathbf{MM}^t = \mathbf{K}$, the Gaussian kernel matrix. M has 2*m columns. The higher m, the better the approximation.


x = randn(2, 10) #10 points in dim 2
rff(ColVecs(x), 4, 1.0)

See also: gaussker, kernelmatrix


Sample a k-DPP, i.e. a DPP with fixed size. k needs to be strictly smaller than the rank of L (if it equals the rank of L, use a ProjectionEnsemble).

The algorithm uses a saddle-point approximation adapted from Barthelmé, Amblard, Tremblay (2019).


Sample from a DPP with L-ensemble L. The return type is a BitSet (indicating which indices are sampled), use collect to get a vector of indices instead.


Sample from a projection DPP. If nsamples > 1, return a vector of nsamples realisations from the process.

If n is much larger than m, this calls the optimised accept/reject sampler instead of the regular sampler. In addition, the leverage scores are precomputed if nsamples > 1.

The optimised A/R sampler is described in Barthelme, S, Tremblay, N, Amblard, P-O, (2022) A Faster Sampler for Discrete Determinantal Point Processes.

    Z = randn(150,10) #random feature matrix
    Pp = ProjectionEnsemble(Z)
    sample(Pp) #should output a vector of length 10
    sample(Pp,nsamples=5) #should output a vector of 5 realisations