Determinantal.EllEnsemble
— TypeEllEnsemble{T}
This type represents an L-ensemble, a (broad) special case of Determinantal Point Process.
The type parameter corresponds to the type of the entries in the matrix given as input (most likely, double precision floats).
Determinantal.EllEnsemble
— MethodEllEnsemble(X::Matrix{T},k :: Kernel)
Construct a full-rank ensemble from a set of points and a kernel function.
X is vector of column vectors (ColVecs) or a vector of row vectors (RowVecs) k is a kernel (see doc for package KernelFunctions.jl)
Example: points in 2d along the circle, and an exponential kernel
t = LinRange(-pi,pi,10)'
X = vcat(cos.(t),sin.(t))
using KernelFunctions
L=EllEnsemble(ColVecs(X),ExponentialKernel())
Determinantal.EllEnsemble
— MethodEllEnsemble(L :: AbstractMatrix{T})
Construct an L-ensemble from a (symmetric, non-negative definite) matrix L.
Z = randn(5, 2)
EllEnsemble(Z * Z') #not very useful, presumably
Note that eigen(L) will be called at construction, which may be computationally costly.
An L-Ensemble can also be constructed based on lazy matrix types, i.e. types that leverage a low-rank representation. In the example above we could also use the LowRank type:
Z = randn(5, 2)
EllEnsemble(LowRank(Z)) #more efficient than Z*Z'
Determinantal.LazyDist
— MethodLazyDist(x)
Lazy representation of a distance matrix, meaning that the object can be indexed like a matrix, but the values are computed on-the-fly. This is useful whenever an algorithm only requires some of the entries, or when the dataset is very large.
x = randn(2, 100)
D = LazyDist(x) #Euclidean dist. by default
D[3, 2] #the getindex method is defined
D = LazyDist(x, (a, b) -> sum(abs.(a - b))) #L1 distance
D[3, 2]
Determinantal.MarginalDPP
— MethodMarginalDPP(V::Matrix{T})
Construct a DPP from a matrix defining the marginal kernel. Here the matrix must be square and its eigenvalues must be between 0 and 1.
Determinantal.ProjectionEnsemble
— MethodProjectionEnsemble(V::Matrix{T},orth=true)
Construct a projection ensemble from a matrix of features. Here we assume $\mathbf{L} = \mathbf{V}\mathbf{V}^t$, so that V must be n \times r, where n is the number of items and r is the rank. V needs not be orthogonal. If orth is set to true (default), then a QR decomposition is performed. If V is orthogonal already, then this computation may be skipped, and you can set orth to false.
Determinantal.cardinal
— Methodcardinal(L::AbstractLEnsemble)
The size of the set sampled by a DPP is a random variable. This function returns its mean and standard deviation. See also: rescale!, which changes the mean set size.
Determinantal.distance_sampling
— Functiondistance_sampling(x,m,sampling)
Select a random subset of size m from x based on a greedy distance criterion. The initial point is selected uniformly. Then, if sampling == :farthest, at each step, the point selected is one that is farthest from all currently selected points. If sampling == :d2, the algorithm is D²-sampling [vassilvitskii2006k](@vassilvitskii2006k), which is a relaxed stochastic version of farthest-point sampling (selecting points with prob. proportional to squared distance).
x = rand(2, 200);
ind = distance_sampling(ColVecs(x), 40, :farthest)
scatter(x[1, :], x[2, :]; marker_z=map((v) -> v ∈ ind, 1:size(x, 2)), legend=:none)
Determinantal.esp
— Functionesp(L::AbstractLEnsemble,approx=false)
Compute the elementary symmetric polynomials of the L-ensemble L, e₁(L) ... eₙ(L). e₁(L) is the trace and eₙ(L) is the determinant. The ESPs determine the distribution of the sample size of a DPP:
$p(|X| = k) = \frac{e_k}{\sum_{i=1}^n e_i}$
The default algorithm uses the Newton equations, but may be unstable numerically for n large. If approx=true, a stable saddle-point approximation (as in Barthelmé et al. (2019)) is used instead for all eₖ with k>5.
Determinantal.gaussker
— Methodgaussker(X,σ)
Compute the Gaussian kernel matrix for points X and parameter σ, ie. a matrix with entry i,j equal to $\exp(-\frac{(x_i-x_j)^2}{2σ^2})$
If σ is missing, it is set using the median heuristic. If the number of points is very large, the median is estimated on a random subset.
x = randn(2, 6)
gaussker(ColVecs(x), 0.1)
See also: rff, KernelMatrix:kernelmatrix
Determinantal.greedy_subset
— Methodgreedy_subset(L :: AbstractLEnsemble,k)
Perform greedy subset selection: find a subset $\mathcal{X}$ of size $k$, such that $\det{L}_{\mathcal{X}}$ is high, by building the set item-by-item and picking at each step the item that maximises the determinant. You can view this as finding an (approximate) mode of the DPP with L-ensemble L.
If $k$ is too large relative to the (numerical) rank of L, the problem is not well-defined as so the algorithm will stop prematurely.
The implementation runs in $\O(nk^2)$ but is not particularly optimised. If the end result looks screwy, it is probably due to numerical instability: try improving the conditioning of $\bL$.
Example
x = randn(2,100) #10 points in dim 2
greedy_subset(EllEnsemble(gaussker(x)),12)
#same thing but faster:
greedy_subset(gaussker(x),12)
Determinantal.inclusion_prob
— Methodinclusion_prob(L::AbstractLEnsemble,k)
First-order inclusion probabilities in a k-DPP with L-ensemble L. Uses a (typically very accurate) saddlepoint approximation from Barthelmé, Amblard, Tremblay (2019).
Determinantal.inclusion_prob
— Methodinclusion_prob(L::AbstractLEnsemble)
Compute first-order inclusion probabilities, i.e. the probability that each item in 1..n is included in the DPP.
See also: marginal_kernel
Determinantal.kl_divergence
— Methodkl_divergence(L1::AbstractLEnsemble,L2::AbstractLEnsemble,k::Int;nsamples=100)
Estimate the KL divergence between two (k-)DPPs. The KL divergence is estimated by sampling from the k-DPP with L-ensemble L1 and computing the mean log-ratio of the probabilities. nsamples controls how many samples are taken.
Determinantal.marginal_kernel
— Method marginal_kernel(L::AbstractLEnsemble)
Compute and return the marginal kernel of a DPP, K. The marginal kernel of a DPP is a (n x n) matrix which can be used to find the inclusion probabilities. For any fixed set of indices ind, the probability that ind is included in a sample from the DPP equals det(K[ind,ind]).
Determinantal.nystrom_approx
— Methodnystrom_approx(x :: AbstractVector,ker :: Kernel,ind)
Compute a low-rank approximation of a kernel matrix (with kernel "ker") using the rows and columns indexed by "ind".
using KernelFunctions
x = rand(2, 100)
K = kernelmatrix(SqExponentialKernel(), ColVecs(x))
#build a rank 30 approx. to K
V = nystrom_approx(ColVecs(x), SqExponentialKernel(), 1:30)
norm(K - V * V') #should be small
Determinantal.polyfeatures
— Methodpolyfeatures(x,order)
Compute monomial features up to a certain degree. For instance, if the input consists in vectors in ℝ², then the monomials of degree ≤ 2 are 1,x₁,x₂,x₁x₂,x₁²,x₂² Note that the number of monomials of degree r in dimension d equals ${ d+r \choose r}$
If the input points are in d > 1, indicate whether the input x is d * n or n * d using ColVecs or RowVecs.
The output has points along rows and monomials along columns.
Examples
x = randn(2,10) #10 points in dim 2
polyfeatures(ColVecs(x),2) #Output has 10 rows and 6 column
polyfeatures(RowVecs(x),2) #Output has 2 rows and 66 columns
Determinantal.rescale!
— Functionrescale!(L,k)
$\DeclareMathOperator{\Tr}{Tr}$
Rescale the L-ensemble such that the expected number of samples equals k. The expected number of samples of a DPP equals $\Tr \mathbf{L}\left( \mathbf{L} + \mathbf{I} \right)$. The function rescales $\mathbf{L}$ to $\alpha \mathbf{L}$ such that $\Tr \alpha \mathbf{L}\left( \alpha \mathbf{L} + \mathbf{I} \right) = k$
Determinantal.rff
— Methodrff(X,m,σ)
Compute Random Fourier Features rahimi2007random for the Gaussian kernel matrix with input points X and parameter σ. Returns a random matrix M such that, in expectation $\mathbf{MM}^t = \mathbf{K}$, the Gaussian kernel matrix. M has 2*m columns. The higher m, the better the approximation.
Examples
x = randn(2, 10) #10 points in dim 2
rff(ColVecs(x), 4, 1.0)
See also: gaussker, kernelmatrix
Determinantal.sample
— Methodsample(L::AbstractLEnsemble,k)
Sample a k-DPP, i.e. a DPP with fixed size. k needs to be strictly smaller than the rank of L (if it equals the rank of L, use a ProjectionEnsemble).
The algorithm uses a saddle-point approximation adapted from Barthelmé, Amblard, Tremblay (2019).
Determinantal.sample
— Method sample(L::AbstractEnsemble)
Sample from a DPP with L-ensemble L. The return type is a BitSet (indicating which indices are sampled), use collect to get a vector of indices instead.
Determinantal.sample
— Method sample(L::ProjectionEnsemble;nsamples=1)
Sample from a projection DPP. If nsamples > 1, return a vector of nsamples realisations from the process.
If n is much larger than m, this calls the optimised accept/reject sampler instead of the regular sampler. In addition, the leverage scores are precomputed if nsamples > 1.
The optimised A/R sampler is described in Barthelme, S, Tremblay, N, Amblard, P-O, (2022) A Faster Sampler for Discrete Determinantal Point Processes.
Z = randn(150,10) #random feature matrix
Pp = ProjectionEnsemble(Z)
sample(Pp) #should output a vector of length 10
sample(Pp,nsamples=5) #should output a vector of 5 realisations