Given samples x_nu and x_de from distributions p_nu and p_de, it is very useful to estimate the density ratio r(x) = p_nu(x) / p_de(x) for all valid x. This problem is known in the literature as the density ratio estimation problem (Sugiyama et al. 2012).
Naive solutions based on the ratio of individual estimators for numerator and denominator densities perform poorly, particularly in high dimensions. This package provides density ratio estimators that perform well with a moderately large number of dimensions.
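To make the definition of r(x) concrete, here is a minimal sketch that does not use this package. It assumes a toy setup in which both densities are known Gaussians, p_nu = N(1, 1) and p_de = N(0, 2), so the ratio has a closed form. The helper names normal_pdf and r_true are hypothetical and exist only for this illustration. The sketch uses the identity E_nu[f(x)] = E_de[r(x) f(x)] to recover the mean of p_nu from samples of p_de alone.

```julia
using Random, Statistics

# Assumed toy setup (not part of the package): closed-form Gaussian densities.
normal_pdf(x, μ, σ) = exp(-0.5 * ((x - μ) / σ)^2) / (σ * sqrt(2π))

# True density ratio r(x) = p_nu(x) / p_de(x) for p_nu = N(1, 1), p_de = N(0, 2).
r_true(x) = normal_pdf(x, 1.0, 1.0) / normal_pdf(x, 0.0, 2.0)

Random.seed!(42)
x_de = 2.0 .* randn(100_000)   # samples from p_de only

# Importance weighting: E_nu[f(x)] = E_de[r(x) * f(x)].
w = r_true.(x_de)
mean_w  = mean(w)              # close to 1, since r integrates to 1 under p_de
mean_nu = mean(w .* x_de)      # close to 1, the mean of p_nu
```

In practice the densities are unknown, which is why r(x) has to be estimated from the two sample sets.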
Installation
Get the latest stable release with Julia's package manager:
] add DensityRatioEstimation
Usage
Given two indexable collections x_nu and x_de of samples from p_nu and p_de, one can estimate the density ratio at all samples in x_de:
using DensityRatioEstimation, Optim
r = densratio(x_nu, x_de, KLIEP(), optlib=OptimLib)
The third argument of the densratio function is a density ratio estimator.
Currently, this package implements the following estimators:
Estimator | Type¹ | References |
---|---|---|
Kernel Mean Matching | KMM, uKMM | Huang et al. 2006 |
Kullback-Leibler Importance Estimation Procedure | KLIEP | Sugiyama et al. 2008 |
Least-Squares Importance Fitting | LSIF | Kanamori et al. 2009 |

¹ We use the naming convention of prefixing the type name with u for the unconstrained variant of the corresponding estimator.
The fourth argument optlib specifies the optimization package used to implement the estimator. Some estimators are implemented with different optimization packages to facilitate use in different environments. In the example above, users who already have the Optim.jl package in their environment can promptly use the KLIEP estimator implemented with that package.
Each estimator has a default optimization package, so the function call above can be simplified, provided that the optimization package is already loaded:
r = densratio(x_nu, x_de, KLIEP())
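The calls above can be tried on synthetic data. The following is a minimal sketch, assuming that plain vectors of floats are accepted as the indexable collections of samples and that Optim.jl is installed. The two Gaussian sample sets are made up for illustration.

```julia
using DensityRatioEstimation, Optim
using Random

Random.seed!(123)

# Assumed toy data: numerator samples from N(1, 1), denominator samples from N(0, 2).
x_nu = randn(200) .+ 1.0
x_de = 2.0 .* randn(200)

# Estimate the density ratio at the denominator samples with KLIEP,
# explicitly selecting the Optim.jl implementation.
r = densratio(x_nu, x_de, KLIEP(), optlib=OptimLib)
```

The result holds one ratio estimate per sample in x_de. Dropping the optlib keyword falls back to the estimator's default implementation.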
Different implementations of the same estimator are loaded using the Requires.jl package, and the keyword argument optlib can be any of:
- JuliaLib: Pure Julia implementation
- OptimLib: Optim.jl implementation
- ConvexLib: Convex.jl implementation
- JuMPLib: JuMP.jl implementation
To find out the default implementation for an estimator, please use
default_optlib(KLIEP)
and to find out the available implementations, please use
available_optlib(KLIEP)
Some methods support the evaluation of the density ratio at all x, besides the denominator samples. In this case, the following line returns a function r(x) that can be evaluated at new unseen samples:
r = densratiofunc(x_nu, x_de, KLIEP())
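As a rough sketch of how the returned function might be used, the example below fits it on made-up scalar samples and evaluates it at a point that is in neither sample set. It assumes that Optim.jl provides an available implementation for KLIEP once loaded, and that a new sample has the same form as the training samples (here a single float).

```julia
using DensityRatioEstimation, Optim
using Random

Random.seed!(123)

# Assumed toy data, for illustration only.
x_nu = randn(200) .+ 1.0
x_de = 2.0 .* randn(200)

# Obtain a callable density ratio estimate.
r = densratiofunc(x_nu, x_de, KLIEP())

# Evaluate at a new sample not present in x_nu or x_de.
r_new = r(0.5)
```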
Hyperparameter tuning
Methods like KLIEP are equipped with tuning strategies, and their hyperparameters can be found using the following line:
dre = fit(KLIEP, x_nu, x_de, LCV((σ=[1.,2.,3.],b=[100])))
The function returns a KLIEP instance with parameters optimized for the samples. In this case, the line uses likelihood cross-validation LCV as the tuning strategy. It accepts a named tuple with the hyperparameter ranges for KLIEP: the kernel width σ and the number of basis functions b. Currently, the following tuning strategies are implemented:
Tuning | References |
---|---|
LCV | Sugiyama et al. 2008 |
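Tuning and estimation can be chained. The sketch below is one possible workflow on made-up data, assuming that Optim.jl is loaded and that the tuned instance returned by fit can be passed to densratio like any other estimator.

```julia
using DensityRatioEstimation, Optim
using Random

Random.seed!(123)

# Assumed toy data, for illustration only.
x_nu = randn(200) .+ 1.0
x_de = 2.0 .* randn(200)

# Select the kernel width σ by likelihood cross-validation,
# with the number of basis functions fixed at b = 100.
dre = fit(KLIEP, x_nu, x_de, LCV((σ=[1.,2.,3.], b=[100])))

# Use the tuned estimator to estimate the ratio at the denominator samples.
r = densratio(x_nu, x_de, dre)
```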