FastParzenWindows.cvfpw (Method)
cvresults = cvfpw(X, r_range; numfolds = 10, seed = 1, gamma = 1e-6, randrepeats = 7)
Performs cross-validation over the radius parameter used to partition the data into hyperdiscs. Candidate values for the radius are specified in r_range. seed controls the generation of the folds and is also the random seed of fpw. Returns a matrix of dimensions (number of radii candidates)×(number of folds) of log-likelihoods evaluated on the left-out folds.
Parameters
X is an N×D data matrix, i.e. there are N data items of dimension D.
r_range is an array or range of scalars specifying the candidate radii.
numfolds specifies the number of folds in the cross-validation.
seed controls the generation of the folds and is also the random seed of fpw.
gamma is a scalar that specifies a multiple of the identity matrix, i.e. γI, added to the covariance matrices of the local Gaussians for numerical stability.
randrepeats specifies how many times the fpw algorithm is run, in order to account for the random initialisation of fpw on each run.
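To make the roles of numfolds and seed concrete, here is a minimal sketch of seeded fold generation. The helper name kfold_indices is hypothetical and the shuffling scheme is an assumption; cvfpw may construct its folds differently.

```julia
using Random

# Hypothetical helper, not part of FastParzenWindows.
# Shuffles the indices 1:N with a seeded generator and deals them into folds.
function kfold_indices(N::Int, numfolds::Int; seed::Int = 1)
    rng = MersenneTwister(seed)
    perm = randperm(rng, N)
    # Fold k takes every numfolds-th shuffled index, starting at position k.
    return [perm[k:numfolds:N] for k in 1:numfolds]
end

folds = kfold_indices(300, 10; seed = 1)
test_idx = folds[1]                      # held-out fold
train_idx = reduce(vcat, folds[2:end])   # remaining folds used for fitting
```

Fixing the seed makes the split reproducible, so different candidate radii can be compared on identical folds.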
Returns
cvresults is a matrix of dimensions (number of radii candidates)×(number of folds) of log-likelihoods evaluated on the left-out folds.
Example
julia> using FastParzenWindows, Statistics, PyPlot
julia> X = spiraldata(300)
julia> r_range = LinRange(0.01, 2.0, 100)
julia> cvresults = cvfpw(X, r_range)
julia> r_perf = vec(mean(cvresults, dims=2))  # mean held-out log-likelihood per radius
julia> best_index = argmax(r_perf)
julia> r_best = r_range[best_index]
julia> mix = fpw(X, r_best)
julia> x = rand(mix, 1000)'
julia> plot(X[:,1], X[:,2], "bo", label="data")
julia> plot(x[:,1], x[:,2], ".r", label="generated", alpha=0.7)
julia> legend()
FastParzenWindows.fpw (Method)
p = fpw(X, r; gamma = 1e-6, seed = 1)
Estimates a density using the fast Parzen windows algorithm. The algorithm partitions the data space into hyperdiscs of radius r. Data items in matrix X are then 'softly' assigned to the partitions. The local density in each partition is modelled by a Gaussian distribution. The global density estimate is returned as a Gaussian mixture model of type Distributions.MixtureModel.
The implementation is based on X. Wang, P. Tino, M. A. Fardal, S. Raychaudhury and A. Babul, "Fast parzen window density estimator," 2009 International Joint Conference on Neural Networks, 2009, pp. 3267-3274.
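The partitioning idea can be illustrated with a deliberately simplified sketch. It picks random data items as centres and assigns each remaining point to exactly one hyperdisc (a hard assignment), whereas the description above mentions soft assignment. The helper name hyperdisc_partition and the greedy covering loop are assumptions for illustration, not the package's implementation.

```julia
using Random, LinearAlgebra

# Hypothetical helper, not part of FastParzenWindows.
# Greedy hard partition: repeatedly pick a random unassigned data item as a
# centre and claim every unassigned point within distance r of it.
function hyperdisc_partition(X::AbstractMatrix, r::Real; seed::Int = 1)
    rng = MersenneTwister(seed)
    unassigned = collect(1:size(X, 1))
    centres = Int[]
    members = Vector{Vector{Int}}()
    while !isempty(unassigned)
        c = rand(rng, unassigned)   # random remaining data item becomes a centre
        inside = [i for i in unassigned if norm(X[i, :] - X[c, :]) <= r]
        push!(centres, c)
        push!(members, inside)
        filter!(i -> !(i in inside), unassigned)   # the centre itself is always removed
    end
    return centres, members
end

X = rand(MersenneTwister(0), 200, 2)   # synthetic 200×2 data
centres, members = hyperdisc_partition(X, 0.2)
```

A smaller r yields more hyperdiscs with fewer points each. This is the trade-off that cross-validating the radius is meant to resolve.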
Parameters
X is an N×D data matrix, i.e. there are N data items of dimension D.
r is a scalar specifying the common radius of the hyperdiscs.
seed controls the random number generator that randomly picks data items as hyperdisc centres.
gamma is a scalar that specifies a multiple of the identity matrix, i.e. γI, added to the covariance matrices of the local Gaussians for numerical stability.
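The effect of the γI term can be seen on a small degenerate case. The sketch below uses only the standard library; the data are made up for illustration and the way fpw applies gamma internally is not shown here.

```julia
using LinearAlgebra, Statistics

# Three 2D points lying exactly on a line: the sample covariance has rank 1.
X = [0.0 0.0; 1.0 1.0; 2.0 2.0]
gamma = 1e-6

S = cov(X)              # 2×2 sample covariance, singular for this data
S_reg = S + gamma * I   # add γI to the diagonal

singular_before = !isposdef(S)      # no valid Gaussian density without regularisation
usable_after = isposdef(S_reg)      # regularised matrix admits a Cholesky factorisation
```

Partitions that contain few or collinear points are where this matters, since their covariance estimates would otherwise be singular.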
Returns
p is a Gaussian mixture model of type Distributions.MixtureModel.
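Because the result is a Distributions.MixtureModel, it can be evaluated and sampled with the generic Distributions.jl functions. The sketch below builds a small two-component mixture by hand as a stand-in for the output of fpw; the component means, covariances, and weights are made up for illustration.

```julia
using Distributions

# Hand-built stand-in for a mixture returned by fpw (values are illustrative).
mix = MixtureModel(
    [MvNormal([0.0, 0.0], [1.0 0.0; 0.0 1.0]),
     MvNormal([3.0, 3.0], [0.5 0.0; 0.0 0.5])],
    [0.6, 0.4],
)

ll = logpdf(mix, [0.0, 0.0])   # log-density at a single 2D point
samples = rand(mix, 1000)      # D×n matrix, one sample per column
x = samples'                   # transpose to n×D, matching the N×D layout of X
```

The transpose explains the trailing `'` in the examples: rand returns one sample per column, while the data matrices here store one item per row.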
Example
julia> using FastParzenWindows, PyPlot
julia> X = spiraldata(300)
julia> mix = fpw(X, 0.05)
julia> x = rand(mix, 1000)'
julia> plot(X[:,1], X[:,2], "bo", label="data")
julia> plot(x[:,1], x[:,2], ".r", label="generated", alpha=0.7)
julia> legend()
FastParzenWindows.spiraldata (Function)
X = spiraldata(N)
Generates N data points on a 2D spiral, returned as an N×2 matrix X.
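For readers without the package at hand, data of this shape can be produced with a sketch like the one below. The function name spiral_sketch is hypothetical, and the number of turns, the radius scaling, and the noise level are all assumptions; the spiral produced by spiraldata may be parameterised differently.

```julia
using Random

# Hypothetical stand-in for spiraldata, not the package's implementation.
function spiral_sketch(N::Int; turns::Real = 2.0, noise::Real = 0.05, seed::Int = 1)
    rng = MersenneTwister(seed)
    t = range(0.0, stop = 2π * turns, length = N)   # angle along the spiral
    radius = t ./ (2π * turns)                      # radius grows linearly from 0 to 1
    X = hcat(radius .* cos.(t), radius .* sin.(t))  # N×2 matrix of clean points
    return X .+ noise .* randn(rng, N, 2)           # add isotropic Gaussian jitter
end

X = spiral_sketch(300)
size(X)   # (300, 2)
```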