FastParzenWindows.cvfpw (Method)

cvresults = cvfpw(X, r_range; numfolds = 10, seed = 1, gamma = 1e-6, randrepeats = 7)

Performs cross-validation over candidate values of the radius parameter that fpw uses to partition the data into hyperdiscs. The candidate radii are given in r_range. The seed controls how the folds are generated and is also used as the random seed of fpw. Returns a matrix of dimensions (number of radii candidates)×(number of folds) containing the log-likelihoods evaluated on the held-out folds.
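To make the shape of cvresults concrete, here is a minimal sketch of k-fold cross-validation over candidate radii in plain Julia. It is a stand-in under stated assumptions, not the package's code: the helper names `kfold_indices`, `kde_loglik` and `cv_sketch` are hypothetical, and instead of fpw each fold is scored with a classic isotropic Gaussian Parzen window whose kernel width is simply set to the candidate value r (which is not how fpw uses r). It also ignores gamma and randrepeats.

```julia
using Random, Statistics

# Hypothetical helper: split 1:N into k randomly shuffled, roughly equal folds.
function kfold_indices(N::Int, k::Int, rng::AbstractRNG)
    perm = randperm(rng, N)
    return [perm[i:k:N] for i in 1:k]
end

# Stand-in only: total log-likelihood of Xtest under a plain Parzen window
# (equal-weight isotropic Gaussians of width h on each training row), NOT fpw.
function kde_loglik(Xtrain::AbstractMatrix, Xtest::AbstractMatrix, h::Real)
    n, D = size(Xtrain)
    lognorm = -0.5 * D * log(2π * h^2) - log(n)   # log kernel constant plus log(1/n)
    total = 0.0
    for x in eachrow(Xtest)
        terms = [lognorm - sum(abs2, x .- y) / (2h^2) for y in eachrow(Xtrain)]
        m = maximum(terms)
        total += m + log(sum(exp.(terms .- m)))   # log-sum-exp over kernels
    end
    return total
end

# Returns a length(r_range) × numfolds matrix of held-out log-likelihoods.
function cv_sketch(X::AbstractMatrix, r_range; numfolds = 10, seed = 1)
    rng = MersenneTwister(seed)
    folds = kfold_indices(size(X, 1), numfolds, rng)
    results = zeros(length(r_range), numfolds)
    for (f, test_idx) in enumerate(folds)
        train_idx = setdiff(1:size(X, 1), test_idx)
        for (i, r) in enumerate(r_range)
            results[i, f] = kde_loglik(X[train_idx, :], X[test_idx, :], r)
        end
    end
    return results
end

# Usage on synthetic Gaussian data (not the package's spiraldata):
X = randn(MersenneTwister(0), 60, 2)
r_range = LinRange(0.1, 2.0, 5)
cv = cv_sketch(X, r_range; numfolds = 5)
r_best = r_range[argmax(vec(mean(cv, dims = 2)))]   # radius with best mean score
```

Averaging each row over the folds and taking the argmax mirrors how the Example below picks r_best from cvresults.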

Parameters

  • X is an N×D data matrix, i.e. there are N data items of dimension D.
  • r_range is an array or range of scalars specifying the candidate radii.
  • numfolds specifies the number of folds in the cross-validation.
  • seed controls the generation of the folds and is also the random seed of fpw.
  • gamma is a scalar specifying a multiple of the identity matrix, γI, that is added to the covariance matrices of the local Gaussians for numerical stability.
  • randrepeats specifies how many times the fpw algorithm is rerun, to take into account the random initialisation of fpw on each run.

Returns

  • cvresults is a matrix of dimensions (number of radii candidates)×(number of folds) of log-likelihoods evaluated on the held-out folds.

Example

julia> using FastParzenWindows, Statistics, PyPlot
julia> X = spiraldata(300)
julia> r_range = LinRange(0.01, 2.0, 100)
julia> cvresults = cvfpw(X, r_range)
julia> r_perf = vec(mean(cvresults, dims=2))   # mean held-out log-likelihood per radius
julia> best_index = argmax(r_perf)
julia> r_best = r_range[best_index]
julia> mix = fpw(X, r_best)
julia> x = rand(mix, 1000)'
julia> plot(X[:,1], X[:,2], "bo", label="data")
julia> plot(x[:,1], x[:,2], ".r", label="generated", alpha=0.7)
julia> legend()

FastParzenWindows.fpw (Method)

p = fpw(X, r; gamma = 1e-6, seed = 1)

Estimates a density with the fast Parzen window algorithm. The algorithm partitions the data space into hyperdiscs of radius r. Data items in matrix X are then 'softly' assigned to the partitions, and the local density in each partition is modelled by a Gaussian distribution. The global density estimate is returned as a Gaussian mixture model of type Distributions.MixtureModel.
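The hyperdisc partitioning step can be sketched in plain Julia as below. This is a hedged illustration, assuming the hard covering works roughly like this: a randomly chosen, not-yet-covered data item becomes a hyperdisc centre, and every uncovered item within Euclidean distance r of it joins that hyperdisc. The name `hyperdisc_partition` is hypothetical; the package's actual centre selection may differ, and the subsequent soft assignment and Gaussian fitting are not shown.

```julia
using Random, LinearAlgebra

# Hypothetical sketch of the hard covering step (not the package's code).
# Returns the indices of the chosen centres and, for each item, the index
# (into `centres`) of the hyperdisc it was assigned to.
function hyperdisc_partition(X::AbstractMatrix, r::Real; seed = 1)
    rng = MersenneTwister(seed)
    N = size(X, 1)
    label = zeros(Int, N)            # 0 means "not yet covered"
    centres = Int[]
    while any(==(0), label)
        uncovered = findall(==(0), label)
        c = rand(rng, uncovered)     # random uncovered item becomes a centre
        push!(centres, c)
        for j in uncovered           # claim every uncovered item within radius r
            if norm(X[j, :] - X[c, :]) <= r
                label[j] = length(centres)
            end
        end
    end
    return centres, label
end

X = randn(MersenneTwister(0), 200, 2)
centres, label = hyperdisc_partition(X, 0.5)
length(centres)                      # number of hyperdiscs used
```

Because each centre lies at distance 0 from itself, every iteration covers at least one new item, so the loop always terminates; a larger r yields fewer, larger hyperdiscs.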

The implementation is based on X. Wang, P. Tino, M. A. Fardal, S. Raychaudhury and A. Babul, "Fast Parzen window density estimator," 2009 International Joint Conference on Neural Networks, 2009, pp. 3267-3274.

Parameters

  • X is an N×D data matrix, i.e. there are N data items of dimension D.
  • r is a scalar specifying the common radius of the hyperdiscs.
  • seed controls the random number generator that randomly picks data items as hyperdisc centres.
  • gamma is a scalar specifying a multiple of the identity matrix, γI, that is added to the covariance matrices of the local Gaussians for numerical stability.
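To see why the γI term matters, consider a local Gaussian fitted to soft-assigned items that happen to be collinear: the weighted covariance is then singular, so it has no usable inverse or log-determinant. The sketch below is hypothetical (`local_gaussian` is not part of the package, and the soft weights are made up); it computes a weighted mean and covariance and adds gamma*I, assuming that is roughly how the local Gaussians are regularised.

```julia
using LinearAlgebra

# Hypothetical helper (not part of FastParzenWindows): weighted mean and
# covariance of the rows of X under non-negative soft-assignment weights w,
# with gamma * I added to the covariance for numerical stability.
function local_gaussian(X::AbstractMatrix, w::AbstractVector; gamma = 1e-6)
    wn = w ./ sum(w)                    # normalise weights to sum to 1
    mu = vec(sum(wn .* X, dims = 1))    # weighted mean, length D
    Xc = X .- mu'                       # centre each row on the mean
    Sigma = Xc' * (wn .* Xc)            # weighted covariance, D×D
    return mu, Symmetric(Sigma + gamma * I)
end

# Made-up example: three collinear points, so the plain covariance is singular.
X = [1.0 2.0; 2.0 4.0; 3.0 6.0]
w = [0.2, 0.5, 0.3]
mu, Sigma = local_gaussian(X, w)        # default gamma = 1e-6
isposdef(Sigma)                         # true: gamma * I lifts the zero eigenvalue
det(Matrix(local_gaussian(X, w; gamma = 0.0)[2]))   # ≈ 0 without the ridge term
```

A tiny gamma such as the default 1e-6 barely changes well-conditioned covariances but guarantees a positive-definite matrix in degenerate partitions.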

Returns

  • p is a Gaussian mixture model of type Distributions.MixtureModel.
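Distributions already provides `logpdf` for the returned `MixtureModel`. Purely to spell out what that quantity is, the plain-Julia sketch below evaluates the log-density of a Gaussian mixture with weights p[k], means mus[k] and covariances Sigmas[k], using a Cholesky factorisation and the log-sum-exp trick. The function names and the mixture parameters are made up for illustration. The held-out log-likelihoods reported by cvfpw are presumably (this is an assumption) sums of such terms over a fold's items.

```julia
using LinearAlgebra

# Log-density of N(mu, Sigma) at x, computed through a Cholesky factorisation.
function gauss_logpdf(x::AbstractVector, mu::AbstractVector, Sigma::AbstractMatrix)
    C = cholesky(Symmetric(Sigma))
    z = C.L \ (x - mu)                   # whitened residual
    return -0.5 * (length(x) * log(2π) + logdet(C) + dot(z, z))
end

# Log-density of sum_k p[k] * N(x; mus[k], Sigmas[k]), via log-sum-exp.
function mixture_logpdf(x, p, mus, Sigmas)
    terms = [log(p[k]) + gauss_logpdf(x, mus[k], Sigmas[k]) for k in eachindex(p)]
    m = maximum(terms)
    return m + log(sum(exp.(terms .- m)))
end

# Made-up two-component mixture in 2-D and a few "held-out" points.
p = [0.3, 0.7]
mus = [[0.0, 0.0], [2.0, 1.0]]
Sigmas = [Matrix(1.0I, 2, 2), [0.5 0.1; 0.1 0.3]]
held_out = [[0.1, -0.2], [1.9, 1.2], [1.0, 0.5]]
total_loglik = sum(mixture_logpdf(x, p, mus, Sigmas) for x in held_out)
```

Working in log space avoids underflow when a point lies far from every component, which matters for small radii where the local Gaussians are narrow.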

Example

julia> using FastParzenWindows, PyPlot
julia> X = spiraldata(300)
julia> mix = fpw(X, 0.05)
julia> x = rand(mix, 1000)'
julia> plot(X[:,1], X[:,2], "bo", label="data")
julia> plot(x[:,1], x[:,2], ".r", label="generated", alpha=0.7)
julia> legend()