_bin_distribution!(c::Vector{Float64}, D::Matrix{Float64}, m::Float64)::Vector{Float64}

In-place version of the distribution binning. This is the function used internally to overwrite the scores.

_bin_distribution(D::Matrix{Float64}, m::Float64)::Vector{Float64}

Bin a distance matrix, using a default count of 20 bins. This function is central to the package, as it is used internally to calculate the divergence between the observed and simulated distance distributions. This specific implementation had the least-worst performance in a series of benchmarks, but in practice the package spends a lot of time running it, which makes it a prime candidate for optimisation.
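As an illustration of what binning a distance matrix involves, here is a minimal sketch. It is not the package's implementation: the name bin_distances, the bins keyword, the reading of m as the maximal distance, and the choice to count every entry of the matrix (diagonal included) are all assumptions made for this example.

```julia
# Hypothetical helper (not the package's code): bin the entries of a distance
# matrix D into `bins` equal-width classes spanning [0, m], and return the
# proportion of entries that fall in each class.
function bin_distances(D::Matrix{Float64}, m::Float64; bins::Int=20)
    counts = zeros(Float64, bins)
    for d in D
        # Map d in [0, m] to an index in 1:bins; d == m lands in the last bin
        i = clamp(floor(Int, d / m * bins) + 1, 1, bins)
        counts[i] += 1.0
    end
    # Convert counts to proportions so that the result sums to 1
    return counts ./ length(D)
end

D = [0.0 1.0 4.0; 1.0 0.0 9.9; 4.0 9.9 0.0]
p = bin_distances(D, 10.0)
```

Returning proportions rather than raw counts means that two matrices of different sizes can be compared directly.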

_distance_between_binned_distributions(p, q)

Returns the Jensen-Shannon distance (i.e. the square root of the JS divergence) between the binned distributions of two distance matrices. This version is preferred to the KL divergence used in the original implementation, as it avoids the Inf values that arise when one distribution is zero in a bin where the other is positive (e.g. p(x)=0 and q(x)>0). The JS divergence is bounded between 0 and the natural log of 2, which gives an absolute measure of fit that allows solutions to be compared. Note that the returned value is already corrected, so it is at most 1.0 and, at best (identical matrices), 0.
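The bounds described above can be checked with a small sketch. The function names kl_divergence and js_distance are hypothetical, and the correction is assumed here to be a division of the divergence by log(2) before taking the square root; the package may normalise differently.

```julia
# Kullback-Leibler divergence, with the convention that 0 * log(0) = 0
function kl_divergence(p, q)
    s = 0.0
    for i in eachindex(p, q)
        if p[i] > 0
            s += p[i] * log(p[i] / q[i])
        end
    end
    return s
end

# Jensen-Shannon distance, rescaled to [0, 1].
# Assumption: the "correction" is a division of the divergence by log(2).
function js_distance(p, q)
    m = (p .+ q) ./ 2   # mixture; m[i] > 0 wherever p[i] > 0 or q[i] > 0
    js = (kl_divergence(p, m) + kl_divergence(q, m)) / 2
    # max() guards against tiny negative values caused by rounding
    return sqrt(max(js, 0.0) / log(2))
end

same = js_distance([0.5, 0.5], [0.5, 0.5])      # identical distributions
disjoint = js_distance([1.0, 0.0], [0.0, 1.0])  # no overlap at all
```

Because each KL term is taken against the mixture m, the ratio inside the logarithm never has a zero denominator, which is why no Inf values appear.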

_generate_new_random_point(layer, points, distances)

Generates a new random point (which must fall within a valued cell of layer) based on a collection of points and a Dxy distance matrix. The algorithm samples a point and a distance from the matrix, then generates a new point through a call to _random_point. Note that the distance is multiplied by the square root of a random deviate within the unit interval, so that points fall uniformly within the circle defined by the sampled distance. In the absence of this correction, the distribution of points is biased towards the center.
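The effect of the square-root correction can be seen in a planar sketch. This is an illustration on a flat disc, not the package's code; random_point_in_disc is a hypothetical name.

```julia
using Random

# Hypothetical planar illustration: draw a point uniformly within a disc of
# radius d centred on the origin.
function random_point_in_disc(rng::AbstractRNG, d::Float64)
    # The area within radius r grows as r^2, so the radius must be scaled by
    # the square root of a uniform deviate for the points to be uniform in area
    r = d * sqrt(rand(rng))
    θ = 2π * rand(rng)
    return (r * cos(θ), r * sin(θ))
end

rng = MersenneTwister(42)
pts = [random_point_in_disc(rng, 1.0) for _ in 1:10_000]

# Share of points within half the radius; uniform sampling gives about 1/4,
# whereas drawing the radius without the square root would give about 1/2
inner = count(p -> hypot(p...) <= 0.5, pts) / length(pts)
```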

_random_point(ref, d; R=Fauxcurrences._earth_radius)

This solves the direct (first) geodetic problem, assuming that Haversine distances are an adequate approximation of the distance between points.
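For reference, the direct problem on a sphere can be sketched as follows. The names destination_point and haversine_distance, the explicit bearing argument, the (longitude, latitude) ordering in degrees, and the radius of 6371.0 km are assumptions made for this example and are not taken from the package.

```julia
# Hypothetical sketch: given a starting point (degrees), a bearing (degrees,
# clockwise from north) and a distance d, return the destination on a sphere
# of radius R (same unit as d).
function destination_point(lon, lat, bearing, d; R=6371.0)
    φ1, λ1, θ = deg2rad(lat), deg2rad(lon), deg2rad(bearing)
    δ = d / R   # angular distance
    φ2 = asin(sin(φ1) * cos(δ) + cos(φ1) * sin(δ) * cos(θ))
    λ2 = λ1 + atan(sin(θ) * sin(δ) * cos(φ1), cos(δ) - sin(φ1) * sin(φ2))
    return (rad2deg(λ2), rad2deg(φ2))
end

# Haversine distance between two (longitude, latitude) points in degrees
function haversine_distance(p1, p2; R=6371.0)
    λ1, φ1 = deg2rad.(p1)
    λ2, φ2 = deg2rad.(p2)
    a = sin((φ2 - φ1) / 2)^2 + cos(φ1) * cos(φ2) * sin((λ2 - λ1) / 2)^2
    return 2R * asin(min(1.0, sqrt(a)))
end

start = (-73.57, 45.5)
stop = destination_point(start..., 60.0, 250.0)
roundtrip = haversine_distance(start, stop)
```

The round trip recovers the requested distance, which is the property a generated point needs to satisfy. This sketch does not wrap longitudes back into the [-180, 180] range.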

bootstrap!(sim, layer, obs, obs_intra, obs_inter, sim_intra, sim_inter)

Generates the initial proposal for points. This function generates the points for all taxa at once, so some knowledge of the distance matrices is required. Note that this function modifies the bootstrapped object in place, in order to be as efficient as possible.

Specifically, the first point for each taxon is picked to respect the maximal inter-specific distances, and the following points are picked to respect the intra- and inter-specific distances. Points after the first one are added at random, so points can accumulate in some species early on.

Note that this function is not particularly efficient, but it only adds a small overhead to each simulation. The only guarantee offered is that the distances do not exceed the maximal distances in the dataset; there is no reason to expect that the distribution of distances within or across taxa will be respected.


The intra and inter components have the same weight, which means the inter-specific matrices can have less cumulative weight.

get_valid_coordinates(observations::GBIFRecords, layer::T) where {T <: SimpleSDMLayer}

Get the coordinates for a list of observations, filtering out those that do not correspond to valid layer positions. A valid layer position is one that falls within a valued pixel of the layer.

measure_interspecific_distances!(inter, obs; updated=1:length(obs))

Updates the matrices of interspecific distances. Note that internally, the value of the updated keyword argument changes, so that only what needs to be replaced is replaced.

measure_intraspecific_distances!(intra, obs; updated=1:length(obs))

Updates the matrices of intraspecific distances. Note that internally, the value of the updated keyword argument changes, so that only what needs to be replaced is replaced.

preallocate_simulated_points(obs; samples=size.(obs, 2))

Create an empty matrix given a series of observations, and a number of samples to keep in the simulated dataset for each series of observations.

score_distributions(W, bin_intra, bin_s_intra, bin_inter, bin_s_inter)

Performs the actual scoring of the distributions, based on the weight matrix.

weighted_components(n, intra)

The intra-specific component has relative weight intra; for a value of 1.0, the model is purely intra-specific.