Weight Vectors

Statistical applications often assign weights to samples. To support this, we introduce the abstract type AbstractWeights to represent weight vectors. Using a dedicated type has two advantages:

A dedicated type, AbstractWeights, distinguishes the weight vector from other data vectors among the input arguments.
Statistical functions that use weights often need the sum of the weights. The weight vector stores this sum, so it does not have to be recomputed each time it is needed.

Note

The weight vector is a lightweight wrapper of the input vector. The input vector is NOT copied during construction.
The weight vector stores the sum of weights, which is computed upon construction. If the sum is already known, it can be supplied as the second argument to the constructor to avoid computing it again.
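
As a minimal sketch of the note on precomputed sums, assuming StatsBase is installed and that the sum is already available from earlier work (the variable names are illustrative), the sum can be passed straight to the constructor:

```julia
using StatsBase

vs = [0.2, 0.1, 0.3]
total = sum(vs)          # stands in for a sum computed earlier in the program

w = Weights(vs, total)   # second argument: the precomputed sum of weights
sum(w)                   # returns the stored sum; the values are not summed again
```

The same two-argument form applies to the other constructors listed under Methods.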

Implementations

Several statistical weight types are provided which subtype AbstractWeights. The choice of weights impacts how bias is corrected in several methods. See the var, std and cov docstrings for more details.

`AnalyticWeights`

Analytic weights describe a non-random relative importance (usually between 0 and 1) for each observation. These weights may also be referred to as reliability weights, precision weights or inverse variance weights. These are typically used when the observations being weighted are aggregate values (e.g., averages) with differing variances.

w = AnalyticWeights([0.2, 0.1, 0.3])
w = aweights([0.2, 0.1, 0.3])  # equivalent shorthand

`FrequencyWeights`

Frequency weights describe the number of times (or frequency) each observation was observed. These weights may also be referred to as case weights or repeat weights.

w = FrequencyWeights([2, 1, 3])
w = fweights([2, 1, 3])  # equivalent shorthand
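
One way to read this definition is that a frequency weight of k behaves like listing the observation k times. The sketch below checks that reading on a small made-up data set, assuming StatsBase is installed. The name `expanded` is illustrative, and `corrected=true` requests the bias-corrected variance.

```julia
using Statistics, StatsBase

x = [1.0, 2.0, 3.0]
w = fweights([2, 1, 3])

# The same data with each observation repeated according to its weight.
expanded = [1.0, 1.0, 2.0, 3.0, 3.0, 3.0]

mean(x, w) ≈ mean(expanded)                  # true
var(x, w; corrected=true) ≈ var(expanded)    # true
```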

`ProbabilityWeights`

Probability weights represent the inverse of the sampling probability for each observation, providing a correction mechanism for under- or over-sampling certain population groups. These weights may also be referred to as sampling weights.

w = ProbabilityWeights([0.2, 0.1, 0.3])
w = pweights([0.2, 0.1, 0.3])  # equivalent shorthand

`UnitWeights`

Unit weights are a special case in which all observations are given a weight equal to 1. Using such weights is equivalent to computing unweighted statistics.

This type can notably be used when implementing an algorithm so that only a weighted variant has to be written. The unweighted variant is then obtained by passing a UnitWeights object. This is very efficient since no weights vector is actually allocated.

w = uweights(3)           # three unit weights with element type Int
w = uweights(Float64, 3)  # three unit weights with element type Float64
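
The pattern of writing only the weighted variant can be sketched as follows. Here `weighted_mean` is a hypothetical helper written for illustration, not a StatsBase function, and StatsBase is assumed to be installed.

```julia
using StatsBase

# Hypothetical helper: written once against AbstractWeights.
function weighted_mean(x::AbstractVector, w::AbstractWeights)
    return sum(xi * wi for (xi, wi) in zip(x, w)) / sum(w)
end

x = [1.0, 2.0, 6.0]

weighted_mean(x, fweights([2, 1, 1]))   # weighted variant
weighted_mean(x, uweights(length(x)))   # unweighted variant: every weight is 1
```

The second call reuses the weighted code path while only storing the length of the weight vector.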

`Weights`

The Weights type describes a generic weights vector which does not support all operations possible for FrequencyWeights, AnalyticWeights and ProbabilityWeights.

w = Weights([1., 2., 3.])
w = weights([1., 2., 3.])  # equivalent shorthand
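
To illustrate how the weight type changes a result, the sketch below computes a bias-corrected variance for the same data and weight values under two types, then tries the generic type. The data are made up and StatsBase is assumed to be installed. That bias correction is among the operations Weights does not support is an assumption based on the var docstring, not on this page.

```julia
using Statistics, StatsBase

x = [1.0, 2.0, 3.0]
v = [0.2, 0.1, 0.3]

# Same data and weight values, different weight types.
va = var(x, aweights(v); corrected=true)
vp = var(x, pweights(v); corrected=true)
va ≈ vp   # false: each type applies its own bias correction

# The uncorrected variance is available for the generic type.
vg = var(x, weights(v); corrected=false)

# Assumption: requesting bias correction with generic Weights throws.
threw = try
    var(x, weights(v); corrected=true)
    false
catch
    true
end
```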

Exponential weights: `eweights`

Exponential weights are a common form of temporal weights which assign exponentially decreasing weights to past observations.

If t is a vector of temporal indices then for each index i we compute the weight as:

$λ (1 - λ)^{1 - i}$

$λ$ is a smoothing factor or rate parameter such that $0 < λ ≤ 1$. As this value approaches 0, the resulting weights will be almost equal, while values closer to 1 will put greater weight on the tail elements of the vector.

For example, the following call generates exponential weights for ten observations with $λ = 0.3$.

julia> eweights(1:10, 0.3)
10-element Weights{Float64,Float64,Array{Float64,1}}:
 0.3
 0.42857142857142855
 0.6122448979591837
 0.8746355685131197
 1.249479383590171
 1.7849705479859588
 2.549957925694227
 3.642797036706039
 5.203995766722913
 7.434279666747019
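
The values above can be reproduced by evaluating the formula directly. The following sketch uses only base Julia; `manual` and `ratios` are illustrative names.

```julia
λ = 0.3

# Evaluate λ(1 - λ)^(1 - i) for i = 1, …, 10.
manual = [λ * (1 - λ)^(1 - i) for i in 1:10]

# Each weight is the previous one multiplied by 1 / (1 - λ).
ratios = manual[2:end] ./ manual[1:end-1]
all(ratios .≈ 1 / (1 - λ))   # true
```

Because consecutive weights differ by the constant factor 1 / (1 - λ), a λ close to 0 gives nearly equal weights, while a larger λ concentrates the weight on the last elements.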

Simply passing the number of observations n is equivalent to passing in 1:n.

julia> eweights(10, 0.3)
10-element Weights{Float64,Float64,Array{Float64,1}}:
 0.3
 0.42857142857142855
 0.6122448979591837
 0.8746355685131197
 1.249479383590171
 1.7849705479859588
 2.549957925694227
 3.642797036706039
 5.203995766722913
 7.434279666747019

Finally, you can construct exponential weights from an arbitrary subset of timestamps within a larger range.

julia> t
2019-01-01T01:00:00:2 hours:2019-01-01T05:00:00

julia> r
2019-01-01T01:00:00:1 hour:2019-01-02T01:00:00

julia> eweights(t, r, 0.3)
3-element Weights{Float64,Float64,Array{Float64,1}}:
 0.3
 0.6122448979591837
 1.249479383590171

NOTE: This is equivalent to eweights(something.(indexin(t, r)), 0.3). For each value in t, indexin returns the index of that value in r, or nothing if the value does not occur in r. Applying something to each entry unwraps the index and raises an error if an entry is nothing.
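
The index lookup described in the note can be reproduced with base Julia and the Dates standard library. In this sketch, t and r are constructed to match the ranges displayed above, and the weights are computed from the formula rather than by calling eweights. The names `idx` and `w` are illustrative.

```julia
using Dates

t = DateTime(2019, 1, 1, 1):Hour(2):DateTime(2019, 1, 1, 5)
r = DateTime(2019, 1, 1, 1):Hour(1):DateTime(2019, 1, 2, 1)

# Position of each timestamp of t within r. `something` unwraps the
# Union{Nothing, Int} entries and throws if any entry is `nothing`.
idx = something.(indexin(t, r))          # [1, 3, 5]

λ = 0.3
w = [λ * (1 - λ)^(1 - i) for i in idx]   # same formula as for integer indices
```

The resulting values agree with the three weights printed by eweights(t, r, 0.3).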

Methods

AbstractWeights implements the following methods:

eltype
length
isempty
values
sum
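
For instance, four of these methods applied to a small frequency weight vector (a sketch assuming StatsBase is installed):

```julia
using StatsBase

w = fweights([2, 1, 3])

eltype(w)    # Int (Int64 on a 64-bit system)
length(w)    # 3
isempty(w)   # false
sum(w)       # 6, the stored sum of weights
```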

The following constructors are provided:

StatsBase.AnalyticWeights — Type

AnalyticWeights(vs, wsum=sum(vs))

Construct an AnalyticWeights vector with weight values vs. A precomputed sum may be provided as wsum.

StatsBase.FrequencyWeights — Type

FrequencyWeights(vs, wsum=sum(vs))

Construct a FrequencyWeights vector with weight values vs. A precomputed sum may be provided as wsum.

Frequency weights describe the number of times (or frequency) each observation was observed. These weights may also be referred to as case weights or repeat weights.

StatsBase.ProbabilityWeights — Type

ProbabilityWeights(vs, wsum=sum(vs))

Construct a ProbabilityWeights vector with weight values vs. A precomputed sum may be provided as wsum.

StatsBase.Weights — Type

Weights(vs, wsum=sum(vs))

Construct a Weights vector with weight values vs. A precomputed sum may be provided as wsum.

The Weights type describes a generic weights vector which does not support all operations possible for FrequencyWeights, AnalyticWeights and ProbabilityWeights.

StatsBase.aweights — Function

aweights(vs)

Construct an AnalyticWeights vector from array vs. See the documentation for AnalyticWeights for more details.

StatsBase.fweights — Function

fweights(vs)

Construct a FrequencyWeights vector from a given array. See the documentation for FrequencyWeights for more details.

StatsBase.pweights — Function

pweights(vs)

Construct a ProbabilityWeights vector from a given array. See the documentation for ProbabilityWeights for more details.

StatsBase.eweights — Function

eweights(t::AbstractVector{<:Integer}, λ::Real)
eweights(t::AbstractVector{T}, r::StepRange{T}, λ::Real) where T
eweights(n::Integer, λ::Real)

Construct a Weights vector which assigns exponentially decreasing weights to past observations. Larger integer values i in t are treated as more recent and receive larger weights. If an integer n is provided, weights are generated for values from 1 to n (equivalent to t = 1:n).

For each element i in t the weight value is computed as:

$λ (1 - λ)^{1 - i}$

Arguments

t::AbstractVector: temporal indices or timestamps
r::StepRange: a larger range to use when constructing weights from a subset of timestamps
n::Integer: if provided instead of t, temporal indices are taken to be 1:n
λ::Real: a smoothing factor or rate parameter such that $0 < λ ≤ 1$. As this value approaches 0, the resulting weights will be almost equal, while values closer to 1 will put greater weight on the tail elements of the vector.

Examples

julia> eweights(1:10, 0.3)
10-element Weights{Float64,Float64,Array{Float64,1}}:
 0.3
 0.42857142857142855
 0.6122448979591837
 0.8746355685131197
 1.249479383590171
 1.7849705479859588
 2.549957925694227
 3.642797036706039
 5.203995766722913
 7.434279666747019

StatsBase.weights — Function

weights(vs)

Construct a Weights vector from array vs. See the documentation for Weights for more details.

weights(obj::StatisticalModel)

Return the weights used in the model.