Docstrings · DataWrangler.jl

DataWrangler.boxcox — Method

Package: DataWrangler

boxcox(x::Vector)
boxcox(x::Vector, λ::Float64)  
boxcox(x::Vector, λ::Vector)

Compute a Box-Cox power transformation for a given λ, or for a given pair (λ1,λ2) when the data contain negative values. If neither λ nor (λ1,λ2) is provided, compute an optimal power transformation.

\[x(\lambda) = \begin{cases} \dfrac{x_i^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \ln x_i & \text{if } \lambda = 0 \end{cases}\]

For data containing negative values, the shifted form is used:

\[x(\boldsymbol{\lambda}) = \begin{cases} \dfrac{(x_i + \lambda_2)^{\lambda_1} - 1}{\lambda_1} & \text{if } \lambda_1 \neq 0 \\ \ln (x_i + \lambda_2) & \text{if } \lambda_1 = 0 \end{cases} \]

Arguments

x: Vector to be transformed.
λ: Exponent(s) for the transformation.

Returns

A vector with the Box-Cox transformation of x, or a Dict with the Box-Cox-transformed :x and the optimal :λ.

Reference

Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26, 211-252.

Examples

julia> x = rand(100)
julia> bc = boxcox(x)
julia> iboxcox(bc[:x],bc[:λ]) ≈ x

julia> x = rand(100) .- 0.5
julia> bc = boxcox(x)
julia> iboxcox(bc[:x],bc[:λ]) ≈ x
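
The two formulas above can be sketched directly for a fixed exponent. This is a minimal illustration, not the package's implementation: the name boxcox_sketch and the positional shift argument λ2 are assumptions made for this example, and the search for an optimal λ is not covered.

```julia
# Hypothetical helper, not part of DataWrangler: Box-Cox with a fixed exponent λ
# and an optional shift λ2 for data containing non-positive values.
function boxcox_sketch(x::AbstractVector{<:Real}, λ::Real, λ2::Real = 0.0)
    shifted = x .+ λ2
    # The transformation is only defined for strictly positive inputs.
    all(>(0), shifted) || throw(DomainError(λ2, "x .+ λ2 must be strictly positive"))
    # λ = 0 is the limiting case of (x^λ - 1)/λ, i.e. the natural logarithm.
    return λ == 0 ? log.(shifted) : (shifted .^ λ .- 1) ./ λ
end

x = [0.5, 1.0, 2.0, 4.0]
boxcox_sketch(x, 0.0)               # logarithmic branch
boxcox_sketch(x, 0.5)               # power branch
boxcox_sketch(x .- 1.0, 0.5, 1.5)   # the shift makes every value positive first
```

The shift is validated up front so that a poorly chosen λ2 fails loudly instead of producing a DomainError deep inside log or a fractional power.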

DataWrangler.d — Method

Package: DataWrangler

function d(x::Union{AbstractVector, AbstractArray},
           or::Int=1,
           la::Int=1;
           center::Bool=false)

Return the lagged differences of a given vector or array.

Arguments

x: Vector or Array of data.
or: Order of the differences; number of recursive iterations on the same vector/array.
la: Lag for the difference.
center: Center the result in the output by padding with missing values.

Returns

Lagged differences vector or array of the given order.

Examples

julia> x = [1,2,3,4,5];
julia> d(x)
4-element Vector{Int64}:
 1
 1
 1
 1

julia> d(x,2)
3-element Vector{Int64}:
 0
 0
 0

julia> d(x,1,2)
3-element Vector{Int64}:
 2
 2
 2

julia> x = reshape(collect(1:20),10,2);

julia> d(x,2,2)
6×2 Matrix{Int64}:
 0  0
 0  0
 0  0
 0  0
 0  0
 0  0

julia> d(d(x,1,2),1,2) == d(x,2,2)
true
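
To make the roles of or and la concrete, the following sketch reproduces the outputs above with a plain loop. The name lagged_diff is hypothetical and the code is only an illustration of the idea (it ignores the center keyword); the package's own d may be implemented differently.

```julia
# Hypothetical re-implementation for illustration only.
# Applies a lag-`la` difference along the first dimension, `or` times.
function lagged_diff(x::AbstractVecOrMat, or::Int = 1, la::Int = 1)
    out = x
    for _ in 1:or
        n = size(out, 1)
        # Subtract each row from the row `la` positions later.
        out = selectdim(out, 1, (la + 1):n) .- selectdim(out, 1, 1:(n - la))
    end
    return out
end

x = [1, 2, 3, 4, 5]
lagged_diff(x)         # order 1, lag 1
lagged_diff(x, 2)      # order 2, lag 1
lagged_diff(x, 1, 2)   # order 1, lag 2
```

Each pass shortens the first dimension by la, so the result has size(x, 1) - or*la rows.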

DataWrangler.iboxcox — Method

Package: DataWrangler

iboxcox(x::Vector, λ::Float64) 
iboxcox(x::Vector, λ::Vector)

Compute the inverse transformation of a Box-Cox power transformation given λ.

Arguments

x: Vector with a Box-Cox transformation to be inverted.
λ: Exponent(s) for the inverse transformation.

Returns

A vector with the inverse transformation of x given λ.

Reference

Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26, 211-252.

Examples

julia> x = rand(100)
julia> bc = boxcox(x)
julia> iboxcox(bc[:x],bc[:λ]) ≈ x

julia> x = rand(100) .- 0.5
julia> bc = boxcox(x)
julia> iboxcox(bc[:x],bc[:λ]) ≈ x
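
Solving the Box-Cox formula for the original value gives (λ·y + 1)^(1/λ) for λ ≠ 0 and exp(y) for λ = 0, minus the shift when one was used. A minimal sketch of that inversion follows; iboxcox_sketch and its λ2 argument are names assumed for this example, not the package's API.

```julia
# Hypothetical helper, not part of DataWrangler: invert a Box-Cox transform
# with exponent λ and optional shift λ2.
function iboxcox_sketch(y::AbstractVector{<:Real}, λ::Real, λ2::Real = 0.0)
    # λ = 0 inverts the logarithm; otherwise invert (x^λ - 1)/λ.
    return λ == 0 ? exp.(y) .- λ2 : (λ .* y .+ 1) .^ (1 / λ) .- λ2
end

x = [0.5, 1.0, 2.0, 4.0]
y = (x .^ 0.5 .- 1) ./ 0.5     # forward transform with λ = 0.5
iboxcox_sketch(y, 0.5) ≈ x     # round trip
```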

DataWrangler.impute! — Method

Package: DataWrangler

impute[!]([x], y; method = "loess", q)

Impute missing values in vector y either in-place or returning a copy of y with the imputed values.

Parameters

x: Optional vector containing the support for y (no missing values allowed).
y: Vector of type Real with missing values to be imputed.
method: This parameter can take three valid values and defaults to "loess":
- "loess": Runs loess with a window size of q on the dataset and interpolates/extrapolates the results at the missing values.
- "normal": Random imputation using a normal empirical distribution based on the size of q.
- "uniform": Random imputation using a uniform empirical distribution based on the size of q.
q: Number of closest vector values to the imputation to be considered; defaults to 3*length(y)÷4.

Description

Missing values are replaced using one of three methods: loess, local random uniform, or local random normal. When the vector's element type is an integer type, the results are rounded.

Returns

Nothing if the imputation is done in place; otherwise a copy of y with all missing values imputed.

Examples

x = sort(rand(100));
y = Array{Union{Missing,Float64}}(undef,100);
y[:] = rand(100);
y[rand(1:100,10)] .= missing;

impute!(x,y)
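
As a rough illustration of the "normal" method, the sketch below draws each missing entry from a normal distribution whose mean and standard deviation are estimated from the q nearest non-missing values. This is only one plausible reading of the description: the name impute_normal_sketch, the use of index distance to define "closest", and the rng keyword are all assumptions, and the package may work differently.

```julia
using Statistics, Random

# Hypothetical helper: local random normal imputation (an assumed interpretation).
function impute_normal_sketch(y::AbstractVector{<:Union{Missing,Real}}, q::Int;
                              rng = Random.default_rng())
    q >= 2 || throw(ArgumentError("q must be at least 2 to estimate a standard deviation"))
    out = collect(Union{Missing,Float64}, y)   # copy, so y itself is untouched
    known = findall(!ismissing, y)
    for i in findall(ismissing, y)
        # Indices of the q non-missing values closest in position to i.
        nearest = partialsort(known, 1:min(q, length(known)); by = j -> abs(j - i))
        vals = Float64[y[j] for j in nearest]
        out[i] = mean(vals) + std(vals) * randn(rng)
    end
    return out
end

y = [1.0, 2.0, missing, 4.0, 5.0, missing]
impute_normal_sketch(y, 3)
```

Returning a copy mirrors the non-mutating impute; an in-place variant would write into y instead.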

DataWrangler.impute — Method

Package: DataWrangler

See impute! for further information.

DataWrangler.normality — Method

Jarque–Bera statistic used as a normality measure for optimization.

DataWrangler.normalize! — Method

Package: DataWrangler

normalize[!]([x]; method)

Normalize the values in x, either in place or by returning a copy of x with the normalized values.

Parameters

x: Vector or array of type Real to be normalized; missing values are ignored.

method: There are four valid values: "z-score", "min-max", "softmax", "sigmoid"

Description

The normalization is applied ignoring any missing values in the Array, and it takes place in all the available dimensions of the array. If normalization is required in just some specific dimensions the function mapslices can be used to select those dimensions.

Example

x = [1.,2,3,4,5]
normalize!(x; method = "min-max")

println(x)
[0.0, 0.25, 0.5, 0.75, 1.0]
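
The sketch below illustrates two of the four methods ("min-max" and "z-score") and the mapslices pattern mentioned in the description. The helper names minmax_sketch and zscore_sketch are hypothetical; they are not the package's functions and only approximate the documented behaviour of ignoring missing values.

```julia
using Statistics

# Hypothetical helpers, shown for illustration only.
# Min-max rescaling to [0, 1], computed over the non-missing entries.
function minmax_sketch(x::AbstractArray)
    lo, hi = extrema(skipmissing(x))
    return (x .- lo) ./ (hi - lo)
end

# Z-score: subtract the mean and divide by the standard deviation.
function zscore_sketch(x::AbstractArray)
    vals = collect(skipmissing(x))
    return (x .- mean(vals)) ./ std(vals)
end

m = [1.0 10.0; 2.0 20.0; 3.0 30.0]
minmax_sketch(m)                        # one scale for the whole matrix
mapslices(minmax_sketch, m; dims = 1)   # each column scaled on its own
minmax_sketch([1.0, missing, 3.0])      # missing entries stay missing
```

Applied to the whole matrix, a single minimum and maximum are used; with mapslices each column gets its own range.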

DataWrangler.normalize — Method

Package: DataWrangler

See normalize! for further information.

DataWrangler.outlie — Method

Package: DataWrangler

outlie[!]([x], y, σ = 2; q)

Replace or detect outlier values in a vector y describing a time series.

Parameters

x: Optional vector containing the support for y (no missing values allowed).
y: Vector with outliers to be dealt with
σ: Number of sigmas used to identify outliers; defaults to 2.
q: Number of closest vector values used as the loess window; defaults to 3*length(y)÷4.

Description

With outlie!, outliers are replaced with missing values and the indices of the replaced values are returned. With outlie, only the index positions of the outliers are returned.

Outliers are identified by measuring their distance in number of sigmas, estimated from a loess fit with window q. In datasets with noise close to zero, all outliers are still detected, but non-outliers might also be flagged. One way to prevent this is to introduce a small amount of noise ϵ in y, e.g. outlie!(x,y+ϵ).

Examples

n = 1000
x = sort(rand(n))*2*pi;
y = Array{Union{Missing,Float64}}(undef,n);
y[:] = sin.(x).+randn(n)/10
mid = vcat(100,300,600,950);
y[mid] .= x[mid] .+ 2*(randn(length(mid)).+1)
y[500] = x[500] - 2*(randn(1)[1]+1)

outlie(x,y)

julia> outlie(x,y)
5-element Vector{Int64}:
 100
 300
 500
 600
 950
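
The sigma rule described above can be sketched independently of the smoother. In the example below the fitted curve is passed in by the caller and stands in for the loess fit, which is not reproduced here; the name sigma_outliers and its default threshold are assumptions for illustration, not the package's implementation.

```julia
using Statistics

# Hypothetical helper: flag points whose residual from a fitted curve lies more
# than nσ standard deviations from the mean residual. `fitted` stands in for
# the loess fit and must be supplied by the caller.
function sigma_outliers(y::AbstractVector{<:Real}, fitted::AbstractVector{<:Real},
                        nσ::Real = 2)
    res = y .- fitted
    return findall(abs.(res .- mean(res)) .> nσ * std(res))
end

x = range(0, 2π; length = 200)
y = sin.(x) .+ 0.05 .* cos.(40 .* x)   # smooth signal plus a small deterministic ripple
y[50] += 3.0                           # inject one obvious outlier
sigma_outliers(y, sin.(x))             # the exact curve plays the role of the fit
```

Because the standard deviation is estimated from all residuals, a very large outlier inflates the threshold; a robust scale estimate would be an alternative design.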

DataWrangler.p — Method

Package: DataWrangler

function p(dx, x0)

Return the reverse lagged differences of a given order for a vector or array.

Arguments

dx: Array or DataFrame of data.
x0: Initial constants for the reverse difference. The default value represents an integration of order one and lag one with initial values at zero. The format for the initial values is Array{Real,3}(order, variable, lag).

Returns

Reverse lagged differences vector or array of the given order.

Examples


# Order two with Lag two
julia> x = repeat(1:2,30);
julia> dx = d(x,2,2);
julia> x0 = zeros(2,1,2); # order 2, 1 variable, lag 2
julia> x0[1,:,:] = collect(1:2);
julia> p(dx,x0) ≈ x
true

# Calculation of π
julia> x = 0:0.001:1;
julia> y = sqrt.(1 .- x.^2);
julia> isapprox(4*p(y)[end]/1000 , π, atol = 0.01)
true
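
For a single variable and order one, reversing a lagged difference amounts to a cumulative sum that restarts from the first la original values. The sketch below shows that idea only; undiff_sketch is a hypothetical name, it takes the initial values as a plain vector instead of the three-dimensional x0 described above, and it does not handle higher orders.

```julia
# Hypothetical helper: invert one lagged difference whose lag equals length(x0),
# given the first values x0 of the original series.
function undiff_sketch(dx::AbstractVector, x0::AbstractVector)
    la = length(x0)
    x = Vector{promote_type(eltype(dx), eltype(x0))}(undef, length(dx) + la)
    x[1:la] = x0
    for (i, δ) in enumerate(dx)
        # Each value is the one `la` steps earlier plus the stored difference.
        x[i + la] = x[i] + δ
    end
    return x
end

x = [1, 4, 9, 16]
dx = x[2:end] .- x[1:end-1]    # lag-1 differences
undiff_sketch(dx, [1]) == x    # rebuilding from the first value
```

Higher orders can be handled by applying the same step repeatedly, once per order, each time with the initial values for that level.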