DataWrangler.boxcox
Package: DataWrangler
boxcox(x::Vector)
boxcox(x::Vector, λ::Float64)
boxcox(x::Vector, λ::Vector)
Compute a Box-Cox power transformation of x given λ, or given (λ1, λ2) for data containing negative values. If neither λ nor (λ1, λ2) is provided, compute the optimal power transformation.
\[x(\lambda) = \begin{cases} \dfrac{x_i^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \ln x_i & \text{if } \lambda = 0 \end{cases}\]
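For data containing negative values, the two-parameter form is used, where the shift λ2 should make every x_i + λ2 positive: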
\[x(\boldsymbol{\lambda}) = \begin{cases} \dfrac{(x_i + \lambda_2)^{\lambda_1} - 1}{\lambda_1} & \text{if } \lambda_1 \neq 0 \\ \ln (x_i + \lambda_2) & \text{if } \lambda_1 = 0 \end{cases} \]
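As a rough illustration of the two formulas above, the following sketch evaluates them element-wise in plain Julia. The name boxcox_sketch is hypothetical and not part of DataWrangler; it only applies the transformation for a given λ and does not search for an optimal one.

```julia
# Hypothetical helper (not the DataWrangler API): one-parameter Box-Cox
# transformation of a single positive value xi.
boxcox_sketch(xi::Real, λ::Real) = λ == 0 ? log(xi) : (xi^λ - 1) / λ

# Hypothetical two-parameter form: the shift λ2 moves xi into the positive
# range before the one-parameter transformation with exponent λ1 is applied.
boxcox_sketch(xi::Real, λ1::Real, λ2::Real) = boxcox_sketch(xi + λ2, λ1)

x = [0.5, 1.0, 2.0, 4.0]
y_sqrt = boxcox_sketch.(x, 0.5)              # λ = 0.5
y_log = boxcox_sketch.(x, 0.0)               # λ = 0 reduces to log.(x)
y_shift = boxcox_sketch.(x .- 3, 0.5, 3.0)   # negative data, shifted back by λ2 = 3
```

In Julia a non-integer power of a negative number throws a DomainError, which is why the shifted form is needed before transforming negative data.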
Arguments
x
: Vector to be transformed.
λ
: Exponent(s) for the transformation.
Returns
A vector with the Box-Cox transformation of x,
or, when no λ is given, a Dict with the transformed data under :x and the optimal λ under :λ.
Reference
Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations, Journal of the Royal Statistical Society, Series B, 26, 211-252.
Examples
julia> x = rand(100)
julia> bc = boxcox(x)
julia> iboxcox(bc[:x],bc[:λ]) ≈ x
julia> x = rand(100) .- 0.5
julia> bc = boxcox(x)
julia> iboxcox(bc[:x],bc[:λ]) ≈ x
DataWrangler.d
Package: DataWrangler
function d(x::Union{AbstractVector, AbstractArray},
or::Int=1,
la::Int=1;
center::Bool=false)
Return the lagged differences of a given Vector or Array.
Arguments
x
: Vector or Array of data.
or
: Order of the differences, i.e. the number of times the differencing is applied recursively.
la
: Lag for the difference.
center
: Center the result in the response using missing values.
Returns
A Vector or Array with the lagged differences of the given order.
Examples
julia> x = [1,2,3,4,5];
julia> d(x)
4-element Vector{Int64}:
1
1
1
1
julia> d(x,2)
3-element Vector{Int64}:
0
0
0
julia> d(x,1,2)
3-element Vector{Int64}:
2
2
2
julia> x = reshape(collect(1:20),10,2);
julia> d(x,2,2)
6×2 Matrix{Int64}:
0 0
0 0
0 0
0 0
0 0
0 0
julia> d(d(x,1,2),1,2) == d(x,2,2)
true
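The recursive meaning of the order and lag arguments can be illustrated with a short standard-library sketch. lagdiff is a hypothetical name, not the DataWrangler implementation, and it ignores the center keyword; it only aims to reproduce the non-centered results of the examples above.

```julia
# Hypothetical sketch (not the DataWrangler implementation): lagged
# differences of order `or` and lag `la` along the first dimension.
function lagdiff(x::AbstractArray, or::Int = 1, la::Int = 1)
    y = x
    for _ in 1:or                       # apply the lag difference `or` times
        n = size(y, 1)
        # y[t] - y[t-la] for every row t > la (works for vectors and matrices)
        y = selectdim(y, 1, la+1:n) .- selectdim(y, 1, 1:n-la)
    end
    return y
end

x = [1, 2, 3, 4, 5]
lagdiff(x)          # [1, 1, 1, 1]
lagdiff(x, 2)       # [0, 0, 0]
lagdiff(x, 1, 2)    # [2, 2, 2]
```

Using selectdim keeps the loop independent of the number of dimensions, so the same code also handles the 10×2 matrix example.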
DataWrangler.iboxcox
Package: Forecast
iboxcox(x::Vector, λ::Float64)
iboxcox(x::Vector, λ::Vector)
Compute the inverse of a Box-Cox power transformation given λ, or given (λ1, λ2) for the two-parameter form.
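Solving the Box-Cox formulas of boxcox for x_i gives the inverse: x_i = (λ y_i + 1)^(1/λ) when λ ≠ 0 and x_i = exp(y_i) when λ = 0; the two-parameter form additionally subtracts λ2 at the end. The hypothetical iboxcox_sketch below is not the DataWrangler or Forecast API; it only checks this round trip in plain Julia.

```julia
# Hypothetical helper (not the DataWrangler/Forecast API): inverse of the
# one-parameter Box-Cox transformation for a single transformed value yi.
iboxcox_sketch(yi::Real, λ::Real) = λ == 0 ? exp(yi) : (λ * yi + 1)^(1 / λ)

# Two-parameter inverse: undo the power transformation, then remove the shift λ2.
iboxcox_sketch(yi::Real, λ1::Real, λ2::Real) = iboxcox_sketch(yi, λ1) - λ2

x = [0.5, 1.0, 2.0, 4.0]
λ = 0.5
y = [(xi^λ - 1) / λ for xi in x]       # forward Box-Cox transformation
isapprox(iboxcox_sketch.(y, λ), x)    # the round trip recovers x
```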
Arguments
x
: Vector with a Box-Cox transformation to be inverted.
λ
: Exponent(s) for the inverse transformation.
Returns
A vector with the inverse transformation of x given λ.
Reference
Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations, Journal of the Royal Statistical Society, Series B, 26, 211-252.
Examples
julia> x = rand(100)
julia> bc = boxcox(x)
julia> iboxcox(bc[:x],bc[:λ]) ≈ x
julia> x = rand(100) .- 0.5
julia> bc = boxcox(x)
julia> iboxcox(bc[:x],bc[:λ]) ≈ x
DataWrangler.impute!
Package: DataWrangler
impute[!]([x], y; method = "loess", q)
Impute missing values in vector y, either in place or returning a copy of y with the imputed values.
Parameters
x
: Optional vector containing the support for y (no missing values allowed).
y
: Vector of type Real with missing values to be imputed.
method
: This parameter can take three valid values and defaults to "loess":
- "loess": Runs loess with a window size of q on the dataset and interpolates/extrapolates the results on the missing values.
- "normal": Random imputation using a Normal empirical distribution based on a window of size q.
- "uniform": Random imputation using a Uniform empirical distribution based on a window of size q.
q
: Number of closest vector values to be considered for the imputation; it defaults to 3*length(y)÷4.
Description
Missing values are replaced using loess, local random uniform, or local random normal methods. When the vector's element type is an integer type, the results are rounded.
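The local random normal method can be sketched as follows. This is an illustration under stated assumptions rather than the DataWrangler implementation: impute_normal_sketch is a hypothetical name, the q neighbours are assumed to be the known values whose support x is closest to the missing point, and the default loess method is not reproduced.

```julia
using Random, Statistics

# Hypothetical sketch (not the DataWrangler implementation) of local random
# normal imputation: each missing y[i] is drawn from a Normal distribution
# fitted to the q known values whose support x is closest to x[i].
function impute_normal_sketch(x::AbstractVector{<:Real},
                              y::AbstractVector{Union{Missing,T}},
                              q::Int = 3 * length(y) ÷ 4;
                              rng::AbstractRNG = Random.default_rng()) where {T<:Real}
    known = findall(!ismissing, y)                # indices of observed values
    out = copy(y)                                 # y itself is left untouched
    for i in findall(ismissing, y)
        order = sortperm(abs.(x[known] .- x[i]))  # known points by distance to x[i]
        nearest = known[order[1:min(q, length(known))]]
        vals = Float64.(y[nearest])
        s = length(vals) > 1 ? std(vals) : 0.0
        draw = mean(vals) + s * randn(rng)
        # Integer vectors are rounded, as described above.
        out[i] = T <: Integer ? round(T, draw) : draw
    end
    return out
end
```

With a constant neighbourhood the fitted standard deviation is zero, so the imputed value equals that constant; on real data each call produces a different random draw unless the rng is seeded.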
Returns
Nothing if the imputation is done in place, or otherwise a copy of y
with all the missing values imputed.
Examples
x = sort(rand(100));
y = Array{Union{Missing,Float64}}(undef,100);
y[:] = rand(100);
y[rand(1:100,10)] .= missing;
impute!(x,y)
DataWrangler.impute
Package: DataWrangler
Check impute! for further information.
DataWrangler.normality
Jarque–Bera statistic as a normality measure for optimization.
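For reference, the Jarque–Bera statistic is JB = n/6 (S² + (K − 3)²/4), where S is the sample skewness and K the sample kurtosis; values close to zero indicate data consistent with normality. The sketch below computes it with the standard library only. jarque_bera_sketch is a hypothetical name, and DataWrangler's own normality may differ in details such as the moment estimators.

```julia
using Statistics

# Hypothetical sketch (not the DataWrangler source): Jarque–Bera statistic
# JB = n/6 * (S^2 + (K - 3)^2 / 4), with S the sample skewness and K the sample
# kurtosis, both built from biased (1/n) central moments.
function jarque_bera_sketch(x::AbstractVector{<:Real})
    n = length(x)
    μ = mean(x)
    m2 = sum((x .- μ) .^ 2) / n     # second central moment
    m3 = sum((x .- μ) .^ 3) / n     # third central moment
    m4 = sum((x .- μ) .^ 4) / n     # fourth central moment
    S = m3 / m2^1.5                 # skewness
    K = m4 / m2^2                   # kurtosis (3 for a normal distribution)
    return n / 6 * (S^2 + (K - 3)^2 / 4)
end

jarque_bera_sketch([-1.0, 1.0])     # S = 0 and K = 1, so JB = (2/6) * (4/4) = 1/3
```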
DataWrangler.normalize!
Package: DataWrangler
normalize[!]([x]; method)
Normalize the values in vector x, either in place or returning a copy of x with the normalized values.
Parameters
x
: Vector or Array of type Real to be normalized; missing values are ignored.
method
: There are four valid values: "z-score", "min-max", "softmax", "sigmoid"
Description
The normalization ignores any missing values in the array and is applied over all of its dimensions at once. If normalization is required only along specific dimensions, the function mapslices can be used to select those dimensions.
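The four methods correspond to the textbook definitions z-score (x − mean)/std, min-max (x − min)/(max − min), softmax exp(x)/Σexp(x) and sigmoid 1/(1 + exp(−x)). The sketch below assumes those definitions; normalize_sketch is hypothetical and DataWrangler may differ in details such as the standard-deviation estimator. The last lines show the mapslices approach for normalizing each column separately.

```julia
using Statistics

# Hypothetical sketch (not the DataWrangler implementation). Statistics are
# computed over skipmissing(x), so missing entries stay missing in the output.
function normalize_sketch(x::AbstractArray, method::AbstractString)
    v = collect(skipmissing(x))
    if method == "z-score"
        return (x .- mean(v)) ./ std(v)
    elseif method == "min-max"
        lo, hi = extrema(v)
        return (x .- lo) ./ (hi - lo)
    elseif method == "softmax"
        e = exp.(x .- maximum(v))      # shift by the maximum for numerical stability
        return e ./ sum(skipmissing(e))
    elseif method == "sigmoid"
        return 1 ./ (1 .+ exp.(-x))
    else
        throw(ArgumentError("unknown method: $method"))
    end
end

normalize_sketch([1.0, 2, 3, 4, 5], "min-max")    # [0.0, 0.25, 0.5, 0.75, 1.0]

# Column-wise normalization with mapslices: each column is scaled to [0, 1].
X = [1.0 10.0; 2.0 20.0; 3.0 30.0]
mapslices(c -> normalize_sketch(c, "min-max"), X; dims = 1)
```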
Example
x = [1.,2,3,4,5]
normalize!(x; method = "min-max")
println(x)
[0.0, 0.25, 0.5, 0.75, 1.0]
DataWrangler.normalize
Package: DataWrangler
Check normalize! for further information.
DataWrangler.outlie
Package: DataWrangler
outlie[!]([x], y, σ = 2; q)
Replace or detect outlier values in vector y describing a time series.
Parameters
x
: Optional vector containing the support for y (no missing values allowed).
y
: Vector with the outliers to be dealt with.
σ
: Number of sigmas used to identify outliers; it defaults to four.
q
: Window size of the loess fit, i.e. the number of closest vector values considered; it defaults to 3*length(y)÷4.
Description
When using outlie!, the function replaces the outliers with missing values and returns the indices of the replaced values; if outlie is used instead, only the index positions of the outliers are returned.
Outliers are identified by measuring their distance, in number of sigmas, from a loess fit with window q. For datasets with noise close to zero, all outliers will still be detected, but some non-outliers might be identified as outliers too; one way to prevent this is to introduce a small amount of noise ϵ in y, e.g. outlie!(x,y+ϵ).
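The sigma rule can be sketched with the standard library. Because loess is not available there, this hypothetical outlier_sketch swaps in a centred moving median as the smoother and assumes equally spaced observations (no support x); it illustrates the idea and is not the DataWrangler implementation.

```julia
using Statistics

# Hypothetical sketch (not the DataWrangler implementation). The documented
# method fits a loess curve; a centred moving median of half-width h is used
# here instead, and points whose residual exceeds nsigma standard deviations
# of the residuals are flagged.
function outlier_sketch(y::AbstractVector{<:Real}, nsigma::Real = 2; h::Int = 2)
    n = length(y)
    # Smoothed value at i: median of the window i-h:i+h, truncated at the ends.
    smooth = [median(y[max(1, i - h):min(n, i + h)]) for i in 1:n]
    r = y .- smooth                        # residuals from the smoother
    σ = std(r)                             # spread of the residuals
    return findall(abs.(r) .> nsigma * σ)  # indices more than nsigma sigmas away
end

y = [0.1 * (-1)^i for i in 1:50]   # small alternating signal
y[25] += 10                        # one injected outlier
outlier_sketch(y)
```

A median is used because, unlike a moving mean, it is not pulled towards a single spike, so the spike's neighbours keep small residuals.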
Examples
n = 1000
x = sort(rand(n))*2*pi;
y = Array{Union{Missing,Float64}}(undef,n);
y[:] = sin.(x).+randn(n)/10
mid = vcat(100,300,600,950);
y[mid] .= x[mid] .+ 2*(randn(length(mid)).+1)
y[500] = x[500] - 2*(randn(1)[1]+1)
julia> outlie(x,y)
5-element Vector{Int64}:
100
300
500
600
950
DataWrangler.p
Package: Forecast
function p(dx, x0)
Return the reverse of the lagged differences of a given order for a Vector or Array, i.e. reconstruct the series that d differenced.
Arguments
dx
: Array or DataFrame of data.
x0
: Initial constants for the reverse difference. The default value represents an integration of order one and lag one with initial values at zero. The format for the initial values is Array{Real,3}(order, variable, lag).
Returns
A Vector or Array with the reverse lagged differences of the given order, i.e. the reconstructed series.
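One round of reverse differencing with lag la rebuilds a series from its first la values via x_t = x_{t−la} + dx_{t−la}. Higher orders repeat this step, each time seeded with the initial values of the intermediate difference, which appears to be what the order dimension of x0 holds. The hypothetical undiff_sketch below is not the Forecast/DataWrangler implementation; it handles a single vector and one order per call.

```julia
# Hypothetical sketch (not the Forecast/DataWrangler implementation): undo one
# round of lag-`la` differencing, where la = length(x0) and x0 holds the first
# la values of the original series.
function undiff_sketch(dx::AbstractVector, x0::AbstractVector)
    la = length(x0)
    x = Vector{promote_type(eltype(dx), eltype(x0))}(undef, la + length(dx))
    x[1:la] = x0
    for t in la+1:length(x)
        x[t] = x[t-la] + dx[t-la]     # x_t = x_{t-la} + dx_{t-la}
    end
    return x
end

# Order two, lag two, rebuilt one order at a time.
x = repeat(1:2, 30)
d1 = x[3:end] .- x[1:end-2]           # first lag-2 difference
d2 = d1[3:end] .- d1[1:end-2]         # second lag-2 difference
undiff_sketch(undiff_sketch(d2, d1[1:2]), x[1:2]) == x
```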
Examples
# Order two with Lag two
julia> x = repeat(1:2,30);
julia> dx = d(x,2,2);
julia> x0 = zeros(2,1,2); # order 2, 1 variable, lag 2
julia> x0[1,:,:] = collect(1:2);
julia> p(dx,x0) ≈ x
true
# Calculation of π
julia> x = 0:0.001:1;
julia> y = sqrt.(1 .- x.^2);
julia> isapprox(4*p(y)[end]/1000 , π, atol = 0.01)
true