Home
OnlineStats is a Julia package for statistical analysis using (parallelizable) online algorithms. Online algorithms are well suited for streaming data or when data is too large to hold in memory. Observations are processed one at a time and all algorithms use O(1) memory.
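To make the "one observation at a time, O(1) memory" idea concrete, here is a minimal hand-rolled sketch of a running mean in plain Julia. The names RunningMean and update! are hypothetical and exist only for this illustration; this is not the package's implementation, just the general shape of an online update.

```julia
# Illustration only: a hand-rolled running mean, not OnlineStats code.
mutable struct RunningMean
    n::Int         # number of observations seen so far
    mean::Float64  # current estimate of the mean
end
RunningMean() = RunningMean(0, 0.0)

# Fold one observation into the estimate without storing past data.
function update!(r::RunningMean, x::Real)
    r.n += 1
    r.mean += (x - r.mean) / r.n
    return r
end

r = RunningMean()
for x in (2.0, 4.0, 6.0)
    update!(r, x)
end
r.mean  # the mean of 2.0, 4.0, 6.0
```

Only the count and the current estimate are kept, so memory use does not grow with the number of observations.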
Installation
import Pkg
Pkg.add("OnlineStats")
Basics
Every stat is <: OnlineStat{T} (where T is the type of a single observation)
julia> using OnlineStats
julia> m = Mean()
Mean: n=0 | value=0.0
julia> supertype(Mean)
OnlineStat{Number}
Stats can be updated
fit! can be used to update the stat with a single observation or multiple observations:
fit!(stat::OnlineStat{T}, y::S) will iterate through y and fit! each element if T != S.
julia> y = randn(100);
julia> fit!(m, y)
Mean: n=100 | value=-0.0376621
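As a small sketch of the dispatch rule described above, the same Mean can be updated with a single number, a vector, or any other iterable of numbers. The data values are arbitrary; nobs is assumed to be exported alongside value, as it is in the versions of the package we are aware of.

```julia
using OnlineStats

m = Mean()
fit!(m, 1.0)         # a single observation: a Number matches Mean's input type
fit!(m, [2.0, 3.0])  # a Vector is not a Number, so each element is fit in turn
fit!(m, 4:5)         # any iterable of numbers is handled the same way

nobs(m)   # number of observations seen so far
value(m)  # current estimate of the mean
```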
Stats can be merged
julia> y2 = randn(100);
julia> m2 = fit!(Mean(), y2)
Mean: n=100 | value=-0.0151381
julia> merge!(m, m2)
Mean: n=200 | value=-0.0264001
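Merging is what makes the algorithms parallelizable: each chunk of data can be fit independently and the partial results combined afterwards. The sketch below checks, on random data, that merging two Mean stats agrees with a single pass over all the data. The chunk sizes are arbitrary.

```julia
using OnlineStats
using Statistics  # standard library, used only for the reference `mean`

y1 = randn(100)
y2 = randn(50)

# Fit each chunk independently, as two separate workers might.
a = fit!(Mean(), y1)
b = fit!(Mean(), y2)

# Combine the partial results into `a`.
merge!(a, b)

# Compare against a single pass over the concatenated data.
value(a) ≈ mean(vcat(y1, y2))
```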
Stats have a value
julia> value(m)
-0.026400093867559392
Collections of Stats
Series
A Series tracks stats that should be applied to the same data stream.
y = rand(1000)
s = Series(Mean(), Variance())
fit!(s, y)
Series
├─ Mean: n=1000 | value=0.501694
└─ Variance: n=1000 | value=0.079299
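To read results back out of a Series, one option is to pull out the individual stats and call value on each. This sketch assumes the stats are stored, in constructor order, in a field named stats; that field name is an assumption about the package's internals and may differ between versions.

```julia
using OnlineStats

s = Series(Mean(), Variance())
fit!(s, [1.0, 2.0, 3.0, 4.0])

# Assumption: the stats live in the `stats` field, in constructor order.
m, v = s.stats
value(m)  # mean of the four observations
value(v)  # variance estimate from the same four observations
```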
FTSeries
An FTSeries tracks stats that should be applied to the same data stream, but filters and transforms (hence FT) the input data before it is sent to its stats.
s = FTSeries(Mean(), Variance(); filter = x->true, transform = abs)
fit!(s, -y)
FTSeries
├─ Mean: n=1000 | value=0.501694
└─ Variance: n=1000 | value=0.079299
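The example above uses a filter that accepts everything. A sketch with a filter that actually rejects input may make the behavior clearer: here NaN observations are dropped and the rest are passed through abs before reaching the stats. The data values are arbitrary, and access through the stats field is an assumption about the package's internals.

```julia
using OnlineStats

# Keep only non-NaN observations, then take absolute values.
s = FTSeries(Mean(), Variance(); filter = x -> !isnan(x), transform = abs)
fit!(s, [-1.0, NaN, 3.0, -5.0])

# Assumption: the stats live in the `stats` field, in constructor order.
m, v = s.stats
value(m)  # mean of 1.0, 3.0, 5.0 once the NaN has been filtered out
```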
Group
A Group tracks stats that should be applied to different data streams.
g = Group(Mean(), CountMap(Bool))
itr = zip(randn(100), rand(Bool, 100))
fit!(g, itr)
Group
├─ Mean: n=100 | value=0.101019
└─ CountMap: n=100 | value=OrderedCollections.OrderedDict{Bool,Int64}(0=>54,1=>46)
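In contrast to a Series, each observation given to a Group is a tuple whose i-th element goes to the i-th stat. The sketch below repeats the example with small deterministic inputs so the per-stream results are easy to check by hand. As before, access through the stats field is an assumption about the package's internals.

```julia
using OnlineStats

g = Group(Mean(), CountMap(Bool))

# Each zipped pair is one observation: (number for Mean, Bool for CountMap).
fit!(g, zip([1.0, 2.0, 3.0], [true, false, true]))

# Assumption: the stats live in the `stats` field, in constructor order.
m, c = g.stats
value(m)         # mean of the first stream
value(c)[true]   # how many times `true` appeared in the second stream
```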