Home
OnlineStats is a Julia package for statistical analysis using (parallelizable) online algorithms. Online algorithms are well suited for streaming data or when data is too large to hold in memory. Observations are processed one at a time and all algorithms use O(1) memory.
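To make the one-observation-at-a-time idea concrete, here is a minimal hand-rolled sketch of an online mean. It is not how OnlineStats implements its stats; the RunningMean type and update! function are hypothetical names used only for illustration. The state is two numbers, no matter how many observations have been seen.

```julia
# Hypothetical illustration only; not part of OnlineStats.
mutable struct RunningMean
    n::Int         # number of observations seen so far
    mean::Float64  # current estimate of the mean
end
RunningMean() = RunningMean(0, 0.0)

# Incorporate one observation without storing it.
function update!(r::RunningMean, x::Real)
    r.n += 1
    r.mean += (x - r.mean) / r.n
    return r
end

r = RunningMean()
for x in (2.0, 4.0, 6.0, 8.0)
    update!(r, x)
end
r.mean  # 5.0
```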
Installation
import Pkg
Pkg.add("OnlineStats")
Basics
Every stat is <: OnlineStat{T}
(where T is the type of a single observation)
julia> using OnlineStats
julia> m = Mean()
Mean: n=0 | value=0.0
julia> supertype(Mean)
OnlineStat{Number}
Stats can be updated
fit! updates a stat with a single observation or with multiple observations. The call
fit!(stat::OnlineStat{T}, y::S)
treats y as a single observation when S is a subtype of T; otherwise it iterates through y and calls fit! on each element.
julia> y = randn(100);
julia> fit!(m, y)
Mean: n=100 | value=-0.114098
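The two ways of calling fit! can be compared side by side. A short sketch, assuming nobs (the count shown as n in the printed output) is available after using OnlineStats:

```julia
using OnlineStats

m = Mean()
fit!(m, 1.0)          # a single Number is fitted directly
fit!(m, [2.0, 3.0])   # a Vector is not a Number, so each element is fitted in turn

nobs(m)   # 3
value(m)  # ≈ 2.0
```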
Stats can be merged
julia> y2 = randn(100);
julia> m2 = fit!(Mean(), y2)
Mean: n=100 | value=0.0603272
julia> merge!(m, m2)
Mean: n=200 | value=-0.0268852
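Merging is what makes the algorithms parallelizable: each chunk of data can be fitted independently and the results combined afterwards. A small sketch (the chunk sizes here are arbitrary) checking that a merged mean matches a single pass over all the data, up to floating-point error:

```julia
using OnlineStats

y1, y2 = randn(100), randn(50)

# Fit each chunk separately, as separate workers might, then combine.
a = fit!(Mean(), y1)
b = fit!(Mean(), y2)
merge!(a, b)  # a now summarizes both chunks

# Reference: one pass over the concatenated data.
full = fit!(Mean(), vcat(y1, y2))

nobs(a) == nobs(full)   # true
value(a) ≈ value(full)  # true, up to rounding
```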
Stats have a value
julia> value(m)
-0.02688523969823521
Collections of Stats
Series
A Series tracks stats that should be applied to the same data stream.
y = rand(1000)
s = Series(Mean(), Variance())
fit!(s, y)
Series
├─ Mean: n=1000 | value=0.50366
└─ Variance: n=1000 | value=0.0834158
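Individual results can be pulled out of a Series after fitting. This sketch assumes the stats are stored, in order, in a stats field (an internal detail that may differ between versions) and compares them with the batch functions from the Statistics standard library:

```julia
using OnlineStats, Statistics

y = rand(1000)
s = fit!(Series(Mean(), Variance()), y)

# Assumption: `s.stats` is the tuple of stats in the order they were given.
m, v = s.stats
value(m) ≈ mean(y)  # true
value(v) ≈ var(y)   # true
```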
FTSeries
An FTSeries tracks stats that should be applied to the same data stream, but filters and transforms (hence FT) the input data before it is sent to its stats.
s = FTSeries(Mean(), Variance(); filter = x->true, transform = abs)
fit!(s, -y)
FTSeries
├─ Mean: n=1000 | value=0.50366
└─ Variance: n=1000 | value=0.0834158
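In the example above the filter accepts everything, so only the transform has an effect. A sketch with a non-trivial filter follows; it assumes, as an internal detail, that the wrapped stats are reachable through a stats field:

```julia
using OnlineStats

data = [1.0, -2.0, NaN, 3.0, Inf]

# Keep only finite observations and take absolute values before fitting.
s = FTSeries(Mean(); filter = isfinite, transform = abs)
fit!(s, data)

# Assumption: `s.stats` holds the wrapped stats in order.
m = s.stats[1]
nobs(m)   # 3 (NaN and Inf were filtered out)
value(m)  # ≈ 2.0, the mean of 1.0, 2.0, 3.0
```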
Group
A Group tracks stats that should be applied to different data streams.
g = Group(Mean(), CountMap(Bool))
itr = zip(randn(100), rand(Bool, 100))
fit!(g, itr)
Group
├─ Mean: n=100 | value=-0.0745927
└─ CountMap: n=100 | value=OrderedCollections.OrderedDict{Bool,Int64}(1=>51,0=>49)
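Each observation passed to a Group is itself a collection with one element per stat: the first element goes to the first stat, the second to the second. A deterministic sketch, assuming (as an internal detail) that the stats are reachable through a stats field:

```julia
using OnlineStats

g = Group(Mean(), Mean())

# Each tuple is one observation: (value for stat 1, value for stat 2).
fit!(g, zip([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]))

# Assumption: `g.stats` holds the stats in the order they were given.
value(g.stats[1])  # ≈ 2.5
value(g.stats[2])  # ≈ 25.0
```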