Home
OnlineStats is a Julia package for statistical analysis with algorithms that run both online and in parallel. Online algorithms are well suited for streaming data or when data is too large to hold in memory. Observations are processed one at a time and all algorithms use O(1) memory.
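To make the one-observation-at-a-time, O(1)-memory idea concrete, here is a minimal sketch of an online mean in plain Julia. The names `RunningMean` and `update!` are hypothetical and are not part of OnlineStats; this illustrates the general technique rather than the package's implementation.

```julia
# Hypothetical illustration, not OnlineStats code: an online (streaming) mean.
# Only a count and the current mean are stored, so memory use is constant
# no matter how many observations have been seen.
mutable struct RunningMean
    n::Int        # number of observations seen so far
    avg::Float64  # current value of the mean
end

RunningMean() = RunningMean(0, 0.0)

# Fold one observation into the running mean.
function update!(m::RunningMean, x::Real)
    m.n += 1
    m.avg += (x - m.avg) / m.n
    return m
end

m = RunningMean()
for x in (1.0, 2.0, 3.0, 4.0)
    update!(m, x)
end
m.avg  # mean of the four observations
```

Each observation can be discarded as soon as it has been processed, which is why this style of algorithm suits data streams and data sets that do not fit in memory.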
Installation
import Pkg
Pkg.add("OnlineStats")
Basics
Every stat is <: OnlineStat{T}
(where T is the type of a single observation)
julia> using OnlineStats
julia> m = Mean()
Mean: n=0 | value=0.0
julia> supertype(Mean)
OnlineStat{Number}
Stats can be updated
fit! can be used to update the stat with a single observation or multiple observations: fit!(stat::OnlineStat{T}, y::S) will iterate through y and fit! each element if T != S.
julia> y = randn(100);
julia> fit!(m, y)
Mean: n=100 | value=-0.0334456
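As a small sketch of the rule above, the following updates a Mean first with a single observation and then with two different iterables. It uses only fit!, value, and Mean from this page; the data values are arbitrary and chosen so the result is easy to check by hand.

```julia
using OnlineStats

m = Mean()

fit!(m, 1.0)         # a Number is a single observation for Mean
fit!(m, [2.0, 3.0])  # a Vector is not a Number, so each element is fit in turn
fit!(m, 4:5)         # any iterable of numbers is handled the same way

value(m)  # mean of 1, 2, 3, 4, 5
```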
Stats can be merged
julia> y2 = randn(100);
julia> m2 = fit!(Mean(), y2)
Mean: n=100 | value=-0.0644987
julia> merge!(m, m2)
Mean: n=200 | value=-0.0489722
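Merging is what makes parallel use possible: each worker can fit its own stat on a chunk of the data, and the partial results can be combined afterwards. A minimal sketch, with deterministic data and the Statistics standard library used only as a reference for comparison:

```julia
using OnlineStats
using Statistics  # standard library, used here only to check the result

y1 = collect(1.0:100.0)    # first chunk of data
y2 = collect(101.0:150.0)  # second chunk of data

a = fit!(Mean(), y1)
b = fit!(Mean(), y2)

merge!(a, b)  # a now summarizes both chunks

# The merged stat agrees with the mean of the combined data.
value(a) ≈ mean(vcat(y1, y2))
```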
Stats have a value
julia> value(m)
-0.04897216008106918
Collections of Stats
Series
A Series tracks stats that should be applied to the same data stream.
y = rand(1000)
s = Series(Mean(), Variance())
fit!(s, y)
Series
├─ Mean: n=1000 | value=0.495432
└─ Variance: n=1000 | value=0.0828131
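One way to read results back out is to keep a reference to each stat before placing it in the Series. This sketch assumes the Series holds the stat objects it was given rather than copies, so the individual stats reflect the updates. The Statistics standard library is used only for comparison.

```julia
using OnlineStats
using Statistics  # standard library, used here only to check the results

y = collect(1.0:10.0)

m = Mean()
v = Variance()
s = Series(m, v)  # both stats will see every observation

fit!(s, y)

# Assumption: the Series updates the same m and v objects created above.
value(m) ≈ mean(y)
value(v) ≈ var(y)
```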
FTSeries
An FTSeries tracks stats that should be applied to the same data stream, but filters and transforms (hence FT) the input data before it is sent to its stats.
s = FTSeries(Mean(), Variance(); filter = x->true, transform = abs)
fit!(s, -y)
FTSeries
├─ Mean: n=1000 | value=0.495432
└─ Variance: n=1000 | value=0.0828131
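The example above uses a filter that accepts everything. The sketch below uses a filter that actually rejects observations, assuming the filter is applied to the raw observation before the transform, as the description suggests. As in the Series case, it also assumes the FTSeries updates the stat object it was given.

```julia
using OnlineStats

m = Mean()

# Keep only positive observations, then take their square root.
s = FTSeries(m; filter = x -> x > 0, transform = sqrt)

fit!(s, [4.0, -1.0, 16.0, -9.0])

# Only 4.0 and 16.0 pass the filter; their square roots are 2.0 and 4.0.
value(m)
```

If the transform ran before the filter, sqrt would throw an error on the negative inputs, so filtering first is what makes this combination safe.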
Group
A Group tracks stats that should be applied to different data streams.
g = Group(Mean(), CountMap(Bool))
itr = zip(randn(100), rand(Bool, 100))
fit!(g, itr)
Group
├─ Mean: n=100 | value=0.00942837
└─ CountMap: n=100 | value=OrderedCollections.OrderedDict{Bool,Int64}(1=>48,0=>52)
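In a Group, the i-th element of each observation goes to the i-th stat. A small deterministic sketch of that routing follows, again assuming the Group updates the stat objects it was given. The data values are arbitrary.

```julia
using OnlineStats

m = Mean()
c = CountMap(Bool)
g = Group(m, c)

# Each observation is a tuple: the first element is sent to the Mean,
# the second element is sent to the CountMap.
itr = zip([1.0, 2.0, 3.0], [true, false, true])
fit!(g, itr)

value(m)         # mean of 1.0, 2.0, 3.0
value(c)[true]   # number of times true was observed
value(c)[false]  # number of times false was observed
```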