# FastRunningMedian.jl

This Julia Package allows you to calculate a running median - fast.

## Getting Started

In Julia, install with:

]add FastRunningMedian

Example Usage:

```
julia> using FastRunningMedian
julia> running_median([1,9,2,3,-9,1], 3)
6-element Vector{Float64}:
1.0
2.0
3.0
2.0
1.0
1.0
```

## High-level API

#
** FastRunningMedian.running_median** —

*Function*.

running_median(input, window_length, tapering=:symmetric; kwargs...) -> output

Run a median filter of `window_length`

over the input array and return the result.

If the input array is multidimensional, the median will be run only over the first dimension, i.e. over all columns indepedently.

**Taperings**

The tapering decides the behaviour at the ends of the input. The available taperings are:

`:symmteric`

or`:sym`

: Ensure that the window is symmetric around each point of the output array by always growing or shrinking the window by 2. The output has the same length as the input if`window_length`

is odd. If`window_length`

is even, the output has one element less.`:asymmetric`

or`:asym`

: Always adds or removes one element when calculating the next output value. Creates asymmetric windowing at the edges of the array. If the input is N long, the output is N+window_length-1 elements long.`:asymmetric_truncated`

or`:asym_trunc`

: The same as asymmetric, but truncated at beginning and end to match the length of`:symmetric`

.`:none`

or`:no`

: No tapering towards the ends. If the input has N elements, the output is only N-window_length+1 long. Equivalent to "roll" from RollingFunctions.`:beginning_only`

or`:start`

: At the beginning, always grow the window by one but do not taper the end. This is equivalent to asymmetric but truncated at the end such that the output length matches the input length. Equivalent to "run" from RollingFunctions.

If you choose an even `window_length`

, the elements of the output array lie in the middle between the input elements on a continuous underlying axis.

With the exception of `:beginning_only`

, all taperings are mirror symmetric with respect to the middle of the input array.

**Keyword Arguments**

`nan=:include`

: By default, NaN values in the window will turn the median NaN as well. Use`nan = :ignore`

to ignore NaN values and calculate the median over the remaining values in the window. If there are only NaNs in the window, the median will be NaN regardless.`output_eltype=Float64`

: Element type of the output array. The output element type should allow converting from Float64 and the input element type. The exception is odd window lengths with taperings`:no`

or`:sym`

, in which case the output element type only has to allow converting from the input element type.

**Performance**

The underlying algorithm should scale as O(N log w) with the input length N and the window_length w.

## Taperings Visualized

Each data point is shown as a cross and the windows are visualized as colored boxes, the input is grey.

## Performance Comparison

For large window lengths, this package performs even better than calling `runmed`

in R, which uses the Turlach implementation written in C. For small window lengths, the Stuetzle implementation in R still outperforms this package, but the overhead from RCall doesn't seem worth it. Development of a fast implementation for small window lengths is ongoing, see the corresponding issues for details.

In contrast to this package, SortFilters.jl supports arbitrary probability levels, for example to calculate quantiles.

You can find the Notebook used to create the above graph in the `benchmark`

folder. I ran it on an i7-2600K with 8 GB RAM while editing and browsing in the background.

## Mid-level API

You can take control of allocating the output vector and median filter with a lower-level API. This is useful when you have to calculate many running medians of the same window length.

#
** FastRunningMedian.running_median!** —

*Function*.

running_median!(mf::MedianFilter, output, input, tapering=:sym; nan=:include) -> output

Use `mf`

to calculate the running median of `input`

and write the result to `output`

.

For all details, see `running_median`

.

**Examples**

input = [4 5 6;
1 0 9;
9 8 7;
3 1 2;]
output = similar(input, (4,3))
mf = MedianFilter(eltype(input), 3)
for j in axes(input, 2) # run median over each column
# re-use mf in every iteration
running_median!(mf, @view(output[:,j]), input[:,j])
end
output
# output
4×3 Matrix{Int64}:
4 5 6
4 5 7
3 1 7
3 1 2

## Stateful API

The stateful API can be used for streaming data, e. g. to reduce RAM consumption, or building your own high-level API.

#
** FastRunningMedian.MedianFilter** —

*Type*.

MedianFilter([T=Float64,] window_length) where T <: Real

Construct a stateful running median filter, taking values of type `T`

.

Manipulate with `grow!`

, `roll!`

, `shrink!`

, `reset!`

. Query with `median`

, `length`

, `window_length`

, `isfull`

.

**Examples**

```
julia> mf = MedianFilter(Int64, 2)
MedianFilter{Int64}(MutableBinaryHeap(), MutableBinaryHeap(), Tuple{FastRunningMedian.ValueLocation, Int64}[], 0, 0)
julia> grow!(mf, 1); median(mf) # window: [1]
1
julia> grow!(mf, 2); median(mf) # window: [1,2]
1.5
julia> roll!(mf, 3); median(mf) # window: [2,3]
2.5
julia> shrink!(mf); median(mf) # window: [3]
3
```

#
** FastRunningMedian.grow!** —

*Function*.

grow!(mf::MedianFilter, val) -> mf

Grow mf with the new value `val`

.

If mf would grow beyond maximum window length, an error is thrown. In this case you probably wanted to use `roll!`

.

The new element is pushed onto the end of the circular buffer.

#
** FastRunningMedian.roll!** —

*Function*.

roll!(mf::MedianFilter, val) -> mf

Roll the window over to the next position by replacing the first and oldest element in the ciruclar buffer with the new value `val`

.

#
** FastRunningMedian.shrink!** —

*Function*.

shrink!(mf::MedianFilter) -> mf

Shrinks `mf`

by removing the first and oldest element in the circular buffer.

Will error if mf contains only one element as a MedianFilter with zero elements would not have a median.

#
** FastRunningMedian.reset!** —

*Function*.

reset!(mf::MedianFilter) -> mf

Reset the median filter `mf`

by emptying it.

#
** FastRunningMedian.median** —

*Function*.

median(mf::MedianFilter; nan=:include)

Determine the current median in `mf`

.

**NaN Handling**

By default, any NaN value in the filter will turn the result NaN.

Use the keyword argument `nan = :ignore`

to ignore NaN values and calculate the median over the remaining values. If there are only NaNs, the median will be NaN regardless.

**Implementation**

If the number of elements in MedianFilter is odd, the low_heap is always one element bigger than the high_heap. The top element of the low_heap then is the median.

If the number of elements in MedianFilter is even, both heaps are the same size and the median is the mean of both top elements.

#
** Base.length** —

*Function*.

length(mf::MedianFilter)

Returns the number of elements in the stateful median filter `mf`

.

This number is equal to the length of the internal circular buffer.

#
** FastRunningMedian.window_length** —

*Function*.

window_length(mf::MedianFilter)

Returns the window_length of the stateful median filter `mf`

.

This number is equal to the capacity of the internal circular buffer.

#
** FastRunningMedian.isfull** —

*Function*.

isfull(mf::MedianFilter)

Returns true when the length of the stateful median filter `mf`

equals its window length.

## Sources

W. Hardle, W. Steiger 1995: Optimal Median Smoothing. Published in Journal of the Royal Statistical Society, Series C (Applied Statistics), Vol. 44, No. 2 (1995), pp. 258-264. https://doi.org/10.2307/2986349

(I did not implement their custom double heap, but used two heaps from DataStructures.jl)

## Keywords

Running Median is also known as Rolling Median or Moving Median.