Docstrings · AnomalyBenchmark.jl

AnomalyBenchmark.AnomalyBenchmark — Module

AnomalyBenchmark.jl:

Julia implementation of Numenta Anomaly Benchmark for Evaluating Algorithms for Streaming Anomaly Detection

This is a Julia implementation of Numenta's NAB Python package for Anomaly Benchmarking. The code is written from the ground up in Julia following the specifications of NAB.

AnomalyBenchmark.Labeler — Type

Object to get labels and compute the window around each anomaly.

Fields

data::DataFrame : The whole data set, with the default column timestamp.

windowSize::Float64 : Estimated size of an anomaly window, as a ratio to the data set length.

probationaryPercent::Float64 : The ratio of probationary period to the data set length.

labels::DataFrame : Ground truth for each record. For each record there should be a 1 or a 0. A 1 implies this record is within an anomalous window.

labelIndices::AbstractArray{Int, 1} : Indices of the true anomalies in labels

windows::AbstractArray{Tuple{DateTime,DateTime},1} : All the window limits in tuple form: (start time, end time).

Functions

setData::Function : Set the dataset for Labeler.

setLabels::Function : Set the ground truth labels from the timestamps of the true anomalies.

getWindows::Function : Call applyWindows and checkWindows.

applyWindows::Function : This takes all the true anomalies and adds a standard window around each one. The window length is derived from the field windowSize, and the window is centered on the anomaly timestamp.

checkWindows::Function : This takes the anomaly windows and checks for overlap with both each other and with the probationary period. Overlapping windows are merged into a single window. Windows overlapping with the probationary period are deleted.

Constructors

function Labeler(windowSize::Float64, probationaryPercent::Float64)

Arguments

windowSize::Float64 : Estimated size of an anomaly window, as a ratio to the data set length.

probationaryPercent::Float64 : The ratio of probationary period to the data set length.

Examples

Labeler(0.1, 0.15)
AnomalyBenchmark.Labeler(0×0 DataFrames.DataFrame
,0.1,0.15,0×0 DataFrames.DataFrame
,Int64[],Tuple{DateTime,DateTime}[],(anonymous function),(anonymous function),(anonymous function),(anonymous function),(anonymous function))

AnomalyBenchmark.Scorer — Type

Object to score a data set.

Fields

data::DataFrame : The whole data set with default columns timestamp, label, index and alerttype.

probationaryPeriod::Int : Row index from which predictions are scored; earlier predictions are ignored.

costMatrix::Dict{AbstractString, Float64} : The cost matrix for the profile with the following keys:

True positive (tpWeight): detects the anomaly when the anomaly is present.
False positive (fpWeight): detects the anomaly when the anomaly is absent.
True negative (tnWeight): does not detect the anomaly when the anomaly is absent.
False negative (fnWeight): does not detect the anomaly when the anomaly is present.

totalCount::Int : The total count of labels.

counts::Dict{AbstractString, Int} : The counts of tp, fp, tn and fn. Only predictions from probationaryPeriod onward are counted.

score::Float64 : The score of the anomaly detection algorithm results.

normalizedScore::Float64 : The normalized score of the anomaly detection algorithm such that the maximum possible is 100.0 (i.e. the perfect detector), and a baseline of 0.0 is determined by the "null" detector (which makes no detections).

len::Int : The total count of predictions.

windows::Vector{Window} : The list of windows for the data.

windowLimits::Vector{Tuple{DateTime,DateTime}} : All the window limits in tuple form: (start time, end time).

Functions

getWindows::Function : Create list of windows for the data.

getAlertTypes::Function : For each record, decide whether it is a tp, fp, tn, or fn. Populate counts dictionary with the total number of records in each category.

getScore::Function : Score the entire data and return a single floating point score.

getClosestPrecedingWindow::Function : Given a record index, find the closest preceding window.

normalizeScore::Function : Normalize the detector's score according to the baseline defined by the null detector.

Constructor

Scorer(
        timestamps::Vector{DateTime},
        predictions::AbstractVector{<:Integer},
        labels::AbstractVector{<:Integer},
        windowLimits::Vector{Tuple{DateTime,DateTime}},
        costMatrix::Dict{<:AbstractString, Float64},
        probationaryPeriod::Int
    )

Arguments

timestamps::Vector{DateTime} : Timestamps in the data.

predictions::AbstractVector{<:Integer} : Detector predictions of whether each record is anomalous or not. predictions[1:probationaryPeriod-1] are ignored.

labels::AbstractVector{<:Integer} : Ground truth for each record. For each record there should be a 1 or a 0. A 1 implies this record is within an anomalous window.

windowLimits::Vector{Tuple{DateTime,DateTime}} : All the window limits in tuple form: (start time, end time).

costMatrix::Dict{AbstractString, Float64} : The cost matrix for the profile with the following keys:

True positive (tpWeight): detects the anomaly when the anomaly is present.
False positive (fpWeight): detects the anomaly when the anomaly is absent.
True negative (tnWeight): does not detect the anomaly when the anomaly is absent.
False negative (fnWeight): does not detect the anomaly when the anomaly is present.

probationaryPeriod::Int : Row index from which predictions are scored; earlier predictions are ignored.

Examples

timestamps = collect(DateTime(2017, 1, 1):DateTime(2017, 1, 5))
predictions = [0, 1, 0, 0, 1]
labels         = [0, 1, 0, 0, 0]
windowLimits = [(DateTime(2017, 1, 2), DateTime(2017, 1, 3))]
costMatrix = Dict{AbstractString, Float64}(
                "tpWeight" => 1.0,
                "fnWeight" => 1.0,
                "fpWeight" => 1.0
            )
probationaryPeriod = 1
scorer = Scorer(timestamps, predictions, labels, windowLimits, costMatrix, probationaryPeriod)
AnomalyBenchmark.Scorer(5×4 DataFrames.DataFrame
│ Row │ timestamp           │ label │ index │ alerttype │
├─────┼─────────────────────┼───────┼───────┼───────────┤
│ 1   │ 2017-01-01T00:00:00 │ 0     │ 1     │ "tn"      │
│ 2   │ 2017-01-02T00:00:00 │ 1     │ 2     │ "tp"      │
│ 3   │ 2017-01-03T00:00:00 │ 0     │ 3     │ "tn"      │
│ 4   │ 2017-01-04T00:00:00 │ 0     │ 4     │ "tn"      │
│ 5   │ 2017-01-05T00:00:00 │ 0     │ 5     │ "fp"      │,1,Dict(:tpWeight=>1.0,:fnWeight=>1.0,:fpWeight=>1.0),5,
Dict{AbstractString,Int64}("tp"=>1,"tn"=>3,"fn"=>0,"fp"=>1),0.0,5,[AnomalyBenchmark.Window(1,2017-01-02T00:00:00,2017-01-03T00:00:00,2×4 DataFrames.DataFrame
│ Row │ timestamp           │ label │ index │ alerttype │
├─────┼─────────────────────┼───────┼───────┼───────────┤
│ 1   │ 2017-01-02T00:00:00 │ 1     │ 2     │ "tp"      │
│ 2   │ 2017-01-03T00:00:00 │ 0     │ 3     │ "tn"      │,[2,3],2,(anonymous function),(anonymous function))],(anonymous function),(anonymous function),(anonymous function),(anonymous function))

AnomalyBenchmark.Window — Type

Immutable object to store a single window in a data set. Each window represents a range of data points centered around a ground truth anomaly label.

Fields

id::Int : The identifier of the Window.

t1::DateTime : The start time of the Window.

t2::DateTime : The end time of the Window.

window::DataFrame : The data within the Window.

indices::AbstractArray : The indices of the Window in the data.

len::Int : The length of the Window.

Functions

repr::Function : String representation of Window. For debugging.

getFirstTruePositive::Function : Get the index of the first true positive within a window.

Constructor

Window(windowId::Int, limits::Tuple{DateTime, DateTime}, data::DataFrame)

Arguments

windowId::Int : An integer id for the Window.

limits::Tuple{DateTime, DateTime} : The start time and end time of the Window.

data::DataFrame : The whole data set with default columns index and timestamp.

Examples

data = DataFrame(
    index = 1:5,
    timestamp = DateTime(2017, 1, 1):DateTime(2017, 1, 5)
)
window = Window(1234, (DateTime(2017, 1, 1), DateTime(2017, 1, 2)), data)
AnomalyBenchmark.Window(1234,2017-01-01T00:00:00,2017-01-02T00:00:00,2×2 DataFrames.DataFrame
│ Row │ index │ timestamp           │
├─────┼───────┼─────────────────────┤
│ 1   │ 1     │ 2017-01-01T00:00:00 │
│ 2   │ 2     │ 2017-01-02T00:00:00 │,[1,2],2,(anonymous function),(anonymous function))
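
A minimal sketch of how the window rows could be selected, assuming a Window keeps every record whose timestamp lies between t1 and t2 inclusive (this reproduces the indices [1,2] in the example above). sketchWindowIndices is a hypothetical helper, not part of the package API, and it works on a plain Vector{DateTime} so that only the Dates standard library is needed.

```julia
using Dates

# Hypothetical helper (not the package API): positions of the records whose
# timestamps lie inside the inclusive window limits (t1, t2).
function sketchWindowIndices(timestamps::AbstractVector{DateTime},
                             limits::Tuple{DateTime,DateTime})
    t1, t2 = limits
    return findall(t -> t1 <= t <= t2, timestamps)
end

timestamps = collect(DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5))
indices = sketchWindowIndices(timestamps, (DateTime(2017, 1, 1), DateTime(2017, 1, 2)))
# indices == [1, 2], so the window length is 2
```

Positions equal the index column here only because the example data uses index = 1:5.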

AnomalyBenchmark.applyWindows — Method

Takes all the true anomalies, as calculated by combineLabels(), and adds a standard window.

Arguments

labeler::Labeler

Examples

labeler = Labeler(0.1, 0.15)
trueAnomalies = [DateTime(2017, 1, 3)]
data = DataFrame(
    index = 1:5,
    timestamp = DateTime(2017, 1, 1):DateTime(2017, 1, 5)
)

labeler.setData(data)
labeler.setLabels(trueAnomalies)
labeler.applyWindows()

julia> labeler
AnomalyBenchmark.Labeler(5×2 DataFrames.DataFrame
│ Row │ index │ timestamp           │
├─────┼───────┼─────────────────────┤
│ 1   │ 1     │ 2017-01-01T00:00:00 │
│ 2   │ 2     │ 2017-01-02T00:00:00 │
│ 3   │ 3     │ 2017-01-03T00:00:00 │
│ 4   │ 4     │ 2017-01-04T00:00:00 │
│ 5   │ 5     │ 2017-01-05T00:00:00 │,0.1,0.15,5×2 DataFrames.DataFrame
│ Row │ timestamp           │ label │
├─────┼─────────────────────┼───────┤
│ 1   │ 2017-01-01T00:00:00 │ 0     │
│ 2   │ 2017-01-02T00:00:00 │ 0     │
│ 3   │ 2017-01-03T00:00:00 │ 1     │
│ 4   │ 2017-01-04T00:00:00 │ 0     │
│ 5   │ 2017-01-05T00:00:00 │ 0     │,[3],[(2017-01-03T00:00:00,2017-01-03T00:00:00)],(anonymous function),(anonymous function),(anonymous function),(anonymous function),(anonymous function))
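
The window placement can be sketched as follows. This assumes the NAB convention that the total window budget, windowSize times the number of records, is shared evenly among the true anomalies and rounded down, with each window centered on its anomaly and clipped to the data. With 5 records, windowSize = 0.1 and one anomaly, the length rounds down to 0, which matches the single-point window (2017-01-03, 2017-01-03) above. sketchApplyWindows is a hypothetical name, not the package implementation.

```julia
using Dates

# Hypothetical sketch (not the package's applyWindows): one window per true anomaly.
function sketchApplyWindows(timestamps::AbstractVector{DateTime},
                            trueAnomalies::AbstractVector{DateTime},
                            windowSize::Float64)
    n = length(timestamps)
    # Assumed NAB convention: the budget windowSize * n is shared by all anomalies, rounded down
    windowLength = floor(Int, windowSize * n / max(length(trueAnomalies), 1))
    half = div(windowLength, 2)
    windows = Tuple{DateTime,DateTime}[]
    for a in trueAnomalies
        idx = findfirst(==(a), timestamps)
        idx === nothing && continue          # anomaly not present in the data
        front = max(idx - half, 1)           # clip to the first record
        back = min(idx + half, n)            # clip to the last record
        push!(windows, (timestamps[front], timestamps[back]))
    end
    return windows
end

timestamps = collect(DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5))
sketchApplyWindows(timestamps, [DateTime(2017, 1, 3)], 0.1)
```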

AnomalyBenchmark.checkWindows — Method

Takes the anomaly windows and checks for overlap with both each other and with the probationary period. Overlapping windows are merged into a single window. Windows overlapping with the probationary period are deleted.

Arguments

labeler::Labeler

Examples

labeler = Labeler(0.1, 0.15)
trueAnomalies = [DateTime(2017, 1, 3)]
data = DataFrame(
    index = 1:5,
    timestamp = DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5)
)

labeler.setData(data)
labeler.setLabels(trueAnomalies)
labeler.applyWindows()
labeler.checkWindows()

julia> labeler
AnomalyBenchmark.Labeler(5×2 DataFrames.DataFrame
│ Row │ index │ timestamp           │
├─────┼───────┼─────────────────────┤
│ 1   │ 1     │ 2017-01-01T00:00:00 │
│ 2   │ 2     │ 2017-01-02T00:00:00 │
│ 3   │ 3     │ 2017-01-03T00:00:00 │
│ 4   │ 4     │ 2017-01-04T00:00:00 │
│ 5   │ 5     │ 2017-01-05T00:00:00 │,0.1,0.15,5×2 DataFrames.DataFrame
│ Row │ timestamp           │ label │
├─────┼─────────────────────┼───────┤
│ 1   │ 2017-01-01T00:00:00 │ 0     │
│ 2   │ 2017-01-02T00:00:00 │ 0     │
│ 3   │ 2017-01-03T00:00:00 │ 1     │
│ 4   │ 2017-01-04T00:00:00 │ 0     │
│ 5   │ 2017-01-05T00:00:00 │ 0     │,[3],[(2017-01-03T00:00:00,2017-01-03T00:00:00)],(anonymous function),(anonymous function),(anonymous function),(anonymous function),(anonymous function))
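
A minimal sketch of the two checks, written against plain (start time, end time) tuples. It assumes the end of the probationary period is given directly as a timestamp (probationTimestamp, a hypothetical parameter; the package derives it from probationaryPercent), drops every window that starts before it, and merges a window into its predecessor when it starts at or before the predecessor's end. sketchCheckWindows is hypothetical, not the package implementation.

```julia
using Dates

# Hypothetical sketch (not the package's checkWindows).
function sketchCheckWindows(windows::Vector{Tuple{DateTime,DateTime}},
                            probationTimestamp::DateTime)
    # Drop windows that begin inside the probationary period
    kept = filter(w -> w[1] >= probationTimestamp, sort(windows))
    merged = Tuple{DateTime,DateTime}[]
    for w in kept
        if !isempty(merged) && w[1] <= merged[end][2]
            # Overlaps the previous window: extend the previous window instead
            merged[end] = (merged[end][1], max(merged[end][2], w[2]))
        else
            push!(merged, w)
        end
    end
    return merged
end

jan(d) = DateTime(2017, 1, d)
checked = sketchCheckWindows([(jan(3), jan(5)), (jan(4), jan(7)), (jan(1), jan(2))], jan(2))
# The window starting on Jan 1 is dropped; the two overlapping windows merge into (Jan 3, Jan 7)
```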

AnomalyBenchmark.convertAnomalousWindowsToTimestamps — Method

Returns an array that contains all anomalous timestamps, given an array of start and end times for every anomalous time window.

Arguments

anomalousWindows::AbstractArray{Tuple{DateTime,DateTime},1} : An array of (start time, end time) tuples, one for each anomalous time window.

Returns

Array{DateTime,1} that contains all anomalous timestamps.

Examples

julia> anomalousWindows = [(DateTime(2017, 1, 3, 10, 1), DateTime(2017, 1, 3, 10, 5)), (DateTime(2017, 1, 3, 10, 58), DateTime(2017, 1, 3, 11, 0))]
2-element Array{Tuple{DateTime,DateTime},1}:
 (2017-01-03T10:01:00,2017-01-03T10:05:00)
 (2017-01-03T10:58:00,2017-01-03T11:00:00)
julia> AnomalyBenchmark.convertAnomalousWindowsToTimestamps(anomalousWindows)
8-element Array{DateTime,1}:
 2017-01-03T10:01:00
 2017-01-03T10:02:00
 2017-01-03T10:03:00
 2017-01-03T10:04:00
 2017-01-03T10:05:00
 2017-01-03T10:58:00
 2017-01-03T10:59:00
 2017-01-03T11:00:00
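
A minimal sketch of the expansion, assuming consecutive timestamps are one minute apart (inferred from the example above; real data may use another resolution, so it is a keyword argument here). sketchWindowsToTimestamps is a hypothetical name, not the package function.

```julia
using Dates

# Hypothetical sketch: expand each (start, end) window into every timestamp
# between its limits, inclusive. The one-minute default is an assumption.
function sketchWindowsToTimestamps(windows::AbstractVector{Tuple{DateTime,DateTime}};
                                   resolution::Dates.Period = Minute(1))
    timestamps = DateTime[]
    for (t1, t2) in windows
        append!(timestamps, t1:resolution:t2)
    end
    return timestamps
end

anomalousWindows = [(DateTime(2017, 1, 3, 10, 1), DateTime(2017, 1, 3, 10, 5)),
                    (DateTime(2017, 1, 3, 10, 58), DateTime(2017, 1, 3, 11, 0))]
ts = sketchWindowsToTimestamps(anomalousWindows)   # 8 timestamps, as in the example
```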

AnomalyBenchmark.convertAnomalyScoresToDetections — Method

Convert anomaly scores (values between 0 and 1) to detections (binary values) given a threshold.

Arguments

anomalyScores::AbstractArray{Float64} : An array of anomaly scores.

threshold::Float64 : The threshold for anomaly scores. If an anomaly score is greater than or equal to the threshold, the detection is 1; otherwise, it is 0.

Returns

Array{Int64,1} - An array of detections (1 = anomalous, 0 = normal).

Examples

julia> convertAnomalyScoresToDetections([0.3, 0.5, 0.7], 0.6)
3-element Array{Int64,1}:
 0
 0
 1
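
The thresholding rule reduces to one broadcast comparison. A minimal sketch; sketchScoresToDetections is a hypothetical name, not the package function.

```julia
# Hypothetical sketch of the rule described above: 1 when score >= threshold, otherwise 0.
sketchScoresToDetections(anomalyScores::AbstractArray{Float64}, threshold::Float64) =
    Int.(anomalyScores .>= threshold)

sketchScoresToDetections([0.3, 0.5, 0.7], 0.6)                 # [0, 0, 1]
sketchScoresToDetections([0.7, 0.8, 0.5, 0.8, 0.9], 0.75)      # [0, 1, 0, 1, 1]
```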

AnomalyBenchmark.getAlertTypes — Method

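For each record, decide whether it is a tp, fp, tn, or fn, and populate the counts dictionary with the total number of records in each category.

Arguments

scorer::Scorer

predictions::AbstractVector{<:Integer} : Detector predictions of whether each record is anomalous or not.

Returns

Array{AbstractString,1} : The alert type ("tp", "fp", "tn" or "fn") of each record.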

Examples

timestamps = collect(DateTime(2017, 1, 1):DateTime(2017, 1, 5))
predictions = [0, 1, 0, 0, 1]
labels         = [0, 1, 0, 0, 0]
windowLimits = [(DateTime(2017, 1, 2), DateTime(2017, 1, 3))]
costMatrix = Dict(
                "tpWeight" => 1.0,
                "fnWeight" => 1.0,
                "fpWeight" => 1.0
            )
probationaryPeriod = 1
scorer = Scorer(timestamps, predictions, labels, windowLimits, costMatrix, probationaryPeriod)
julia> scorer.getAlertTypes(predictions)
5-element Array{AbstractString,1}:
 "tn"
 "tp"
 "tn"
 "tn"
 "fp"

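A minimal sketch of the classification and counting, assuming a record is a tp when both its prediction and its label are 1, an fp when only the prediction is 1, an fn when only the label is 1, and a tn otherwise, and that records before probationaryPeriod are excluded from the counts. On the example above it reproduces the alert types shown and the counts in the Scorer example. Both helper names are hypothetical.

```julia
# Hypothetical sketch (not the package's getAlertTypes).
function sketchAlertTypes(predictions::AbstractVector{<:Integer},
                          labels::AbstractVector{<:Integer})
    return map(predictions, labels) do p, l
        if p == 1 && l == 1
            "tp"   # detection on an anomalous record
        elseif p == 1
            "fp"   # detection on a normal record
        elseif l == 1
            "fn"   # anomalous record not detected
        else
            "tn"
        end
    end
end

# Count each alert type, skipping records before probationaryPeriod
function sketchCounts(alertTypes::AbstractVector{<:AbstractString}, probationaryPeriod::Int)
    scored = alertTypes[probationaryPeriod:end]
    return Dict(t => count(==(t), scored) for t in ("tp", "fp", "tn", "fn"))
end

alertTypes = sketchAlertTypes([0, 1, 0, 0, 1], [0, 1, 0, 0, 0])
counts = sketchCounts(alertTypes, 1)
```
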
AnomalyBenchmark.getClosestPrecedingWindow — Method

Given a record index, find the closest preceding window.

Arguments

scorer::Scorer

index::Int : Index of a record.

Returns

Window id of the closest window preceding the given index, or -1 if there is none.

Examples

timestamps = collect(DateTime(2017, 1, 1):DateTime(2017, 1, 5))
predictions = [0, 1, 0, 0, 1]
labels         = [0, 1, 0, 0, 0]
windowLimits = [(DateTime(2017, 1, 2), DateTime(2017, 1, 3))]
costMatrix = Dict{AbstractString, Float64}(
                "tpWeight" => 1.0,
                "fnWeight" => 1.0,
                "fpWeight" => 1.0
            )
probationaryPeriod = 1
scorer = Scorer(timestamps, predictions, labels, windowLimits, costMatrix, probationaryPeriod)

scorer.getClosestPrecedingWindow(2)
-1

scorer.getClosestPrecedingWindow(4)
1
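
A minimal sketch, assuming (as in NAB) that a window precedes a record when the window's last index is strictly smaller than the record's index. This matches the example above: the window covering records 2 and 3 does not precede record 2 but does precede record 4. Windows are represented as hypothetical (id, lastIndex) pairs rather than Window objects.

```julia
# Hypothetical sketch (not the package API). Each window is an (id, lastIndex) pair.
function sketchClosestPrecedingWindow(windows::AbstractVector{Tuple{Int,Int}}, index::Int)
    windowId = -1                      # -1 means no window precedes `index`
    minDistance = typemax(Int)
    for (id, lastIndex) in windows
        distance = index - lastIndex
        if distance > 0 && distance < minDistance
            minDistance = distance
            windowId = id
        end
    end
    return windowId
end

windows = [(1, 3)]                          # window 1 ends at record 3
sketchClosestPrecedingWindow(windows, 2)    # -1
sketchClosestPrecedingWindow(windows, 4)    # 1
```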

AnomalyBenchmark.getFirstTruePositive — Method

Get the index of the first true positive within a window.

Arguments

window::Window

Returns

Index of the first true positive within a window. -1 if there are none.

Examples

data = DataFrame(
    index = 1:5,
    timestamp = DateTime(2017, 1, 1):DateTime(2017, 1, 5),
    alerttype = ["fp", "tp", "tp", "fn", "tn"]
)
window = Window(1234, (DateTime(2017, 1, 1), DateTime(2017, 1, 2)), data)
julia> window.getFirstTruePositive()
2

data = DataFrame(
    index = 1:5,
    timestamp = DateTime(2017, 1, 1):DateTime(2017, 1, 5),
    alerttype = ["fp", "fp", "fp", "fn", "tn"]
)
window = Window(1234, (DateTime(2017, 1, 1), DateTime(2017, 1, 2)), data)
julia> window.getFirstTruePositive()
-1
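
A minimal sketch, assuming the window's rows are available as an index vector and a matching alerttype vector. It returns the index value of the first "tp" row (2 in the first example above) or -1 when there is none. sketchFirstTruePositive is a hypothetical name.

```julia
# Hypothetical sketch (not the package's getFirstTruePositive).
function sketchFirstTruePositive(indices::AbstractVector{Int},
                                 alerttypes::AbstractVector{<:AbstractString})
    pos = findfirst(==("tp"), alerttypes)
    return pos === nothing ? -1 : indices[pos]
end

# Rows of the example window above (records 1 and 2)
sketchFirstTruePositive([1, 2], ["fp", "tp"])   # 2
sketchFirstTruePositive([1, 2], ["fp", "fp"])   # -1
```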

AnomalyBenchmark.getProbationPeriod — Method

Return the probationary period index given probation percentage and the length of the file.

Arguments

probationPercent::Float64 : The percentage of predictions that won't be used for scoring.

fileLength::Int : The number of rows of the data file.

Returns

Int64 : If the file length is less than 5,000, the probationary period is the probation percentage times the file length; otherwise, it is the probation percentage times 5,000.

Examples

julia> AnomalyBenchmark.getProbationPeriod(0.2, 4000)
800

julia> AnomalyBenchmark.getProbationPeriod(0.2, 10000)
1000
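
The rule above amounts to capping the file length at 5,000 records. A minimal sketch, assuming the product is rounded down to an integer (the rounding mode is an assumption, and sketchProbationPeriod is a hypothetical name):

```julia
# Hypothetical sketch of the probationary-period rule described above.
function sketchProbationPeriod(probationPercent::Float64, fileLength::Int)
    # Only the first 5,000 records count toward the probationary period
    return floor(Int, probationPercent * min(fileLength, 5000))
end

sketchProbationPeriod(0.2, 4000)    # 800
sketchProbationPeriod(0.2, 10000)   # 1000
```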

AnomalyBenchmark.getScore — Method

Score the entire data and return a single floating point score. The position in a given window is calculated as the distance from the end of the window, normalized to [-1, 0]; that is, positions -1.0 and 0.0 are at the very front and back of the anomaly window, respectively.

Flat scoring option: to run a flat scorer that does not apply the scaled sigmoid weighting, comment out the two scaledSigmoid() lines in the source of getScore and uncomment the replacement lines that calculate thisTP and thisFP.

Arguments

scorer::Scorer

Returns

Tuple

scores::AbstractVector{Float64} : The score at each timestamp of the data.

scorer.score::Float64 : The score of the anomaly detection algorithm results.

Examples

timestamps = collect(DateTime(2017, 1, 1):DateTime(2017, 1, 5))
predictions = [0, 1, 0, 0, 1]
labels         = [0, 1, 0, 0, 0]
windowLimits = [(DateTime(2017, 1, 2), DateTime(2017, 1, 3))]
costMatrix = Dict{AbstractString, Float64}(
                "tpWeight" => 1.0,
                "fnWeight" => 1.0,
                "fpWeight" => 1.0
            )
probationaryPeriod = 1
scorer = Scorer(timestamps, predictions, labels, windowLimits, costMatrix, probationaryPeriod)

scorer.getScore()
([0.0,1.0,0.0,0.0,-0.9999092042625951],9.079573740489177e-5)
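
The example score can be reproduced by hand. This sketch assumes NAB's position formulas: a true positive at record i in a window whose last record is e and whose length is n has position -(e - i + 1)/n, and its weighted sigmoid is divided by scaledSigmoid(-1.0) so that the earliest detection scores tpWeight; a false positive at record j after that window has position |e - j|/(n - 1). These formulas are assumptions that reproduce the numbers above, not code quoted from the package; the helper names are hypothetical.

```julia
# Standard and scaled sigmoids, as documented under sigmoid and scaledSigmoid
sigmoid(x) = 1 / (1 + exp(-x))
scaledSigmoid(p) = 2 * sigmoid(-5 * p) - 1.0

# Assumed NAB position formulas (hypothetical helper names)
tpPosition(i, lastIndex, len) = -(lastIndex - i + 1) / len
fpPosition(j, lastIndex, len) = abs(lastIndex - j) / (len - 1)   # assumes len > 1

tpWeight, fpWeight = 1.0, 1.0
# Example above: the window covers records 2:3, the TP is record 2, the FP is record 5
thisTP = tpWeight * scaledSigmoid(tpPosition(2, 3, 2)) / scaledSigmoid(-1.0)  # 1.0
thisFP = fpWeight * scaledSigmoid(fpPosition(5, 3, 2))                        # about -0.9999092
score = thisTP + thisFP                                                       # about 9.0796e-5
```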

AnomalyBenchmark.getWindows — Method

Takes all the true anomalies, as calculated by combineLabels(), and adds a standard window. Takes the anomaly windows and checks for overlap with both each other and with the probationary period. Overlapping windows are merged into a single window. Windows overlapping with the probationary period are deleted.

Arguments

labeler::Labeler

Examples

labeler = Labeler(0.1, 0.15)
trueAnomalies = [DateTime(2017, 1, 3)]
data = DataFrame(
    index = 1:5,
    timestamp = DateTime(2017, 1, 1):DateTime(2017, 1, 5)
)

labeler.setData(data)
labeler.setLabels(trueAnomalies)
labeler.getWindows()

julia> labeler
AnomalyBenchmark.Labeler(5×2 DataFrames.DataFrame
│ Row │ index │ timestamp           │
├─────┼───────┼─────────────────────┤
│ 1   │ 1     │ 2017-01-01T00:00:00 │
│ 2   │ 2     │ 2017-01-02T00:00:00 │
│ 3   │ 3     │ 2017-01-03T00:00:00 │
│ 4   │ 4     │ 2017-01-04T00:00:00 │
│ 5   │ 5     │ 2017-01-05T00:00:00 │,0.1,0.15,5×2 DataFrames.DataFrame
│ Row │ timestamp           │ label │
├─────┼─────────────────────┼───────┤
│ 1   │ 2017-01-01T00:00:00 │ 0     │
│ 2   │ 2017-01-02T00:00:00 │ 0     │
│ 3   │ 2017-01-03T00:00:00 │ 1     │
│ 4   │ 2017-01-04T00:00:00 │ 0     │
│ 5   │ 2017-01-05T00:00:00 │ 0     │,[3],[(2017-01-03T00:00:00,2017-01-03T00:00:00)],(anonymous function),(anonymous function),(anonymous function),(anonymous function),(anonymous function))

AnomalyBenchmark.getWindows — Method

Create list of windows for the data

Arguments

scorer::Scorer

limits::Vector{Tuple{DateTime,DateTime}} : All the window limits in tuple form: (start time, end time).

Returns

All the windows for the data of the scorer.

Examples

timestamps = collect(DateTime(2017, 1, 1):DateTime(2017, 1, 5))
predictions = [0, 1, 0, 0, 1]
labels         = [0, 1, 0, 0, 0]
windowLimits = [(DateTime(2017, 1, 2), DateTime(2017, 1, 3))]
costMatrix = Dict{AbstractString, Float64}(
                "tpWeight" => 1.0,
                "fnWeight" => 1.0,
                "fpWeight" => 1.0
            )
probationaryPeriod = 1
scorer = Scorer(timestamps, predictions, labels, windowLimits, costMatrix, probationaryPeriod)
scorer.getWindows(windowLimits)
1-element Array{AnomalyBenchmark.Window,1}:
 AnomalyBenchmark.Window(1,2017-01-02T00:00:00,2017-01-03T00:00:00,2×4 DataFrames.DataFrame
│ Row │ timestamp           │ label │ index │ alerttype │
├─────┼─────────────────────┼───────┼───────┼───────────┤
│ 1   │ 2017-01-02T00:00:00 │ 1     │ 2     │ "tp"      │
│ 2   │ 2017-01-03T00:00:00 │ 0     │ 3     │ "tn"      │,[2,3],2,(anonymous function),(anonymous function))

AnomalyBenchmark.normalizeScore — Method

Normalize the detector's score according to the baseline defined by the null detector, and print it to the console. This function can only be called after the scoring step (getScore). The score is normalized as 100 * (score - baseline) / (perfect - baseline), where the perfect score is the number of TPs possible.

Arguments

scorer::Scorer

Examples

timestamps = collect(DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5))
predictions = [0, 1, 0, 0, 1]
labels         = [0, 1, 0, 0, 0]
windowLimits = [(DateTime(2017, 1, 2), DateTime(2017, 1, 3))]
costMatrix = Dict{AbstractString, Float64}(
                "tpWeight" => 1.0,
                "fnWeight" => 1.0,
                "fpWeight" => 1.0
            )
probationaryPeriod = 1
scorer = AnomalyBenchmark.Scorer(timestamps, predictions, labels, windowLimits, costMatrix, probationaryPeriod)

julia> scorer.getScore()
([0.0,1.0,0.0,0.0,-0.9999092042625951],9.079573740489177e-5)

julia> scorer.normalizeScore()

Running score normalization step
50.004539786870254
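
The normalization can be checked against the output above. This sketch assumes that, for a data set with numWindows windows, the perfect detector scores tpWeight * numWindows and the null detector, which misses every window, scores -fnWeight * numWindows; with one window and unit weights that gives 100 * (9.0796e-5 + 1) / 2, about 50.0045. sketchNormalizeScore is a hypothetical name.

```julia
# Hypothetical sketch (not the package's normalizeScore).
function sketchNormalizeScore(rawScore::Float64, numWindows::Int,
                              tpWeight::Float64, fnWeight::Float64)
    perfect = tpWeight * numWindows      # every window detected at its earliest point
    baseline = -fnWeight * numWindows    # null detector: every window missed, no FPs
    return 100 * (rawScore - baseline) / (perfect - baseline)
end

sketchNormalizeScore(9.079573740489177e-5, 1, 1.0, 1.0)   # about 50.0045
```

By construction, the perfect detector maps to 100.0 and the null detector to 0.0.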

AnomalyBenchmark.scaledSigmoid — Method

Return a scaled sigmoid function given a relative position within a labeled window. The function is computed as follows:

A relative position of -1.0 is the far left edge of the anomaly window and corresponds to S = 2*sigmoid(5) - 1.0 = 0.98661. This is the earliest position at which a detection is counted as a true positive.

A relative position of -0.5 is halfway into the anomaly window and corresponds to S = 2*sigmoid(0.5*5) - 1.0 = 0.84828.

A relative position of 0.0 is the right edge of the window and corresponds to S = 2*sigmoid(0) - 1 = 0.0.

Relative positions > 0 correspond to false positives increasingly far away from the right edge of the window. A relative position of 1.0 is past the right edge of the window and corresponds to a score of 2*sigmoid(-5) - 1.0 = -0.98661.

Arguments

relativePositionInWindow::Float64 : A relative position within a window calculated per the rules above.

Returns

Float64 The scaled sigmoid score.

Examples

julia> AnomalyBenchmark.scaledSigmoid(-1.0)
0.9866142981514305

julia> AnomalyBenchmark.scaledSigmoid(-0.5)
0.8482836399575131

julia> AnomalyBenchmark.scaledSigmoid(0.0)
0.0

julia> AnomalyBenchmark.scaledSigmoid(1.0)
-0.9866142981514303
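
The rules above reduce to S(p) = 2*sigmoid(-5p) - 1. A minimal sketch of both functions in plain Julia; the names mirror the documented functions, but this is an illustrative reimplementation rather than the package source, and it omits any clamping the package may apply to positions far past the window.

```julia
# Standard logistic sigmoid: 1 / (1 + e^(-x))
sigmoid(x::Float64) = 1 / (1 + exp(-x))

# Scaled sigmoid: position -1.0 gives about 0.98661, 0.0 gives 0.0, 1.0 gives about -0.98661
scaledSigmoid(relativePositionInWindow::Float64) =
    2 * sigmoid(-5 * relativePositionInWindow) - 1.0

[scaledSigmoid(p) for p in (-1.0, -0.5, 0.0, 1.0)]
```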

AnomalyBenchmark.scoreDataSet — Method

Compute AnomalyBenchmark scores given a detector's results, actual anomalies and a cost matrix.

Arguments

labeler::Labeler : An object that stores and manipulates labels and windows for a given data set and its true anomalies.

data::DataFrame : The whole data set, with the default column timestamp.

trueAnomalies::Vector{DateTime} : Timestamps of the ground truth anomalies.

predictions::AbstractVector{<:Integer} : Detector predictions of whether each record is anomalous or not. predictions[1:probationaryPeriod-1] are ignored.

Optional Arguments

detectorName::AbstractString="%" : The name of the anomaly detector.

profileName::AbstractString="standard" : The name of scoring profile. Each profile represents a cost matrix.

costMatrix::Dict{AbstractString, Float64} : The cost matrix for the profile with the following keys:

True positive (tpWeight): detects the anomaly when the anomaly is present.
False positive (fpWeight): detects the anomaly when the anomaly is absent.
True negative (tnWeight): does not detect the anomaly when the anomaly is absent.
False negative (fnWeight): does not detect the anomaly when the anomaly is present.

If a costMatrix is given, it will be applied in place of the cost matrix provided by the profileName.

Returns

A Dict representing the anomaly detection benchmark result for the given detector, with the following keys:

scorer : The Scorer object for the detector.

detectorName : The name of the anomaly detector.

profileName : The name of scoring profile. If a customized costMatrix is provided, profileName is "customized".

score : The score of the anomaly detection algorithm results (scorer.score).

counts : The counts of tp, fp, tn and fn. Only predictions from probationaryPeriod onward are counted.

Examples

labeler = AnomalyBenchmark.Labeler(0.1, 0.15)
data = DataFrame(
    index = 1:5,
    timestamp = DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5)
)
trueAnomalies = [DateTime(2017, 1, 2)]
predictions = [0, 1, 0, 0, 0]

detectorName = "tester"
profileName = "standard"

julia> AnomalyBenchmark.scoreDataSet(labeler, data, trueAnomalies, predictions, detectorName=detectorName, profileName=profileName)
Dict{ASCIIString,Any} with 5 entries:
  "detectorName" => "tester"
  "counts"       => Dict{AbstractString,Int64}("tp"=>1,"tn"=>2,"fn"=>0,"fp"=>2)
  "score"        => 0.78
  "profileName"  => "standard"
  "scorer"       => AnomalyBenchmark.Scorer(5×4 DataFrames.DataFrame…

labeler = AnomalyBenchmark.Labeler(0.1, 0.15)
data = DataFrame(
    index = 1:5,
    timestamp = DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5)
)
trueAnomalies = [DateTime(2017, 1, 2)]
predictions = [0, 1, 0, 0, 0]

detectorName = "tester"
costMatrix = Dict{AbstractString, Float64}("tpWeight" => 1.0, "fpWeight" => 1.0, "fnWeight" => 1.0)

julia> AnomalyBenchmark.scoreDataSet(labeler, data, trueAnomalies, predictions, detectorName=detectorName, costMatrix=costMatrix)
Dict{ASCIIString,Any} with 5 entries:
  "detectorName" => "tester"
  "counts"       => Dict{AbstractString,Int64}("tp"=>1,"tn"=>4,"fn"=>0,"fp"=>2)
  "score"        => -1.0
  "profileName"  => "customized"
  "scorer"       => AnomalyBenchmark.Scorer(5×4 DataFrames.DataFrame…
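
Passing raw anomaly scores and a threshold instead of binary predictions (reusing labeler, data, trueAnomalies, detectorName and costMatrix from the previous example):
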
anomalyScores = [0.7, 0.8, 0.5, 0.8, 0.9]
threshold = 0.75

julia> AnomalyBenchmark.scoreDataSet(labeler, data, trueAnomalies, anomalyScores, threshold, detectorName=detectorName, costMatrix=costMatrix)
Dict{ASCIIString,Any} with 5 entries:
  "detectorName" => "tester"
  "counts"       => Dict{AbstractString,Int64}("tp"=>1,"tn"=>2,"fn"=>0,"fp"=>2)
  "score"        => -1.0
  "profileName"  => "customized"
  "scorer"       => AnomalyBenchmark.Scorer(5×4 DataFrames.DataFrame…

AnomalyBenchmark.setData — Method

Set the value of the data field in a Labeler.

Arguments

labeler::Labeler

data::DataFrame : The whole data set, with the default column timestamp.

Examples

labeler = Labeler(0.1, 0.15)
trueAnomalies = [DateTime(2017, 1, 3)]
data = DataFrame(
    index = 1:5,
    timestamp = DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5)
)

labeler.setData(data)
julia> labeler
AnomalyBenchmark.Labeler(5×2 DataFrames.DataFrame
│ Row │ index │ timestamp           │
├─────┼───────┼─────────────────────┤
│ 1   │ 1     │ 2017-01-01T00:00:00 │
│ 2   │ 2     │ 2017-01-02T00:00:00 │
│ 3   │ 3     │ 2017-01-03T00:00:00 │
│ 4   │ 4     │ 2017-01-04T00:00:00 │
│ 5   │ 5     │ 2017-01-05T00:00:00 │,0.1,0.15,0×0 DataFrames.DataFrame
,Int64[],Tuple{DateTime,DateTime}[],(anonymous function),(anonymous function),(anonymous function),(anonymous function),(anonymous function))

AnomalyBenchmark.setLabels — Method

Set the value of the labels field in a Labeler. For each record there should be a 1 or a 0. A 1 implies this record is within an anomalous window.

Arguments

labeler::Labeler

trueAnomalies::AbstractArray{DateTime, 1} : Timestamps of the ground truth anomalies.

Examples

labeler = Labeler(0.1, 0.15)
trueAnomalies = [DateTime(2017, 1, 3)]
data = DataFrame(
    index = 1:5,
    timestamp = DateTime(2017, 1, 1):DateTime(2017, 1, 5)
)

labeler.setData(data)
labeler.setLabels(trueAnomalies)

julia> labeler
AnomalyBenchmark.Labeler(5×2 DataFrames.DataFrame
│ Row │ index │ timestamp           │
├─────┼───────┼─────────────────────┤
│ 1   │ 1     │ 2017-01-01T00:00:00 │
│ 2   │ 2     │ 2017-01-02T00:00:00 │
│ 3   │ 3     │ 2017-01-03T00:00:00 │
│ 4   │ 4     │ 2017-01-04T00:00:00 │
│ 5   │ 5     │ 2017-01-05T00:00:00 │,0.1,0.15,5×2 DataFrames.DataFrame
│ Row │ timestamp           │ label │
├─────┼─────────────────────┼───────┤
│ 1   │ 2017-01-01T00:00:00 │ 0     │
│ 2   │ 2017-01-02T00:00:00 │ 0     │
│ 3   │ 2017-01-03T00:00:00 │ 1     │
│ 4   │ 2017-01-04T00:00:00 │ 0     │
│ 5   │ 2017-01-05T00:00:00 │ 0     │,[3],Tuple{DateTime,DateTime}[],(anonymous function),(anonymous function),(anonymous function),(anonymous function),(anonymous function))
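
A minimal sketch of the labeling step, assuming a record is labeled 1 exactly when its timestamp equals one of the true anomaly timestamps (as in the example above, where only record 3 is labeled) and that labelIndices are the positions of those records. sketchSetLabels is a hypothetical name, not the package function.

```julia
using Dates

# Hypothetical sketch (not the package's setLabels).
function sketchSetLabels(timestamps::AbstractVector{DateTime},
                         trueAnomalies::AbstractVector{DateTime})
    anomalySet = Set(trueAnomalies)
    labels = [t in anomalySet ? 1 : 0 for t in timestamps]   # 1 = anomalous record
    labelIndices = findall(==(1), labels)
    return labels, labelIndices
end

timestamps = collect(DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5))
labels, labelIndices = sketchSetLabels(timestamps, [DateTime(2017, 1, 3)])
# labels == [0, 0, 1, 0, 0], labelIndices == [3]
```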

AnomalyBenchmark.sigmoid — Method

Standard sigmoid function.

\[\frac{1}{1+e^{-x}}\]

Base.show — Method

String representation of Window. For debugging.

Arguments

window::Window

Examples

data = DataFrame(
    index = 1:5,
    timestamp = DateTime(2017, 1, 1):DateTime(2017, 1, 5)
)
window = Window(1234, (DateTime(2017, 1, 1), DateTime(2017, 1, 2)), data)
window.repr()
WINDOW id=1234, limits: [2017-01-01T00:00:00, 2017-01-02T00:00:00], length: 2
window data:
2×2 DataFrames.DataFrame
│ Row │ index │ timestamp           │
├─────┼───────┼─────────────────────┤
│ 1   │ 1     │ 2017-01-01T00:00:00 │
│ 2   │ 2     │ 2017-01-02T00:00:00 │