AnomalyBenchmark.AnomalyBenchmark
— Module
AnomalyBenchmark.jl:
Julia implementation of Numenta Anomaly Benchmark for Evaluating Algorithms for Streaming Anomaly Detection
This is a Julia implementation of Numenta's NAB Python package for Anomaly Benchmarking. The code is written from the ground up in Julia following the specifications of NAB.
AnomalyBenchmark.Labeler
— Type
Object to get labels and compute the window around each anomaly.
Fields
data::DataFrame
: The whole data set with default column timestamp.
windowSize::Float64
: Estimated size of an anomaly window, as a ratio to the data set length.
probationaryPercent::Float64
: The ratio of probationary period to the data set length.
labels::DataFrame
: Ground truth for each record. For each record there should be a 1 or a 0. A 1 implies this record is within an anomalous window.
labelIndices::AbstractArray{Int, 1}
: Indices of the true anomalies in labels
windows::AbstractArray{Tuple{DateTime,DateTime},1}
: All the window limits in tuple form: (start time, end time).
Functions
setData::Function
: Set the data set for the Labeler.
setLabels::Function
: Set the ground truth labels from timestamps of true anomalies.
getWindows::Function
: Call applyWindows and checkWindows.
applyWindows::Function
: This takes all the true anomalies, and adds a standard window. The window length is given by the field windowSize, and the location is centered on the anomaly timestamp.
checkWindows::Function
: This takes the anomaly windows and checks for overlap with both each other and with the probationary period. Overlapping windows are merged into a single window. Windows overlapping with the probationary period are deleted.
Constructors
function Labeler(windowSize::Float64, probationaryPercent::Float64)
Arguments
windowSize::Float64
: Estimated size of an anomaly window, as a ratio to the data set length.
probationaryPercent::Float64
: The ratio of probationary period to the data set length.
Examples
Labeler(0.1, 0.15)
AnomalyBenchmark.Labeler(0×0 DataFrames.DataFrame
,0.1,0.15,0×0 DataFrames.DataFrame
,Int64[],Tuple{DateTime,DateTime}[],(anonymous function),(anonymous function),(anonymous function),(anonymous function),(anonymous function))
AnomalyBenchmark.Scorer
— Type
Object to score a data set.
Fields
data::DataFrame
: The whole data set with default columns timestamp, label, index and alerttype.
probationaryPeriod::Int
: Row index after which predictions are scored.
costMatrix::Dict{AbstractString, Float64}
: The cost matrix for the profile with the following keys:
- True positive (tpWeight): detects the anomaly when the anomaly is present.
- False positive (fpWeight): detects the anomaly when the anomaly is absent.
- True Negative (tnWeight): does not detect the anomaly when the anomaly is absent.
- False Negative (fnWeight): does not detect the anomaly when the anomaly is present.
totalCount::Int
: The total count of labels.
counts::Dict{AbstractString, Int}
: The counts of tp, fp, tn and fn. Only predictions after probationaryPeriod are counted.
score::Float64
: The score of the anomaly detection algorithm results.
normalizedScore::Float64
: The normalized score of the anomaly detection algorithm such that the maximum possible is 100.0 (i.e. the perfect detector), and a baseline of 0.0 is determined by the "null" detector (which makes no detections).
len::Int
: The total count of predictions.
windows::Vector{Window}
: The list of windows for the data.
windowLimits::Vector{Tuple{DateTime,DateTime}}
: All the window limits in tuple form: (start time, end time).
Functions
getWindows::Function
: Create list of windows for the data.
getAlertTypes::Function
: For each record, decide whether it is a tp, fp, tn, or fn. Populate the counts dictionary with the total number of records in each category.
getScore::Function
: Score the entire data and return a single floating point score.
getClosestPrecedingWindow::Function
: Given a record index, find the closest preceding window.
normalizeScore::Function
: Normalize the detectors' scores according to the baseline defined by the null detector.
Constructor
Scorer(
timestamps::Vector{DateTime},
predictions::AbstractVector{<:Integer},
labels::AbstractVector{<:Integer},
windowLimits::Vector{Tuple{DateTime,DateTime}},
costMatrix::Dict{<:AbstractString, Float64},
probationaryPeriod::Int
)
Arguments
timestamps::Vector{DateTime}
: Timestamps in the data.
predictions::AbstractVector{<:Integer}
: Detector predictions of whether each record is anomalous or not. predictions[1:probationaryPeriod-1] are ignored.
labels::AbstractVector{<:Integer}
: Ground truth for each record. For each record there should be a 1 or a 0. A 1 implies this record is within an anomalous window.
windowLimits::Vector{Tuple{DateTime,DateTime}}
: All the window limits in tuple form: (start time, end time).
costMatrix::Dict{<:AbstractString, Float64}
: The cost matrix for the profile with the following keys:
- True positive (tpWeight): detects the anomaly when the anomaly is present.
- False positive (fpWeight): detects the anomaly when the anomaly is absent.
- True Negative (tnWeight): does not detect the anomaly when the anomaly is absent.
- False Negative (fnWeight): does not detect the anomaly when the anomaly is present.
probationaryPeriod::Int
: Row index after which predictions are scored.
Examples
timestamps = collect(DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5))
predictions = [0, 1, 0, 0, 1]
labels = [0, 1, 0, 0, 0]
windowLimits = [(DateTime(2017, 1, 2), DateTime(2017, 1, 3))]
costMatrix = Dict{AbstractString, Float64}(
"tpWeight" => 1.0,
"fnWeight" => 1.0,
"fpWeight" => 1.0
)
probationaryPeriod = 1
scorer = Scorer(timestamps, predictions, labels, windowLimits, costMatrix, probationaryPeriod)
AnomalyBenchmark.Scorer(5×4 DataFrames.DataFrame
│ Row │ timestamp │ label │ index │ alerttype │
├─────┼─────────────────────┼───────┼───────┼───────────┤
│ 1 │ 2017-01-01T00:00:00 │ 0 │ 1 │ "tn" │
│ 2 │ 2017-01-02T00:00:00 │ 1 │ 2 │ "tp" │
│ 3 │ 2017-01-03T00:00:00 │ 0 │ 3 │ "tn" │
│ 4 │ 2017-01-04T00:00:00 │ 0 │ 4 │ "tn" │
│ 5 │ 2017-01-05T00:00:00 │ 0 │ 5 │ "fp" │,1,Dict(:tpWeight=>1.0,:fnWeight=>1.0,:fpWeight=>1.0),5,
Dict{AbstractString,Int64}("tp"=>1,"tn"=>3,"fn"=>0,"fp"=>1),0.0,5,[AnomalyBenchmark.Window(1,2017-01-02T00:00:00,2017-01-03T00:00:00,2×4 DataFrames.DataFrame
│ Row │ timestamp │ label │ index │ alerttype │
├─────┼─────────────────────┼───────┼───────┼───────────┤
│ 1 │ 2017-01-02T00:00:00 │ 1 │ 2 │ "tp" │
│ 2 │ 2017-01-03T00:00:00 │ 0 │ 3 │ "tn" │,[2,3],2,(anonymous function),(anonymous function))],(anonymous function),(anonymous function),(anonymous function),(anonymous function))
AnomalyBenchmark.Window
— Type
Immutable object to store a single window in a data set. Each window represents a range of data points that is centered around a ground truth anomaly label.
Fields
id::Int
: The identifier of the Window.
t1::DateTime
: The start time of the Window.
t2::DateTime
: The end time of the Window.
window::DataFrame
: The data within the Window.
indices::AbstractArray
: The indices of the Window in the data.
len::Int
: The length of the Window.
Functions
repr::Function
: String representation of Window. For debugging.
getFirstTruePositive::Function
: Get the index of the first true positive within a window.
Constructor
Window(windowId::Int, limits::Tuple{DateTime, DateTime}, data::DataFrame)
Arguments
windowId::Int
: An integer id for the Window.
limits::Tuple{DateTime, DateTime}
: The start time and end time of the Window.
data::DataFrame
: The whole data set with default columns index and timestamp.
Examples
data = DataFrame(
index = 1:5,
timestamp = DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5)
)
window = Window(1234, (DateTime(2017, 1, 1), DateTime(2017, 1, 2)), data)
AnomalyBenchmark.Window(1234,2017-01-01T00:00:00,2017-01-02T00:00:00,2×2 DataFrames.DataFrame
│ Row │ index │ timestamp │
├─────┼───────┼─────────────────────┤
│ 1 │ 1 │ 2017-01-01T00:00:00 │
│ 2 │ 2 │ 2017-01-02T00:00:00 │,[1,2],2,(anonymous function),(anonymous function))
AnomalyBenchmark.applyWindows
— Method
Takes all the true anomalies, as calculated by combineLabels(), and adds a standard window.
Arguments
labeler::Labeler
Examples
labeler = Labeler(0.1, 0.15)
trueAnomalies = [DateTime(2017, 1, 3)]
data = DataFrame(
index = 1:5,
timestamp = DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5)
)
labeler.setData(data)
labeler.setLabels(trueAnomalies)
labeler.applyWindows()
julia> labeler
AnomalyBenchmark.Labeler(5×2 DataFrames.DataFrame
│ Row │ index │ timestamp │
├─────┼───────┼─────────────────────┤
│ 1 │ 1 │ 2017-01-01T00:00:00 │
│ 2 │ 2 │ 2017-01-02T00:00:00 │
│ 3 │ 3 │ 2017-01-03T00:00:00 │
│ 4 │ 4 │ 2017-01-04T00:00:00 │
│ 5 │ 5 │ 2017-01-05T00:00:00 │,0.1,0.15,5×2 DataFrames.DataFrame
│ Row │ timestamp │ label │
├─────┼─────────────────────┼───────┤
│ 1 │ 2017-01-01T00:00:00 │ 0 │
│ 2 │ 2017-01-02T00:00:00 │ 0 │
│ 3 │ 2017-01-03T00:00:00 │ 1 │
│ 4 │ 2017-01-04T00:00:00 │ 0 │
│ 5 │ 2017-01-05T00:00:00 │ 0 │,[3],[(2017-01-03T00:00:00,2017-01-03T00:00:00)],(anonymous function),(anonymous function),(anonymous function),(anonymous function),(anonymous function))
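The centering rule can be sketched on plain timestamp vectors. This is only an illustration, not the package's code: the helper name `centeredWindows` is hypothetical, and the rule that the window length in rows is `windowSize * nRows / nAnomalies` rounded down is an assumption borrowed from NAB's reference labeler.

```julia
using Dates

# Hypothetical helper: one (start, end) window centered on each true anomaly.
function centeredWindows(timestamps::Vector{DateTime},
                         anomalies::Vector{DateTime},
                         windowSize::Float64)
    windows = Tuple{DateTime,DateTime}[]
    isempty(anomalies) && return windows
    n = length(timestamps)
    # Assumption: rows per window = windowSize * n / number of anomalies, rounded down.
    windowLength = floor(Int, windowSize * n / length(anomalies))
    half = windowLength ÷ 2
    for a in anomalies
        i = findfirst(==(a), timestamps)
        i === nothing && continue      # anomaly timestamp not present in the data
        front = max(i - half, 1)       # clamp to the first row
        back = min(i + half, n)        # clamp to the last row
        push!(windows, (timestamps[front], timestamps[back]))
    end
    return windows
end

timestamps = collect(DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5))
narrow = centeredWindows(timestamps, [DateTime(2017, 1, 3)], 0.1)
wide = centeredWindows(timestamps, [DateTime(2017, 1, 3)], 0.5)
```

With five rows and `windowSize = 0.1` the window collapses to the anomaly timestamp itself, which agrees with the `windows` field printed in the example above; with `windowSize = 0.5` the sketch yields a window from 2017-01-02 to 2017-01-04.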
AnomalyBenchmark.checkWindows
— Method
Takes the anomaly windows and checks for overlap with both each other and with the probationary period. Overlapping windows are merged into a single window. Windows overlapping with the probationary period are deleted.
Arguments
labeler::Labeler
Examples
labeler = Labeler(0.1, 0.15)
trueAnomalies = [DateTime(2017, 1, 3)]
data = DataFrame(
index = 1:5,
timestamp = DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5)
)
labeler.setData(data)
labeler.setLabels(trueAnomalies)
labeler.applyWindows()
labeler.checkWindows()
julia> labeler
AnomalyBenchmark.Labeler(5×2 DataFrames.DataFrame
│ Row │ index │ timestamp │
├─────┼───────┼─────────────────────┤
│ 1 │ 1 │ 2017-01-01T00:00:00 │
│ 2 │ 2 │ 2017-01-02T00:00:00 │
│ 3 │ 3 │ 2017-01-03T00:00:00 │
│ 4 │ 4 │ 2017-01-04T00:00:00 │
│ 5 │ 5 │ 2017-01-05T00:00:00 │,0.1,0.15,5×2 DataFrames.DataFrame
│ Row │ timestamp │ label │
├─────┼─────────────────────┼───────┤
│ 1 │ 2017-01-01T00:00:00 │ 0 │
│ 2 │ 2017-01-02T00:00:00 │ 0 │
│ 3 │ 2017-01-03T00:00:00 │ 1 │
│ 4 │ 2017-01-04T00:00:00 │ 0 │
│ 5 │ 2017-01-05T00:00:00 │ 0 │,[3],[(2017-01-03T00:00:00,2017-01-03T00:00:00)],(anonymous function),(anonymous function),(anonymous function),(anonymous function),(anonymous function))
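The merge-and-delete behaviour described for checkWindows can be sketched as follows. The function name `mergeAndFilter`, the `probationEnd` argument (timestamp of the last probationary record), and the order of the two steps are assumptions made for illustration; the package may implement this differently.

```julia
using Dates

const TimeWindow = Tuple{DateTime,DateTime}

# Hypothetical helper: merge overlapping windows, then drop those that
# start inside the probationary period.
function mergeAndFilter(windows::Vector{TimeWindow}, probationEnd::DateTime)
    merged = TimeWindow[]
    for w in sort(windows)             # tuples sort by start time, then end time
        if !isempty(merged) && w[1] <= merged[end][2]
            # Overlaps the previous window: extend it instead of adding a new one.
            merged[end] = (merged[end][1], max(merged[end][2], w[2]))
        else
            push!(merged, w)
        end
    end
    # Assumption: a window overlaps the probationary period when it starts
    # at or before probationEnd.
    return filter(w -> w[1] > probationEnd, merged)
end

d(day) = DateTime(2017, 1, day)
windows = TimeWindow[(d(2), d(4)), (d(3), d(6)), (d(10), d(12))]
kept = mergeAndFilter(windows, d(1))
afterProbation = mergeAndFilter(windows, d(2))
```

In this sketch the first two windows merge into a single window from 2017-01-02 to 2017-01-06, and that merged window is dropped once the probationary period reaches 2017-01-02.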
AnomalyBenchmark.convertAnomalousWindowsToTimestamps
— Method
Returns an array that contains all anomalous timestamps, given an array of start and end times for every anomalous time window.
Arguments
anomalousWindows::AbstractArray{Tuple{DateTime,DateTime},1}
: An array of start and end times for every anomalous time window.
Returns
Array{DateTime,1} that contains all anomalous timestamps.
Examples
julia> anomalousWindows = [(DateTime(2017, 1, 3, 10, 1), DateTime(2017, 1, 3, 10, 5)), (DateTime(2017, 1, 3, 10, 58), DateTime(2017, 1, 3, 11, 0))]
2-element Array{Tuple{DateTime,DateTime},1}:
(2017-01-03T10:01:00,2017-01-03T10:05:00)
(2017-01-03T10:58:00,2017-01-03T11:00:00)
julia> AnomalyBenchmark.convertAnomalousWindowsToTimestamps(anomalousWindows)
8-element Array{DateTime,1}:
2017-01-03T10:01:00
2017-01-03T10:02:00
2017-01-03T10:03:00
2017-01-03T10:04:00
2017-01-03T10:05:00
2017-01-03T10:58:00
2017-01-03T10:59:00
2017-01-03T11:00:00
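A minimal sketch of the expansion, assuming a one-minute sampling step (inferred from the output above, not confirmed by the package) and using a hypothetical name `windowsToTimestamps`:

```julia
using Dates

# Hypothetical helper: expand each (start, end) window into its timestamps.
# Assumption: records are spaced `step` apart; one minute matches the example above.
function windowsToTimestamps(windows; step::Period = Minute(1))
    out = DateTime[]
    for (t1, t2) in windows
        append!(out, t1:step:t2)       # inclusive on both ends
    end
    return out
end

anomalousWindows = [
    (DateTime(2017, 1, 3, 10, 1), DateTime(2017, 1, 3, 10, 5)),
    (DateTime(2017, 1, 3, 10, 58), DateTime(2017, 1, 3, 11, 0)),
]
expanded = windowsToTimestamps(anomalousWindows)
```

Here `expanded` holds five timestamps from the first window followed by three from the second.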
AnomalyBenchmark.convertAnomalyScoresToDetections
— Method
Convert anomaly scores (values between 0 and 1) to detections (binary values) given a threshold.
Arguments
anomalyScores::AbstractArray{Float64}
: An array of anomaly scores.
threshold::Float64
: The threshold for anomaly scores. If an anomaly score is greater than or equal to the threshold, the detection would be 1; otherwise, the detection would be 0.
Returns
Array{Int64,1}
- An array of detections (1 = anomalous, 0 = normal).
Examples
julia> convertAnomalyScoresToDetections([0.3, 0.5, 0.7], 0.6)
3-element Array{Int64,1}:
0
0
1
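The thresholding rule is a single broadcast comparison. The one-liner below is a sketch under the stated rule (greater than or equal to the threshold gives 1); the name `scoresToDetections` is hypothetical.

```julia
# Hypothetical helper: 1 where score >= threshold, 0 otherwise.
scoresToDetections(scores::AbstractArray{Float64}, threshold::Float64) =
    Int.(scores .>= threshold)

detections = scoresToDetections([0.3, 0.5, 0.7], 0.6)
onThreshold = scoresToDetections([0.6], 0.6)   # a score equal to the threshold counts as a detection
```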
AnomalyBenchmark.getAlertTypes
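— Method
For each record, decide whether it is a tp, fp, tn, or fn, and populate the counts dictionary with the total number of records in each category.
Arguments
scorer::Scorer
predictions::AbstractVector{<:Integer}
: Detector predictions of whether each record is anomalous or not.
Returns
The alert type (tp, fp, tn or fn) of each record.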
Examples
timestamps = collect(DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5))
predictions = [0, 1, 0, 0, 1]
labels = [0, 1, 0, 0, 0]
windowLimits = [(DateTime(2017, 1, 2), DateTime(2017, 1, 3))]
costMatrix = Dict(
"tpWeight" => 1.0,
"fnWeight" => 1.0,
"fpWeight" => 1.0
)
probationaryPeriod = 1
scorer = Scorer(timestamps, predictions, labels, windowLimits, costMatrix, probationaryPeriod)
julia> scorer.getAlertTypes(predictions)
5-element Array{AbstractString,1}:
"tn"
"tp"
"tn"
"tn"
"fp"
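The per-record classification can be illustrated with a small standalone sketch. It compares each prediction with its label directly; whether the package also consults windows or the probationary period at this step is not shown here, so treat the helper names `classifyRecords` and `countAlertTypes` and the logic as assumptions.

```julia
# Hypothetical helper: classify each record from its prediction and its label.
function classifyRecords(predictions, labels)
    map(predictions, labels) do p, l
        p == 1 ? (l == 1 ? "tp" : "fp") : (l == 1 ? "fn" : "tn")
    end
end

# Hypothetical helper: tally how many records fall into each category.
function countAlertTypes(alertTypes)
    counts = Dict("tp" => 0, "fp" => 0, "tn" => 0, "fn" => 0)
    for a in alertTypes
        counts[a] += 1
    end
    return counts
end

alertTypes = classifyRecords([0, 1, 0, 0, 1], [0, 1, 0, 0, 0])
counts = countAlertTypes(alertTypes)
```

For the inputs of the example above this reproduces the same five alert types, with one tp, one fp and three tn.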
AnomalyBenchmark.getClosestPrecedingWindow
— Method
Given a record index, find the closest preceding window.
Arguments
scorer::Scorer
index::Int
: Index of a record.
Returns
Window id for the last window preceding the given index.
Examples
timestamps = collect(DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5))
predictions = [0, 1, 0, 0, 1]
labels = [0, 1, 0, 0, 0]
windowLimits = [(DateTime(2017, 1, 2), DateTime(2017, 1, 3))]
costMatrix = Dict{AbstractString, Float64}(
"tpWeight" => 1.0,
"fnWeight" => 1.0,
"fpWeight" => 1.0
)
probationaryPeriod = 1
scorer = Scorer(timestamps, predictions, labels, windowLimits, costMatrix, probationaryPeriod)
scorer.getClosestPrecedingWindow(2)
-1
scorer.getClosestPrecedingWindow(4)
1
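One way to read the two results above: a window precedes a record only if its last row index is smaller than the record index. The sketch below encodes that reading; the helper name `closestPrecedingWindow`, the representation of windows as index ranges, and the use of list position as window id are assumptions.

```julia
# Hypothetical helper: id of the last window that ends before `index`, or -1.
# Assumption: windows are sorted by time and identified by their position.
function closestPrecedingWindow(windowIndices::Vector{UnitRange{Int}}, index::Int)
    result = -1
    for (id, rows) in enumerate(windowIndices)
        if last(rows) < index
            result = id                # later windows overwrite earlier ones
        end
    end
    return result
end

windowIndices = [2:3]                  # the single window covers rows 2 and 3
insideWindow = closestPrecedingWindow(windowIndices, 2)
afterWindow = closestPrecedingWindow(windowIndices, 4)
```

Record 2 lies inside the window, so nothing precedes it and the sketch returns -1; record 4 comes after the window, so the sketch returns window 1.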
AnomalyBenchmark.getFirstTruePositive
— Method
Get the index of the first true positive within a window.
Arguments
window::Window
Returns
Index of the first true positive within a window. -1 if there are none.
Examples
data = DataFrame(
index = 1:5,
timestamp = DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5),
alerttype = ["fp", "tp", "tp", "fn", "tn"]
)
window = Window(1234, (DateTime(2017, 1, 1), DateTime(2017, 1, 2)), data)
julia> window.getFirstTruePositive()
2
data = DataFrame(
index = 1:5,
timestamp = DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5),
alerttype = ["fp", "fp", "fp", "fn", "tn"]
)
window = Window(1234, (DateTime(2017, 1, 1), DateTime(2017, 1, 2)), data)
julia> window.getFirstTruePositive()
-1
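The lookup amounts to finding the first "tp" among the window's rows and mapping it back to the data set index. A sketch with a hypothetical helper `firstTruePositive`, operating on plain vectors rather than the package's Window type:

```julia
# Hypothetical helper: data set index of the first "tp" in the window, or -1.
function firstTruePositive(indices::AbstractVector{Int},
                           alertTypes::AbstractVector{<:AbstractString})
    pos = findfirst(==("tp"), alertTypes)
    return pos === nothing ? -1 : indices[pos]
end

# Rows 1 and 2 of the two example data sets above fall inside the window.
withTP = firstTruePositive([1, 2], ["fp", "tp"])
withoutTP = firstTruePositive([1, 2], ["fp", "fp"])
```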
AnomalyBenchmark.getProbationPeriod
— Method
Return the probationary period index given probation percentage and the length of the file.
Arguments
probationPercent::Float64
: The percentage of predictions that won't be used for scoring.
fileLength::Int
: The number of rows of the data file.
Returns
::Int64
If the file length is less than 5000, the probation period would be the probation percentage times the file length; otherwise, it would be the probation percentage times 5,000.
Examples
julia> AnomalyBenchmark.getProbationPeriod(0.2, 4000)
800
julia> AnomalyBenchmark.getProbationPeriod(0.2, 10000)
1000
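The rule above caps the effective file length at 5000 rows. A sketch with a hypothetical name `probationPeriod`; rounding down with `floor` is an assumption, since the text does not say how fractional results are handled.

```julia
# Hypothetical helper: probation percentage times the file length, capped at 5000 rows.
# Assumption: fractional results are rounded down.
probationPeriod(probationPercent::Float64, fileLength::Int) =
    floor(Int, probationPercent * min(fileLength, 5000))

shortFile = probationPeriod(0.2, 4000)    # 0.2 * 4000
longFile = probationPeriod(0.2, 10000)    # 0.2 * 5000, because of the cap
```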
AnomalyBenchmark.getScore
— Method
Score the entire data set and return a single floating point score. The position in a given window is calculated as the distance from the end of the window, normalized to [-1, 0]. That is, positions -1.0 and 0.0 are at the very front and back of the anomaly window, respectively.
Flat scoring option: If you'd like to run a flat scorer that does not apply the scaled sigmoid weighting, comment out the two scaledSigmoid() lines in the function's source, and uncomment the replacement lines to calculate thisTP and thisFP.
Arguments
scorer::Scorer
Returns
Tuple
scores::AbstractVector{Float64}
: The score at each timestamp of the data.
scorer.score::Float64
: The score of the anomaly detection algorithm results.
Examples
timestamps = collect(DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5))
predictions = [0, 1, 0, 0, 1]
labels = [0, 1, 0, 0, 0]
windowLimits = [(DateTime(2017, 1, 2), DateTime(2017, 1, 3))]
costMatrix = Dict{AbstractString, Float64}(
"tpWeight" => 1.0,
"fnWeight" => 1.0,
"fpWeight" => 1.0
)
probationaryPeriod = 1
scorer = Scorer(timestamps, predictions, labels, windowLimits, costMatrix, probationaryPeriod)
scorer.getScore()
([0.0,1.0,0.0,0.0,-0.9999092042625951],9.079573740489177e-5)
AnomalyBenchmark.getWindows
— Method
Takes all the true anomalies, as calculated by combineLabels(), and adds a standard window. Takes the anomaly windows and checks for overlap with both each other and with the probationary period. Overlapping windows are merged into a single window. Windows overlapping with the probationary period are deleted.
Arguments
labeler::Labeler
Examples
labeler = Labeler(0.1, 0.15)
trueAnomalies = [DateTime(2017, 1, 3)]
data = DataFrame(
index = 1:5,
timestamp = DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5)
)
labeler.setData(data)
labeler.setLabels(trueAnomalies)
labeler.getWindows()
julia> labeler
AnomalyBenchmark.Labeler(5×2 DataFrames.DataFrame
│ Row │ index │ timestamp │
├─────┼───────┼─────────────────────┤
│ 1 │ 1 │ 2017-01-01T00:00:00 │
│ 2 │ 2 │ 2017-01-02T00:00:00 │
│ 3 │ 3 │ 2017-01-03T00:00:00 │
│ 4 │ 4 │ 2017-01-04T00:00:00 │
│ 5 │ 5 │ 2017-01-05T00:00:00 │,0.1,0.15,5×2 DataFrames.DataFrame
│ Row │ timestamp │ label │
├─────┼─────────────────────┼───────┤
│ 1 │ 2017-01-01T00:00:00 │ 0 │
│ 2 │ 2017-01-02T00:00:00 │ 0 │
│ 3 │ 2017-01-03T00:00:00 │ 1 │
│ 4 │ 2017-01-04T00:00:00 │ 0 │
│ 5 │ 2017-01-05T00:00:00 │ 0 │,[3],[(2017-01-03T00:00:00,2017-01-03T00:00:00)],(anonymous function),(anonymous function),(anonymous function),(anonymous function),(anonymous function))
AnomalyBenchmark.getWindows
— Method
Create list of windows for the data.
Arguments
scorer::Scorer
limits::Vector{Tuple{DateTime,DateTime}}
: All the window limits in tuple form: (start time, end time).
Returns
All the windows for the data of the scorer.
Examples
timestamps = collect(DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5))
predictions = [0, 1, 0, 0, 1]
labels = [0, 1, 0, 0, 0]
windowLimits = [(DateTime(2017, 1, 2), DateTime(2017, 1, 3))]
costMatrix = Dict{AbstractString, Float64}(
"tpWeight" => 1.0,
"fnWeight" => 1.0,
"fpWeight" => 1.0
)
probationaryPeriod = 1
scorer = Scorer(timestamps, predictions, labels, windowLimits, costMatrix, probationaryPeriod)
scorer.getWindows(windowLimits)
1-element Array{AnomalyBenchmark.Window,1}:
AnomalyBenchmark.Window(1,2017-01-02T00:00:00,2017-01-03T00:00:00,2×4 DataFrames.DataFrame
│ Row │ timestamp │ label │ index │ alerttype │
├─────┼─────────────────────┼───────┼───────┼───────────┤
│ 1 │ 2017-01-02T00:00:00 │ 1 │ 2 │ "tp" │
│ 2 │ 2017-01-03T00:00:00 │ 0 │ 3 │ "tn" │,[2,3],2,(anonymous function),(anonymous function))
AnomalyBenchmark.normalizeScore
— Method
Normalize the detectors' scores according to the baseline defined by the null detector, and print to the console. This function can only be called after the scoring step. The score is normalized by multiplying by 100 and dividing by perfect less the baseline, where the perfect score is the number of TPs possible.
Arguments
scorer::Scorer
Examples
timestamps = collect(DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5))
predictions = [0, 1, 0, 0, 1]
labels = [0, 1, 0, 0, 0]
windowLimits = [(DateTime(2017, 1, 2), DateTime(2017, 1, 3))]
costMatrix = Dict{AbstractString, Float64}(
"tpWeight" => 1.0,
"fnWeight" => 1.0,
"fpWeight" => 1.0
)
probationaryPeriod = 1
scorer = AnomalyBenchmark.Scorer(timestamps, predictions, labels, windowLimits, costMatrix, probationaryPeriod)
julia> scorer.getScore()
([0.0,1.0,0.0,0.0,-0.9999092042625951],9.079573740489177e-5)
julia> scorer.normalizeScore()
Running score normalization step
50.004539786870254
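The normalization can be checked by hand. The sketch below assumes that the perfect score is `tpWeight` times the number of windows and that the null-detector baseline is `-fnWeight` times the number of windows (the null detector misses every window); both are inferences from the description and the output above, and the name `normalizeSketch` is hypothetical.

```julia
# Hypothetical helper: rescale a raw score so the null detector maps to 0 and a perfect detector to 100.
function normalizeSketch(score::Float64, numWindows::Int; tpWeight = 1.0, fnWeight = 1.0)
    perfect = tpWeight * numWindows      # every window detected
    baseline = -fnWeight * numWindows    # assumption: the null detector misses every window
    return 100 * (score - baseline) / (perfect - baseline)
end

normalized = normalizeSketch(9.079573740489177e-5, 1)
```

With one window and unit weights this gives 100 * (score + 1) / 2, which agrees with the normalized score printed above to within floating point error.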
AnomalyBenchmark.scaledSigmoid
— Method
Return a scaled sigmoid function given a relative position within a labeled window. The function is computed as follows:
A relative position of -1.0 is the far left edge of the anomaly window and corresponds to S = 2*sigmoid(5) - 1.0 = 0.98661. This is the earliest to be counted as a true positive.
A relative position of -0.5 is halfway into the anomaly window and corresponds to S = 2*sigmoid(0.5*5) - 1.0 = 0.84828.
A relative position of 0.0 is the right edge of the window and corresponds to S = 2*sigmoid(0) - 1 = 0.0.
Relative positions > 0 correspond to false positives increasingly far away from the right edge of the window. A relative position of 1.0 is past the right edge of the window and corresponds to a score of 2*sigmoid(-5) - 1.0 = -0.98661.
Arguments
relativePositionInWindow::Float64
: A relative position within a window calculated per the rules above.
Returns
Float64
The scaled sigmoid score.
Examples
julia> AnomalyBenchmark.scaledSigmoid(-1.0)
0.9866142981514305
julia> AnomalyBenchmark.scaledSigmoid(-0.5)
0.8482836399575131
julia> AnomalyBenchmark.scaledSigmoid(0.0)
0.0
julia> AnomalyBenchmark.scaledSigmoid(1.0)
-0.9866142981514303
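The four reference points listed above are all consistent with S(x) = 2*sigmoid(-5x) - 1. The sketch below implements that closed form under hypothetical names (`sketchSigmoid`, `sketchScaledSigmoid`); any special handling the package applies to positions far beyond the window is not covered.

```julia
# Standard logistic function.
sketchSigmoid(x) = 1 / (1 + exp(-x))

# Assumption: closed form inferred from the reference points in the description.
sketchScaledSigmoid(relativePosition) = 2 * sketchSigmoid(-5 * relativePosition) - 1

leftEdge = sketchScaledSigmoid(-1.0)
halfway = sketchScaledSigmoid(-0.5)
rightEdge = sketchScaledSigmoid(0.0)
pastWindow = sketchScaledSigmoid(1.0)
```

The function is odd around the right edge of the window, so a detection one window length late is penalized about as much as the earliest possible detection is rewarded.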
AnomalyBenchmark.scoreDataSet
— Method
Compute AnomalyBenchmark scores given a detector's results, actual anomalies and a cost matrix.
Arguments
labeler::Labeler
: An object that stores and manipulates labels and windows for a given data set and its true anomalies.
data::DataFrame
: The whole data set with default column timestamp.
trueAnomalies::Vector{DateTime}
: Timestamps of the ground truth anomalies.
predictions::AbstractVector{<:Integer}
: Detector predictions of whether each record is anomalous or not. predictions[1:probationaryPeriod-1] are ignored.
Optional Arguments
detectorName::AbstractString="%"
: The name of the anomaly detector.
profileName::AbstractString="standard"
: The name of scoring profile. Each profile represents a cost matrix.
costMatrix::Dict{AbstractString, Float64}
: The cost matrix for the profile with the following keys:
- True positive (tpWeight): detects the anomaly when the anomaly is present.
- False positive (fpWeight): detects the anomaly when the anomaly is absent.
- True Negative (tnWeight): does not detect the anomaly when the anomaly is absent.
- False Negative (fnWeight): does not detect the anomaly when the anomaly is present.
If a costMatrix is given, it will be applied in place of the cost matrix provided by the profileName.
Returns
Dict of values representing the anomaly detection benchmark for a given detector, with the following keys:
scorer
: The Scorer object for the detector.
detectorName
: The name of the anomaly detector.
profileName
: The name of scoring profile. If a customized costMatrix is provided, profileName is "customized".
score
: The score of the anomaly detection algorithm results.
counts
: The counts of tp, fp, tn and fn. Only predictions after probationaryPeriod are counted.
Examples
labeler = AnomalyBenchmark.Labeler(0.1, 0.15)
data = DataFrame(
index = 1:5,
timestamp = DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5)
)
trueAnomalies = [DateTime(2017, 1, 2)]
predictions = [0, 1, 0, 0, 0]
detectorName = "tester"
profileName = "standard"
julia> AnomalyBenchmark.scoreDataSet(labeler, data, trueAnomalies, predictions, detectorName=detectorName, profileName=profileName)
Dict{ASCIIString,Any} with 5 entries:
"detectorName" => "tester"
"counts" => Dict{AbstractString,Int64}("tp"=>1,"tn"=>2,"fn"=>0,"fp"=>2)
"score" => 0.78
"profileName" => "standard"
"scorer" => AnomalyBenchmark.Scorer(5×4 DataFrames.DataFrame…
labeler = AnomalyBenchmark.Labeler(0.1, 0.15)
data = DataFrame(
index = 1:5,
timestamp = DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5)
)
trueAnomalies = [DateTime(2017, 1, 2)]
predictions = [0, 1, 0, 0, 0]
detectorName = "tester"
costMatrix = Dict{AbstractString, Float64}("tpWeight" => 1.0, "fpWeight" => 1.0, "fnWeight" => 1.0)
julia> AnomalyBenchmark.scoreDataSet(labeler, data, trueAnomalies, predictions, detectorName=detectorName, costMatrix=costMatrix)
Dict{ASCIIString,Any} with 5 entries:
"detectorName" => "tester"
"counts" => Dict{AbstractString,Int64}("tp"=>1,"tn"=>4,"fn"=>0,"fp"=>2)
"score" => -1.0
"profileName" => "customized"
"scorer" => AnomalyBenchmark.Scorer(5×4 DataFrames.DataFrame…
anomalyScores = [0.7, 0.8, 0.5, 0.8, 0.9]
threshold = 0.75
julia> AnomalyBenchmark.scoreDataSet(labeler, data, trueAnomalies, anomalyScores, threshold, detectorName=detectorName, costMatrix=costMatrix)
Dict{ASCIIString,Any} with 5 entries:
"detectorName" => "tester"
"counts" => Dict{AbstractString,Int64}("tp"=>1,"tn"=>2,"fn"=>0,"fp"=>2)
"score" => -1.0
"profileName" => "customized"
"scorer" => AnomalyBenchmark.Scorer(5×4 DataFrames.DataFrame…
AnomalyBenchmark.setData
— Method
Set value for field data in a Labeler.
Arguments
labeler::Labeler
data::DataFrame
: The whole data set with default column timestamp.
Examples
labeler = Labeler(0.1, 0.15)
trueAnomalies = [DateTime(2017, 1, 3)]
data = DataFrame(
index = 1:5,
timestamp = DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5)
)
labeler.setData(data)
julia> labeler
AnomalyBenchmark.Labeler(5×2 DataFrames.DataFrame
│ Row │ index │ timestamp │
├─────┼───────┼─────────────────────┤
│ 1 │ 1 │ 2017-01-01T00:00:00 │
│ 2 │ 2 │ 2017-01-02T00:00:00 │
│ 3 │ 3 │ 2017-01-03T00:00:00 │
│ 4 │ 4 │ 2017-01-04T00:00:00 │
│ 5 │ 5 │ 2017-01-05T00:00:00 │,0.1,0.15,0×0 DataFrames.DataFrame
,Int64[],Tuple{DateTime,DateTime}[],(anonymous function),(anonymous function),(anonymous function),(anonymous function),(anonymous function))
AnomalyBenchmark.setLabels
— Method
Set value for field labels in a Labeler. For each record there should be a 1 or a 0. A 1 implies this record is within an anomalous window.
Arguments
labeler::Labeler
trueAnomalies::AbstractArray{DateTime, 1}
: Timestamps of the ground truth anomalies.
Examples
labeler = Labeler(0.1, 0.15)
trueAnomalies = [DateTime(2017, 1, 3)]
data = DataFrame(
index = 1:5,
timestamp = DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5)
)
labeler.setData(data)
labeler.setLabels(trueAnomalies)
julia> labeler
AnomalyBenchmark.Labeler(5×2 DataFrames.DataFrame
│ Row │ index │ timestamp │
├─────┼───────┼─────────────────────┤
│ 1 │ 1 │ 2017-01-01T00:00:00 │
│ 2 │ 2 │ 2017-01-02T00:00:00 │
│ 3 │ 3 │ 2017-01-03T00:00:00 │
│ 4 │ 4 │ 2017-01-04T00:00:00 │
│ 5 │ 5 │ 2017-01-05T00:00:00 │,0.1,0.15,5×2 DataFrames.DataFrame
│ Row │ timestamp │ label │
├─────┼─────────────────────┼───────┤
│ 1 │ 2017-01-01T00:00:00 │ 0 │
│ 2 │ 2017-01-02T00:00:00 │ 0 │
│ 3 │ 2017-01-03T00:00:00 │ 1 │
│ 4 │ 2017-01-04T00:00:00 │ 0 │
│ 5 │ 2017-01-05T00:00:00 │ 0 │,[3],Tuple{DateTime,DateTime}[],(anonymous function),(anonymous function),(anonymous function),(anonymous function),(anonymous function))
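How the label column and the labelIndices field relate to the anomaly timestamps can be sketched on plain vectors. The helper name `labelsFor` is hypothetical, and the sketch assumes every anomaly timestamp matches a record timestamp exactly.

```julia
using Dates

# Hypothetical helper: 1 for records whose timestamp is a true anomaly, 0 otherwise.
labelsFor(timestamps, trueAnomalies) = [Int(t in trueAnomalies) for t in timestamps]

timestamps = collect(DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5))
labels = labelsFor(timestamps, [DateTime(2017, 1, 3)])
labelIndices = findall(==(1), labels)   # row indices of the true anomalies
```

For the example above this marks only the third record, which matches the label column and the `[3]` printed for the Labeler.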
AnomalyBenchmark.sigmoid
— Method
Standard sigmoid function.
\[\frac{1}{1+e^{-x}}\]
Base.show
— Method
String representation of Window. For debugging.
Arguments
window::Window
Examples
data = DataFrame(
index = 1:5,
timestamp = DateTime(2017, 1, 1):Day(1):DateTime(2017, 1, 5)
)
window = Window(1234, (DateTime(2017, 1, 1), DateTime(2017, 1, 2)), data)
window.repr()
WINDOW id=1234, limits: [2017-01-01T00:00:00, 2017-01-02T00:00:00], length: 2
window data:
2×2 DataFrames.DataFrame
│ Row │ index │ timestamp │
├─────┼───────┼─────────────────────┤
│ 1 │ 1 │ 2017-01-01T00:00:00 │
│ 2 │ 2 │ 2017-01-02T00:00:00 │