EvalMetrics.jl
Utility package for scoring binary classification models.
Installation
Run the following command in the Julia Pkg REPL (EvalMetrics.jl requires Julia 1.0 or higher):
(v1.5) pkg> add EvalMetrics
Usage
Quickstart
The fastest way to get started is to use the binary_eval_report function as follows:
julia> using EvalMetrics, Random
julia> Random.seed!(123);
julia> targets = rand(0:1, 100);
julia> scores = rand(100);
julia> binary_eval_report(targets, scores)
Dict{String,Real} with 8 entries:
"precision@fpr0.05" => 0.0
"recall@fpr0.05" => 0.0
"accuracy@fpr0.05" => 0.45
"au_prcurve" => 0.460134
"samples" => 100
"true negative rate@fpr0.05" => 0.957447
"au_roccurve" => 0.42232
"prevalence" => 0.53
julia> binary_eval_report(targets, scores, 0.001)
Dict{String,Real} with 8 entries:
"recall@fpr0.001" => 0.0
"au_prcurve" => 0.460134
"samples" => 100
"precision@fpr0.001" => 1.0
"au_roccurve" => 0.42232
"accuracy@fpr0.001" => 0.47
"prevalence" => 0.53
"true negative rate@fpr0.001" => 1.0
Confusion Matrix
The core of the package is the ConfusionMatrix structure, which represents the confusion matrix in the following form:
| | Actual positives | Actual negatives |
| :---: | :---: | :---: |
| Predicted positives | tp (# true positives) | fp (# false positives) |
| Predicted negatives | fn (# false negatives) | tn (# true negatives) |
| | p (# positives) | n (# negatives) |
The confusion matrix can be computed either from targets and predicted values, or from targets, scores, and one or more decision thresholds:
julia> thres = 0.6;
julia> predicts = scores .>= thres;
julia> cm1 = ConfusionMatrix(targets, predicts)
ConfusionMatrix{Int64}(53, 47, 18, 24, 23, 35)
julia> cm2 = ConfusionMatrix(targets, scores, thres)
ConfusionMatrix{Int64}(53, 47, 18, 24, 23, 35)
julia> cm3 = ConfusionMatrix(targets, scores, thres)
ConfusionMatrix{Int64}(53, 47, 18, 24, 23, 35)
julia> cm4 = ConfusionMatrix(targets, scores, [thres, thres])
2-element Array{ConfusionMatrix{Int64},1}:
ConfusionMatrix{Int64}(53, 47, 18, 24, 23, 35)
ConfusionMatrix{Int64}(53, 47, 18, 24, 23, 35)
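To make the counting rule concrete, here is a minimal hand-rolled sketch in plain Julia (no EvalMetrics) of what the four cells contain. It assumes the default OneZero labels (1 = positive, 0 = negative) and, like `predicts = scores .>= thres` above, treats a score at or above the threshold as a positive prediction. The function `confusion_counts` and its named-tuple result are hypothetical illustrations, not part of the package.

```julia
# Hypothetical helper, not part of EvalMetrics: count the cells of a binary
# confusion matrix, assuming 1 = positive, 0 = negative and `score >= thres`
# meaning "predicted positive".
function confusion_counts(targets::AbstractVector, scores::AbstractVector{<:Real}, thres::Real)
    tp = tn = fp = fn = 0
    for (y, s) in zip(targets, scores)
        predicted_positive = s >= thres
        if y == 1
            predicted_positive ? (tp += 1) : (fn += 1)
        else
            predicted_positive ? (fp += 1) : (tn += 1)
        end
    end
    # p and n are the column sums of the table above.
    return (p = tp + fn, n = tn + fp, tp = tp, tn = tn, fp = fp, fn = fn)
end

targets = [1, 1, 1, 0, 0, 0]
scores  = [0.9, 0.4, 0.7, 0.8, 0.2, 0.1]
cm = confusion_counts(targets, scores, 0.6)
# cm == (p = 3, n = 3, tp = 2, tn = 2, fp = 1, fn = 1)

# One result per threshold, mirroring the vector-of-thresholds call above.
cms = [confusion_counts(targets, scores, t) for t in (0.5, 0.85)]
```

Raising the threshold can only move samples from the "predicted positive" row to the "predicted negative" row, which is why p and n stay fixed across thresholds.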
The package provides many basic classification metrics based on the confusion matrix. The following table lists all available metrics and their aliases:
| Classification metric | Aliases |
| :--- | :--- |
| true_positive | |
| true_negative | |
| false_positive | |
| false_negative | |
| true_positive_rate | sensitivity, recall, hit_rate |
| true_negative_rate | specificity, selectivity |
| false_positive_rate | fall_out, type_I_error |
| false_negative_rate | miss_rate, type_II_error |
| precision | positive_predictive_value |
| negative_predictive_value | |
| false_discovery_rate | |
| false_omission_rate | |
| threat_score | critical_success_index |
| accuracy | |
| balanced_accuracy | |
| error_rate | |
| balanced_error_rate | |
| f1_score | |
| fβ_score | |
| matthews_correlation_coefficient | mcc |
| quant | |
| positive_likelihood_ratio | |
| negative_likelihood_ratio | |
| diagnostic_odds_ratio | |
| prevalence | |
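For reference, most of the rates in the table are simple ratios of the confusion-matrix counts. The following plain-Julia sketch writes a few of them out using the standard textbook definitions. The `Counts` struct and the short function names are hypothetical stand-ins, not EvalMetrics' types or functions, and edge cases such as division by zero are deliberately ignored.

```julia
# Hypothetical stand-in for a confusion matrix (not EvalMetrics' ConfusionMatrix).
struct Counts
    p::Int   # actual positives (tp + fn)
    n::Int   # actual negatives (tn + fp)
    tp::Int
    tn::Int
    fp::Int
    fn::Int
end

tpr(c::Counts) = c.tp / c.p                 # true positive rate (recall, sensitivity)
tnr(c::Counts) = c.tn / c.n                 # true negative rate (specificity)
fpr(c::Counts) = c.fp / c.n                 # false positive rate (fall-out)
fnr(c::Counts) = c.fn / c.p                 # false negative rate (miss rate)
ppv(c::Counts) = c.tp / (c.tp + c.fp)       # precision
acc(c::Counts) = (c.tp + c.tn) / (c.p + c.n)
balanced_acc(c::Counts) = (tpr(c) + tnr(c)) / 2
f1(c::Counts) = 2 * ppv(c) * tpr(c) / (ppv(c) + tpr(c))

# Matthews correlation coefficient; float() avoids integer overflow in the product.
function mcc(c::Counts)
    num = c.tp * c.tn - c.fp * c.fn
    den = sqrt(float(c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return num / den
end

c = Counts(10, 10, 6, 7, 3, 4)
(tpr(c), tnr(c), acc(c), f1(c))   # ≈ (0.6, 0.7, 0.65, 0.6316)
```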
Each metric can be computed from the ConfusionMatrix structure:
julia> recall(cm1)
0.33962264150943394
julia> recall(cm2)
0.33962264150943394
julia> recall(cm3)
0.33962264150943394
julia> recall(cm4)
2-element Array{Float64,1}:
0.33962264150943394
0.33962264150943394
Alternatively, a metric can be computed directly from targets and predicted values, or from targets, scores, and one or more decision thresholds:
julia> recall(targets, predicts)
0.33962264150943394
julia> recall(targets, scores, thres)
0.33962264150943394
julia> recall(targets, scores, thres)
0.33962264150943394
julia> recall(targets, scores, [thres, thres])
2-element Array{Float64,1}:
0.33962264150943394
0.33962264150943394
User-defined classification metrics
If a useful metric is not defined in the package, you can add your own. To simplify defining a new metric, the package provides the @metric macro and the apply function.
import EvalMetrics: @metric, apply
@metric MyRecall
apply(::Type{MyRecall}, x::ConfusionMatrix) = x.tp/x.p
In the previous example, the @metric macro defines a new abstract type MyRecall (used for dispatch) and a function myrecall (for convenient use of the new metric). With the abstract type MyRecall defined, the next step is to define a new method of the apply function. This method must have exactly two positional arguments: Type{MyRecall} and ConfusionMatrix. If another argument is needed, it can be added as a keyword argument, as in this definition of the Fβ score:
apply(::Type{Fβ_score}, x::ConfusionMatrix; β::Real = 1) =
(1 + β^2)*precision(x)*recall(x)/(β^2*precision(x) + recall(x))
It is easy to check that myrecall returns the same values as the recall metric defined in the package:
julia> myrecall(cm1)
0.33962264150943394
julia> myrecall(cm2)
0.33962264150943394
julia> myrecall(cm3)
0.33962264150943394
julia> myrecall(cm4)
2-element Array{Float64,1}:
0.33962264150943394
0.33962264150943394
julia> myrecall(targets, predicts)
0.33962264150943394
julia> myrecall(targets, scores, thres)
0.33962264150943394
julia> myrecall(targets, scores, thres)
0.33962264150943394
julia> myrecall(targets, scores, [thres, thres])
2-element Array{Float64,1}:
0.33962264150943394
0.33962264150943394
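The mechanism behind @metric is ordinary multiple dispatch on a type tag. The self-contained sketch below mimics it by hand, without the macro and without EvalMetrics: an abstract type identifies the metric, `apply` computes it, and a lowercase wrapper forwards to `apply`. The `Counts` struct, the `MyFβ` tag, and the wrapper functions are hypothetical; the code the real macro generates is not shown in this README and likely adds more methods (for example, the targets/scores forms used above).

```julia
# Hypothetical stand-in for ConfusionMatrix (not the package's type).
struct Counts
    p::Int; n::Int; tp::Int; tn::Int; fp::Int; fn::Int
end

# Type tags used only for dispatch, in the spirit of @metric MyRecall.
abstract type MyRecall end
abstract type MyFβ end

apply(::Type{MyRecall}, x::Counts) = x.tp / x.p

# An extra parameter passed as a keyword argument, like β in the Fβ definition above.
function apply(::Type{MyFβ}, x::Counts; β::Real = 1)
    prec = x.tp / (x.tp + x.fp)
    rec = apply(MyRecall, x)
    return (1 + β^2) * prec * rec / (β^2 * prec + rec)
end

# Hand-written convenience wrappers (assumption: the macro generates similar ones).
myrecall(x::Counts) = apply(MyRecall, x)
myrecall(xs::AbstractVector{Counts}) = [apply(MyRecall, x) for x in xs]
myfβ(x::Counts; β::Real = 1) = apply(MyFβ, x; β = β)

c = Counts(10, 10, 6, 7, 3, 4)
myrecall(c)          # 0.6
myrecall([c, c])     # [0.6, 0.6]
myfβ(c; β = 2)       # 30/49 ≈ 0.612
```

Keeping the metric logic in a single `apply` method means every entry point (one matrix, a vector of matrices, raw targets and scores) can share it.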
Label encodings
Different machine learning applications commonly use different label encodings. For example, support vector machines use 1 as the positive label and -1 as the negative label, whereas neural networks commonly use 0 as the negative label. The package provides the basic label encodings listed in the following table:
| Encoding | positive label(s) | negative label(s) |
| :--- | :---: | :---: |
| OneZero(::Type{T}) | one(T) | zero(T) |
| OneMinusOne(::Type{T}) | one(T) | -one(T) |
| OneTwo(::Type{T}) | one(T) | 2*one(T) |
| OneVsOne(::Type{T}, pos::T, neg::T) | pos | neg |
| OneVsRest(::Type{T}, pos::T, neg::AbstractVector{T}) | pos | neg |
| RestVsOne(::Type{T}, pos::AbstractVector{T}, neg::T) | pos | neg |
The current_encoding function can be used to check which encoding is currently in use (OneZero by default):
julia> enc = current_encoding()
OneZero{Float64}:
positive class: 1.0
negative class: 0.0
One way to use a different encoding is to pass it as the first argument:
julia> enc_new = OneVsOne(:positive, :negative)
OneVsOne{Symbol}:
positive class: positive
negative class: negative
julia> targets_recoded = recode.(enc, enc_new, targets);
julia> predicts_recoded = recode.(enc, enc_new, predicts);
julia> recall(enc, targets, predicts)
0.33962264150943394
julia> recall(enc_new, targets_recoded, predicts_recoded)
0.33962264150943394
The second way is to change the current encoding with set_encoding:
julia> set_encoding(OneVsOne(:positive, :negative))
OneVsOne{Symbol}:
positive class: positive
negative class: negative
julia> recall(targets_recoded, predicts_recoded)
0.33962264150943394
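Conceptually, a binary encoding only states which label value means "positive" and which means "negative", so recoding is a label-by-label lookup and a metric such as recall does not depend on the encoding. The plain-Julia sketch below illustrates this for the one-label-per-class case only; `PairEncoding`, `recode_label`, and `recall_under` are hypothetical names, not EvalMetrics' `recode` or its encoding types.

```julia
# Hypothetical one-vs-one encoding: one label per class (not EvalMetrics' type).
struct PairEncoding{T}
    pos::T
    neg::T
end

# Map a label from one encoding to the corresponding label of another.
function recode_label(from::PairEncoding, to::PairEncoding, y)
    y == from.pos && return to.pos
    y == from.neg && return to.neg
    throw(ArgumentError("label $y is not valid in the source encoding"))
end

# Recall computed with respect to a given encoding.
function recall_under(enc::PairEncoding, targets, predicts)
    npos = count(t -> t == enc.pos, targets)
    tp = count(i -> targets[i] == enc.pos && predicts[i] == enc.pos, eachindex(targets))
    return tp / npos
end

onezero = PairEncoding(1, 0)
symbolic = PairEncoding(:positive, :negative)

targets  = [1, 0, 1, 1, 0]
predicts = [1, 0, 0, 1, 1]
# Ref(...) makes broadcasting treat each encoding as a scalar.
targets_sym  = recode_label.(Ref(onezero), Ref(symbolic), targets)
predicts_sym = recode_label.(Ref(onezero), Ref(symbolic), predicts)

recall_under(onezero, targets, predicts) == recall_under(symbolic, targets_sym, predicts_sym)  # true
```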
Decision thresholds for classification
The package provides a thresholds(scores::RealVector, n::Int) function, which returns n decision thresholds corresponding to n evenly spaced quantiles of the scores vector. The default value of n is length(scores) + 1. The thresholds function has two keyword arguments:
- reduced::Bool: if true (default), the function returns min(length(scores) + 1, n) thresholds.
- zerorecall::Bool: if true (default), the largest threshold is maximum(scores)*(1 + eps()); otherwise it is maximum(scores).
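The quantile-based construction can be sketched with the `quantile` function from Julia's Statistics standard library (its default is the common "type 7" linear interpolation). This is a hypothetical re-implementation for illustration, not the package's code: `sketch_thresholds` is an invented name, it assumes n >= 2, and it uses `nextfloat(maximum(scores))` instead of `maximum(scores)*(1 + eps())` so that the top threshold lies strictly above every score even when the maximum is zero or negative.

```julia
using Statistics

# Hypothetical sketch, not EvalMetrics' `thresholds`: n decision thresholds at
# evenly spaced quantiles of `scores` (assumes n >= 2).
function sketch_thresholds(scores::AbstractVector{<:Real}, n::Int = length(scores) + 1;
                           reduced::Bool = true, zerorecall::Bool = true)
    if reduced
        n = min(length(scores) + 1, n)
    end
    thres = collect(quantile(scores, range(0, stop = 1, length = n)))
    if zerorecall
        # Push the largest threshold just above the maximum score, so that no
        # sample is predicted positive there (recall is zero).
        thres[end] = nextfloat(float(maximum(scores)))
    end
    return thres
end

scores = [0.1, 0.4, 0.35, 0.8]
sketch_thresholds(scores)                      # 5 thresholds, the last slightly above 0.8
sketch_thresholds(scores, 100)                 # still 5 thresholds because reduced = true
sketch_thresholds(scores; zerorecall = false)  # the last threshold equals 0.8
```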
The package also provides some other useful utilities:
- threshold_at_tpr(targets::AbstractVector, scores::RealVector, tpr::Real) returns the largest threshold t that satisfies true_positive_rate(targets, scores, t) >= tpr
- threshold_at_tnr(targets::AbstractVector, scores::RealVector, tnr::Real) returns the smallest threshold t that satisfies true_negative_rate(targets, scores, t) >= tnr
- threshold_at_fpr(targets::AbstractVector, scores::RealVector, fpr::Real) returns the smallest threshold t that satisfies false_positive_rate(targets, scores, t) <= fpr
- threshold_at_fnr(targets::AbstractVector, scores::RealVector, fnr::Real) returns the largest threshold t that satisfies false_negative_rate(targets, scores, t) <= fnr

All four functions can be called with an encoding of type AbstractEncoding as the first argument to use an encoding other than the default.
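The tpr case has a simple closed form when `score >= t` counts as a positive prediction: the true positive rate can only drop as t grows, so the largest admissible t is the k-th largest positive score, with k = ceil(tpr * p). The plain-Julia sketch below assumes labels 1/0 and 0 < tpr <= 1; `sketch_threshold_at_tpr` and `tpr_at` are hypothetical names, and the package may treat ties and edge cases differently.

```julia
# Hypothetical sketch, not EvalMetrics' threshold_at_tpr. Assumes 1 = positive,
# 0 < tpr <= 1, and that a sample is predicted positive when score >= t.
function sketch_threshold_at_tpr(targets::AbstractVector, scores::AbstractVector{<:Real}, tpr::Real)
    0 < tpr <= 1 || throw(ArgumentError("tpr must lie in (0, 1]"))
    pos = sort(scores[targets .== 1]; rev = true)   # positive-class scores, descending
    # At least k positives must score >= t. Note: ceil can be affected by
    # floating-point rounding of tpr * length(pos).
    k = ceil(Int, tpr * length(pos))
    return pos[k]
end

# Fraction of positives with score >= t, used to check the result.
tpr_at(targets, scores, t) = count((targets .== 1) .& (scores .>= t)) / count(targets .== 1)

targets = [1, 1, 1, 1, 0, 0]
scores  = [0.9, 0.8, 0.3, 0.6, 0.7, 0.1]
t = sketch_threshold_at_tpr(targets, scores, 0.5)   # 0.8
tpr_at(targets, scores, t)                          # 0.5
```

Any threshold above the returned value drops at least one of the required positives, which is what makes it the largest one satisfying the constraint.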
Evaluation curves
The package also implements performance curves. For example, a precision-recall (PR) curve can be computed as follows:
julia> scores = [0.74, 0.48, 0.23, 0.91, 0.33, 0.92, 0.83, 0.61, 0.68, 0.09];
julia> targets = collect(1:10 .>= 3);
julia> prcurve(targets, scores)
([1.0, 0.875, 0.75, 0.625, 0.625, 0.5, 0.375, 0.375, 0.25, 0.125, 0.0],
[0.8, 0.7777777777777778, 0.75, 0.7142857142857143, 0.8333333333333334, 0.8, 0.75, 1.0, 1.0, 1.0, 1.0])
prcurve returns a tuple of two vectors: recall values (x axis) and precision values (y axis), one pair per decision threshold. All possible calls:
- prcurve(targets::AbstractVector, scores::RealVector) returns all length(targets) + 1 points
- prcurve(enc::AbstractEncoding, targets::AbstractVector, scores::RealVector) uses a different encoding
- prcurve(targets::AbstractVector, scores::RealVector, thres::RealVector) uses the provided thresholds to compute individual points
- prcurve(enc::AbstractEncoding, targets::AbstractVector, scores::RealVector, thres::RealVector) combines both options
- prcurve(cms::AbstractVector{<:ConfusionMatrix}) computes the curve from precomputed confusion matrices
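To see where the numbers in the example above come from, the following plain-Julia sketch recomputes the curve with `length(scores) + 1` quantile thresholds and the `score >= t` rule. Under these assumptions (ours, not documented behaviour of `prcurve`), it reproduces the printed vectors, including the convention that precision is 1 when nothing is predicted positive. `sketch_prcurve` is a hypothetical name, and the largest threshold uses `nextfloat` rather than `maximum(scores)*(1 + eps())`.

```julia
using Statistics

# Hypothetical sketch, not EvalMetrics' prcurve. Assumptions: labels are 1/true
# for positives, `score >= t` is a positive prediction, and the n thresholds are
# evenly spaced quantiles of the scores with the largest moved above the maximum.
function sketch_prcurve(targets::AbstractVector, scores::AbstractVector{<:Real},
                        n::Int = length(scores) + 1)
    thres = collect(quantile(scores, range(0, stop = 1, length = n)))
    thres[end] = nextfloat(float(maximum(scores)))
    is_pos = targets .== 1
    npos = count(is_pos)
    rec = Float64[]
    prec = Float64[]
    for t in thres
        predicted = scores .>= t
        tp = count(predicted .& is_pos)
        fp = count(predicted .& .!is_pos)
        push!(rec, tp / npos)
        # No positive predictions: report precision 1, as in the output above.
        push!(prec, tp + fp == 0 ? 1.0 : tp / (tp + fp))
    end
    return rec, prec
end

scores = [0.74, 0.48, 0.23, 0.91, 0.33, 0.92, 0.83, 0.61, 0.68, 0.09]
targets = collect(1:10 .>= 3)
rec, prec = sketch_prcurve(targets, scores)
```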
The area under the curve can be computed with the auc_trapezoidal function, which uses the trapezoidal rule:
julia> auc_trapezoidal(prcurve(targets, scores)...)
0.8595734126984128
For convenience, the package also provides au_prcurve, which has exactly the same signature as prcurve. Moreover, any curve(PRCurve, args...) or auc(PRCurve, args...) call is equivalent to prcurve(args...) or au_prcurve(args...), respectively.
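The trapezoidal rule sums, over consecutive points, the segment width times the mean of the two heights. A minimal sketch follows; `sketch_auc_trapezoidal` is a hypothetical name and may differ from `auc_trapezoidal` in details. It assumes the points are already ordered along the curve, as in the output of prcurve above, and uses the absolute width instead of re-sorting by x: the PR curve repeats some recall values (0.625 and 0.375), and re-sorting tied points could change which heights get averaged. Fed the coordinates printed above, it yields the same area.

```julia
# Hypothetical sketch of the trapezoidal rule (not EvalMetrics' auc_trapezoidal).
# Assumes the points are ordered along the curve; x may be increasing or decreasing.
function sketch_auc_trapezoidal(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    length(x) == length(y) || throw(DimensionMismatch("x and y must have equal length"))
    area = 0.0
    for i in 2:length(x)
        area += abs(x[i] - x[i-1]) * (y[i] + y[i-1]) / 2
    end
    return area
end

# Recall and precision values as printed by prcurve above.
rec  = [1.0, 0.875, 0.75, 0.625, 0.625, 0.5, 0.375, 0.375, 0.25, 0.125, 0.0]
prec = [0.8, 0.7777777777777778, 0.75, 0.7142857142857143, 0.8333333333333334,
        0.8, 0.75, 1.0, 1.0, 1.0, 1.0]
sketch_auc_trapezoidal(rec, prec)   # ≈ 0.8595734126984128
```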
Besides the PR curve, the receiver operating characteristic (ROC) curve is also available out of the box, with analogous functions roccurve and au_roccurve.
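A handy sanity check for ROC areas uses a different technique than the trapezoidal rule: the normalised Mann-Whitney U statistic, i.e. the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count one half). When the ROC curve uses every distinct score as a threshold and is integrated with the trapezoidal rule, the two quantities coincide, so a full-resolution au_roccurve should agree up to rounding. The sketch below is a plain-Julia illustration with the hypothetical name `rank_auc`, not how EvalMetrics computes the area, and assumes 1/true marks the positive class.

```julia
# Cross-check via a different technique (not how EvalMetrics computes au_roccurve):
# the normalised Mann-Whitney U statistic, i.e. the probability that a random
# positive outscores a random negative, with ties counting one half.
# Assumes 1/true marks the positive class.
function rank_auc(targets::AbstractVector, scores::AbstractVector{<:Real})
    is_pos = targets .== 1
    pos = scores[is_pos]
    neg = scores[.!is_pos]
    wins = 0.0
    for sp in pos, sn in neg
        wins += sp > sn ? 1.0 : (sp == sn ? 0.5 : 0.0)
    end
    return wins / (length(pos) * length(neg))
end

scores = [0.74, 0.48, 0.23, 0.91, 0.33, 0.92, 0.83, 0.61, 0.68, 0.09]
targets = collect(1:10 .>= 3)
rank_auc(targets, scores)   # 0.5: 8 of the 16 positive-negative pairs are ordered correctly
```

The double loop costs O(p·n) comparisons, which is fine for a check; a rank-based formula would scale better on large data.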
By default, all points of the curve, as well as area-under-curve scores, are computed at the highest possible resolution. This can be changed with the npoints keyword argument:
julia> length.(prcurve(targets, scores))
(11, 11)
julia> length.(prcurve(targets, scores; npoints=9))
(9, 9)
julia> au_prcurve(targets, scores)
0.8595734126984128
julia> au_prcurve(targets, scores; npoints=9)
0.8826388888888889
Plotting
For plotting purposes, EvalMetrics.jl
provides recipes for the Plots
library:
julia> using Plots; pyplot()
julia> using Random, MLBase; Random.seed!(42);
julia> scores = sort(rand(10000));
julia> targets = scores .>= 0.99;
julia> targets[MLBase.sample(findall(0.98 .<= scores .< 0.99), 30; replace = false)] .= true;
julia> targets[MLBase.sample(findall(0.99 .<= scores .< 0.995), 30; replace = false)] .= false;
Then, any of the following calls can be used:
- prplot(targets::AbstractVector, scores::RealVector) to use the full resolution:

  julia> prplot(targets, scores)
- prplot(targets::AbstractVector, scores::RealVector, thresholds::RealVector) to specify the thresholds to use
- prplot!(enc::AbstractEncoding, targets::AbstractVector, scores::RealVector) to use an encoding other than the default
- prplot!(enc::AbstractEncoding, targets::AbstractVector, scores::RealVector, thresholds::RealVector)
Furthermore, vectors of vectors such as [targets1, targets2] and [scores1, scores2] can be used to plot multiple curves at once. The calls stay the same:
julia> prplot([targets, targets], [scores, scores .+ rand(10000) ./ 5])
For ROC curves, use rocplot analogously:
julia> rocplot(targets, scores)
julia> rocplot([targets, targets], [scores, scores .+ rand(10000) ./ 5])
The modifying versions prplot! and rocplot! (with exclamation marks) work as well.
The appearance of the plot can be changed in the same way as with the Plots library, so keyword arguments such as xguide, xlims, grid, and fill can all be used:
julia> prplot(targets, scores; xguide="RECALL", fill=:green, grid=false, xlims=(0.8, 1.0))
julia> rocplot(targets, scores, title="Title", label="experiment", xscale=:log10)
Here, the x-axis limits are adjusted appropriately for the logarithmic scale, unless overridden with the xlims keyword argument.
julia> rocplot([targets, targets], [scores, scores .+ rand(10000) ./ 5], label=["a" "b";])
By default, plotted curves have 300 points, which are sampled to retain as much information as possible: instead of sampling raw thresholds, the false positive rate is sampled for ROC curves and the true positive rate for PR curves. The number of points can again be changed with the npoints keyword argument:
julia> prplot(targets, scores; npoints=Inf, label="Original")
julia> prplot!(targets, scores; npoints=10, label="Sampled (10 points)")
julia> prplot!(targets, scores; npoints=100, label="Sampled (100 points)")
julia> prplot!(targets, scores; npoints=1000, label="Sampled (1000 points)")
julia> prplot!(targets, scores; npoints=5000, label="Sampled (5000 points)")
Note that even though a smaller number of points is visualized, the displayed AUC score is computed from all points. When a logarithmic scale is used, the sampling is also done on a logarithmic scale.
In addition, the diagonal keyword toggles drawing the diagonal in the plot, and aucshow toggles whether the AUC score is appended to the label:
julia> rocplot(targets, scores; aucshow=false, label="a", diagonal=true)
User-defined curves
PR and ROC curves are available out of the box. Additional curves can be defined in a similar way as new metrics: use the @curve macro and define a method of the apply function that computes the points of the curve from a vector of confusion matrices. For instance, the ROC curve can be defined this way:
julia> import EvalMetrics: @curve, apply
julia> @curve MyROCCurve
julia> apply(::Type{MyROCCurve}, cms::AbstractVector{ConfusionMatrix{T}}) where T <: Real =
(false_positive_rate(cms), true_positive_rate(cms))
julia> myroccurve(targets, scores) == roccurve(targets, scores)
true
To be able to sample along the x axis while plotting, sampling_function and lowest_metric_value must be provided as well.
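As with metrics, the curve machinery rests on dispatch. The plain-Julia sketch below mimics it without the @curve macro: `apply` maps a vector of confusion matrices (one per threshold) to the x and y coordinates of the curve. `Counts`, `MyDETCurve`, `mydetcurve`, and the sample counts are hypothetical; the example curve plots the raw false negative rate against the false positive rate, i.e. a detection error tradeoff curve without its usual normal-deviate axes. Unlike the macro-generated myroccurve above, this hand-written wrapper only accepts precomputed matrices.

```julia
# Hypothetical stand-in for ConfusionMatrix (not the package's type).
struct Counts
    p::Int; n::Int; tp::Int; tn::Int; fp::Int; fn::Int
end

# Type tag used only for dispatch, in the spirit of @curve MyROCCurve.
abstract type MyDETCurve end

# One point per confusion matrix: (false positive rate, false negative rate).
apply(::Type{MyDETCurve}, cms::AbstractVector{Counts}) =
    ([c.fp / c.n for c in cms], [c.fn / c.p for c in cms])

# Hand-written convenience wrapper.
mydetcurve(cms::AbstractVector{Counts}) = apply(MyDETCurve, cms)

# Made-up counts at three increasing thresholds (4 positives, 4 negatives).
cms = [Counts(4, 4, 4, 1, 3, 0), Counts(4, 4, 3, 3, 1, 1), Counts(4, 4, 1, 4, 0, 3)]
x, y = mydetcurve(cms)   # ([0.75, 0.25, 0.0], [0.0, 0.25, 0.75])
```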