# Performance Measures

In MLJ, loss functions, scoring rules, sensitivities, and so on are collectively referred to as *measures*. Presently, MLJ includes a few built-in measures, provides support for the loss functions in the LossFunctions.jl library, and allows users to define their own custom measures.

Providing further measures for probabilistic predictors, such as proper scoring rules, and for constructing multi-target product measures, is a work in progress.

*Note for developers:* The measures interface and the built-in measures described here are defined in MLJBase.

## Using built-in measures

These measures all have the common calling syntax

`measure(ŷ, y)`

or

`measure(ŷ, y, w)`

where `y` iterates over observations of some target variable, and `ŷ` iterates over predictions (`Distribution` or `Sampler` objects in the probabilistic case). Here `w` is an optional vector of sample weights, which can be provided when the measure supports this.

```
julia> using MLJ
julia> y = [1, 2, 3, 4];
julia> ŷ = [2, 3, 3, 3];
julia> w = [1, 2, 2, 1];
julia> rms(ŷ, y) # reports an aggregate loss
0.8660254037844386
julia> l1(ŷ, y, w) # reports per observation losses
4-element Array{Int64,1}:
1
2
0
1
julia> y = categorical(["male", "female", "female"])
3-element CategoricalArray{String,1,UInt32}:
"male"
"female"
"female"
julia> male = y[1]; female = y[2];
julia> d = UnivariateFinite([male, female], [0.55, 0.45]);
julia> ŷ = [d, d, d];
julia> cross_entropy(ŷ, y)
3-element Array{Float64,1}:
0.5978370007556204
0.7985076962177716
0.7985076962177716
```

## Traits and custom measures

Notice that `l1` reports per-sample evaluations, while `rms` only reports an aggregated result. This and other behavior can be gleaned from measure *traits*, which are summarized by the `info` method:

```
julia> info(l1)
absolute deviations; aliases: `l1`.
(name = "l1",
target_scitype = Union{AbstractArray{Continuous,1}, AbstractArray{Count,1}},
supports_weights = true,
prediction_type = :deterministic,
orientation = :loss,
reports_each_observation = true,
aggregation = MLJBase.Mean(),
is_feature_dependent = false,
docstring = "absolute deviations; aliases: `l1`.",
distribution_type = missing,)
```

Use `measures()` to list all measures and `measures(conditions...)` to search for measures with given traits (as you would query models).

`MLJBase.measures` (Method)

`measures()`

List all measures as named-tuples keyed on measure traits.

`measures(filters...)`

List all measures `m` for which `filter(m)` is true, for each `filter` in `filters`.

`measures(matching(y))`

List all measures compatible with the target `y`.

`measures(needle::Union{AbstractString,Regex})`

List all measures with `needle` in a measure's `name` or `docstring`.

**Example**

Find all classification measures supporting sample weights:

```
measures(m -> m.target_scitype <: AbstractVector{<:Finite} &&
m.supports_weights)
```

Find all classification measures where the number of classes is three:

```
y = categorical(1:3)
measures(matching(y))
```

Find all measures in the `rms` family:

`measures("rms")`
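The filtering semantics described above (keep a measure only if every filter returns true, or if a needle occurs in its name or docstring) can be mimicked in plain Julia. This is a minimal sketch, not MLJ's implementation: `catalogue` and `measures_sketch` are hypothetical names, and the trait records are made-up stand-ins rather than real `measures()` output.

```julia
# Hypothetical stand-ins for measure trait records (not real MLJ output).
catalogue = [
    (name = "l1", supports_weights = true, orientation = :loss,
     docstring = "absolute deviations"),
    (name = "rms", supports_weights = true, orientation = :loss,
     docstring = "root mean squared error"),
    (name = "auc", supports_weights = false, orientation = :score,
     docstring = "area under the ROC curve"),
]

# Keep the records for which every filter returns true.
measures_sketch(filters...) = filter(m -> all(f -> f(m), filters), catalogue)

# Keep the records whose name or docstring contains the needle.
measures_sketch(needle::Union{AbstractString,Regex}) =
    filter(m -> occursin(needle, m.name) || occursin(needle, m.docstring), catalogue)

weighted_losses = measures_sketch(m -> m.supports_weights, m -> m.orientation == :loss)
rms_family = measures_sketch("rms")
```

Calling `measures_sketch()` with no filters returns the whole catalogue, since an empty set of conditions is trivially satisfied.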

A user-defined measure in MLJ can be passed to the `evaluate!` method, and elsewhere in MLJ, provided it is a function or callable object conforming to the above syntactic conventions. By default, a custom measure is understood to:

- be a loss function (rather than a score)
- report an aggregated value (rather than per-sample evaluations)
- be feature-independent

To override this behaviour one simply overloads the appropriate trait, as shown in the following examples:

```
julia> y = [1, 2, 3, 4];
julia> ŷ = [2, 3, 3, 3];
julia> w = [1, 2, 2, 1];
julia> my_loss(ŷ, y) = maximum((ŷ - y).^2);
julia> my_loss(ŷ, y)
1
julia> my_per_sample_loss(ŷ, y) = abs.(ŷ - y);
julia> MLJ.reports_each_observation(::typeof(my_per_sample_loss)) = true;
julia> my_per_sample_loss(ŷ, y)
4-element Array{Int64,1}:
1
1
0
1
julia> my_weighted_score(ŷ, y) = 1/mean(abs.(ŷ - y));
julia> my_weighted_score(ŷ, y, w) = 1/mean(abs.((ŷ - y).^w));
julia> MLJ.supports_weights(::typeof(my_weighted_score)) = true;
julia> MLJ.orientation(::typeof(my_weighted_score)) = :score;
julia> my_weighted_score(ŷ, y)
1.3333333333333333
julia> X = (x=rand(4), penalty=[1, 2, 3, 4]);
julia> my_feature_dependent_loss(ŷ, X, y) = sum(abs.(ŷ - y) .* X.penalty)/sum(X.penalty);
julia> MLJ.is_feature_dependent(::typeof(my_feature_dependent_loss)) = true;
julia> my_feature_dependent_loss(ŷ, X, y)
0.7
```

The possible signatures for custom measures are: `measure(ŷ, y)`, `measure(ŷ, y, w)`, `measure(ŷ, X, y)` and `measure(ŷ, X, y, w)`, each measure implementing one non-weighted version, and possibly a second weighted version.

*Implementation detail:* Internally, every measure is evaluated using the syntax

`MLJ.value(measure, ŷ, X, y, w)`

and the traits determine what can be ignored and how `measure` is actually called. If `w=nothing` then the non-weighted form of `measure` is dispatched.
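To make the idea of trait-driven dispatch concrete, here is a minimal sketch in plain Julia. It is not MLJ's code: `value_sketch`, `is_feature_dependent_sketch` and `supports_weights_sketch` are hypothetical names standing in for `MLJ.value` and the corresponding traits, and the real logic may differ in detail.

```julia
# Hypothetical trait functions, defaulting to "feature-independent, no weights".
is_feature_dependent_sketch(measure) = false
supports_weights_sketch(measure) = false

# Call `measure` with only the arguments its traits say it accepts.
function value_sketch(measure, ŷ, X, y, w)
    use_w = w !== nothing && supports_weights_sketch(measure)
    if is_feature_dependent_sketch(measure)
        return use_w ? measure(ŷ, X, y, w) : measure(ŷ, X, y)
    else
        return use_w ? measure(ŷ, y, w) : measure(ŷ, y)
    end
end

# A feature-dependent loss, declared as such through its trait.
penalized_loss(ŷ, X, y) = sum(abs.(ŷ .- y) .* X.penalty) / sum(X.penalty)
is_feature_dependent_sketch(::typeof(penalized_loss)) = true

# A plain loss that uses neither features nor weights.
max_loss(ŷ, y) = maximum(abs.(ŷ .- y))
```

With this arrangement a caller can always pass all five arguments; `X` and `w` are silently dropped for `max_loss`, while `penalized_loss` receives `X`.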

## Using measures from LossFunctions.jl

The LossFunctions.jl package includes "distance loss" functions for `Continuous` targets, and "marginal loss" functions for `Binary` targets. While the LossFunctions.jl interface differs from the present one (for example, `Binary` observations must be +1 or -1), one can safely pass the loss functions defined there to any MLJ algorithm, which re-interprets it under the hood. Note that the "distance losses" in the package apply to deterministic predictions, while the "marginal losses" apply to probabilistic predictions.

```
julia> using LossFunctions
julia> X = (x1=rand(5), x2=rand(5)); y = categorical(["y", "y", "y", "n", "y"]); w = [1, 2, 1, 2, 3];
julia> mach = machine(ConstantClassifier(), X, y);
julia> holdout = Holdout(fraction_train=0.6);
julia> evaluate!(mach,
                 measure=[ZeroOneLoss(), L1HingeLoss(), L2HingeLoss(), SigmoidLoss()],
                 resampling=holdout,
                 operation=predict,
                 weights=w,
                 verbosity=0)
┌─────────────┬───────────────┬────────────┐
│ _.measure   │ _.measurement │ _.per_fold │
├─────────────┼───────────────┼────────────┤
│ ZeroOneLoss │ 0.4           │ [0.4]      │
│ L1HingeLoss │ 0.8           │ [0.8]      │
│ L2HingeLoss │ 1.6           │ [1.6]      │
│ SigmoidLoss │ 0.848         │ [0.848]    │
└─────────────┴───────────────┴────────────┘
_.per_observation = [[[0.8, 0.0]], [[1.6, 0.0]], [[3.2, 0.0]], [[1.409275324764612, 0.2860870128530822]]]
```

*Note:* Although `ZeroOneLoss(ŷ, y)` makes no sense (neither `ŷ` nor `y` has a type expected by LossFunctions.jl), one can instead use the adaptor `MLJ.value` as discussed above:

```
julia> ŷ = predict(mach, X);
julia> loss = MLJ.value(ZeroOneLoss(), ŷ, X, y, w) # X is ignored here
5-element Array{Float64,1}:
0.0
0.0
0.0
1.1111111111111112
0.0
julia> mean(loss) ≈ misclassification_rate(mode.(ŷ), y, w)
true
```
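The arithmetic linking the per-observation values in this transcript to a weighted misclassification rate can be checked by hand. The sketch below assumes that each 0-1 loss is scaled by `w / mean(w)` (consistent with the value `1.1111111111111112` shown for the fourth observation) and that the weighted rate is the weight of the errors divided by the total weight; both are assumptions about the internals, and `errors` is simply read off from the constant prediction `"y"`.

```julia
using Statistics: mean

w = [1, 2, 1, 2, 3]          # sample weights from the example above
errors = [0, 0, 0, 1, 0]     # 1 where the predicted mode differs from the truth

# Assumed normalization: scale each 0-1 loss by w / mean(w).
per_observation = errors .* w ./ mean(w)

# Weighted misclassification rate: weight of the errors over total weight.
weighted_rate = sum(w .* errors) / sum(w)
```

Under these assumptions `mean(per_observation)` and `weighted_rate` agree, both being 2/9.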

## Built-in measures

`MLJBase.area_under_curve` (Constant)

`area_under_curve`

Area under the ROC curve; aliases: `area_under_curve`, `auc`.

`area_under_curve(ŷ, y)`

Return the area under the receiver operating characteristic (ROC) curve, for probabilistic predictions `ŷ`, given ground truth `y`. This metric is invariant to class labelling and can be used only for binary classification.

For more information, run `info(area_under_curve)`.
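One way to build intuition for this quantity is the standard rank interpretation: the area under the ROC curve equals the fraction of (positive, negative) pairs in which the positive observation receives the higher score, with ties counted as one half. The sketch below computes it that way in plain Julia; `auc_sketch` is a hypothetical name, it takes raw scores and a Boolean mask rather than MLJ distribution objects, and it is not how MLJ computes the measure.

```julia
# Fraction of (positive, negative) pairs ranked correctly, counting ties as 1/2.
function auc_sketch(scores::AbstractVector{<:Real}, is_positive::AbstractVector{Bool})
    pos = scores[is_positive]
    neg = scores[.!is_positive]
    total = 0.0
    for p in pos, n in neg
        total += p > n ? 1.0 : (p == n ? 0.5 : 0.0)
    end
    return total / (length(pos) * length(neg))
end

perfect = auc_sketch([0.9, 0.8, 0.3, 0.1], [true, true, false, false])
mixed = auc_sketch([0.9, 0.2, 0.6, 0.4], [true, true, false, false])
```

The pairwise loop is quadratic in the number of observations, which is fine for illustration but not for large data sets.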

`MLJBase.accuracy` (Constant)

`accuracy`

Classification accuracy; aliases: `accuracy`.

```
accuracy(ŷ, y)
accuracy(ŷ, y, w)
accuracy(conf_mat)
```

Returns the accuracy of the (point) predictions `ŷ`, given true observations `y`, optionally weighted by the weights `w`. All three arguments must be abstract vectors of the same length. This metric is invariant to class labelling and can be used for multiclass classification.

For more information, run `info(accuracy)`.

`MLJBase.balanced_accuracy` (Constant)

`balanced_accuracy`

Balanced classification accuracy; aliases: `balanced_accuracy`, `bacc`, `bac`.

```
balanced_accuracy(ŷ, y [, w])
balanced_accuracy(conf_mat)
```

Return the balanced accuracy of the point prediction `ŷ`, given true observations `y`, optionally weighted by `w`. The balanced accuracy takes into consideration class imbalance. All three arguments must have the same length. This metric is invariant to class labelling and can be used for multiclass classification.

For more information, run `info(balanced_accuracy)`.
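To illustrate how class imbalance enters, the sketch below uses the common unweighted definition of balanced accuracy, namely the average over classes of the fraction of that class predicted correctly. This definition is an assumption here (the text above does not spell out the formula), and `balanced_accuracy_sketch` is a hypothetical name operating on plain vectors.

```julia
using Statistics: mean

# Unweighted balanced accuracy: average, over classes, of the fraction of
# that class's observations that were predicted correctly.
function balanced_accuracy_sketch(ŷ::AbstractVector, y::AbstractVector)
    classes = unique(y)
    recalls = [mean(ŷ[y .== c] .== c) for c in classes]
    return mean(recalls)
end

y = ["a", "a", "a", "a", "b"]
ŷ = ["a", "a", "a", "a", "a"]        # always predicts the majority class
plain = mean(ŷ .== y)
balanced = balanced_accuracy_sketch(ŷ, y)
```

A predictor that always outputs the majority class gets a high plain accuracy on this data but only one half under the balanced variant, because the minority class is never recovered.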

`MLJBase.BrierScore` (Type)

`BrierScore(; distribution=UnivariateFinite)(ŷ, y [, w])`

Given an abstract vector of distributions `ŷ` of type `distribution`, and an abstract vector of true observations `y`, return the corresponding Brier (aka quadratic) scores. Weight the scores using `w` if provided.

Currently only `distribution=UnivariateFinite` is supported, which is applicable to supervised models with `Finite` target scitype. In this case, if `p(y)` is the predicted probability for a *single* observation `y`, and `C` all possible classes, then the corresponding Brier score for that observation is given by

$2p(y) - \left(\sum_{η ∈ C} p(η)^2\right) - 1$

Note that `BrierScore()=BrierScore{UnivariateFinite}` has the alias `brier_score`.

*Warning.* Here `BrierScore` is a "score" in the sense that bigger is better (with `0` optimal, and all other values negative). In Brier's original 1950 paper, and many other places, it has the opposite sign, despite the name. Moreover, the present implementation does not treat the binary case as special, so that the score may differ, in that case, by a factor of two from usage elsewhere.

For more information, run `info(BrierScore)`.
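The single-observation formula above is easy to evaluate by hand. The following sketch does so in plain Julia, representing a prediction as a `Dict` from class to probability; `brier_sketch` is a hypothetical helper and the `Dict` representation is an assumption made for illustration, standing in for a `UnivariateFinite` distribution.

```julia
# `probs` maps each class to its predicted probability; `observed` is the true class.
brier_sketch(probs::Dict, observed) =
    2 * probs[observed] - sum(p^2 for p in values(probs)) - 1

d = Dict("male" => 0.55, "female" => 0.45)
score_male = brier_sketch(d, "male")
score_certain = brier_sketch(Dict("male" => 1.0, "female" => 0.0), "male")
```

A prediction that puts all its mass on the observed class attains the optimal value `0`; any other prediction scores below zero, in line with the sign convention described in the warning.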

`MLJBase.cross_entropy` (Constant)

`cross_entropy`

Cross entropy loss with probabilities clamped between `eps()` and `1-eps()`; aliases: `cross_entropy`.

```
ce = CrossEntropy(; eps=eps())
ce(ŷ, y)
```

Given an abstract vector of distributions `ŷ` and an abstract vector of true observations `y`, return the corresponding cross-entropy loss (aka log loss) scores.

Since the score is undefined when the true observation has predicted probability zero, probabilities are clipped between `eps` and `1-eps`, where `eps` can be specified.

If `sᵢ` is the predicted probability for the true class `yᵢ` then the score for that example is given by

`-log(clamp(sᵢ, eps, 1-eps))`

For more information, run `info(cross_entropy)`.
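The per-observation formula can be tried directly in plain Julia. In this sketch `cross_entropy_sketch` is a hypothetical helper that takes the predicted probability of the true class as a number, and the keyword `tol` plays the role of `eps` in the text.

```julia
# Per-observation cross-entropy; `tol` plays the role of `eps` in the text.
cross_entropy_sketch(s::Real; tol = eps()) = -log(clamp(s, tol, 1 - tol))

loss_055 = cross_entropy_sketch(0.55)   # true class predicted with probability 0.55
loss_045 = cross_entropy_sketch(0.45)   # true class predicted with probability 0.45
loss_zero = cross_entropy_sketch(0.0)   # clamping keeps this finite
```

The first two values reproduce the `cross_entropy` output shown in the section on built-in measures, and the last shows why clamping is applied: without it the loss for a zero probability would be infinite.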

`MLJBase.FScore` (Type)

`FScore{β}(rev=nothing)`

One-parameter generalization, $F_β$, of the F-measure or balanced F-score.

`FScore{β}(ŷ, y)`

Evaluate $F_β$ score on observations, `ŷ`, given ground truth values, `y`.

By default, the second element of `levels(y)` is designated as `true`. To reverse roles, use `FScore{β}(rev=true)` instead of `FScore{β}`.

For more information, run `info(FScore)`.
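For reference, the usual definition of $F_β$ in terms of precision and recall is $(1 + β^2)\,\frac{\text{precision} \cdot \text{recall}}{β^2\,\text{precision} + \text{recall}}$. The sketch below evaluates it from raw confusion counts; `fscore_sketch` is a hypothetical helper, and taking counts (rather than vectors `ŷ` and `y`) is a simplification for illustration.

```julia
# F_β from confusion counts: true positives, false positives, false negatives.
function fscore_sketch(tp::Integer, fp::Integer, fn::Integer; β::Real = 1)
    prec = tp / (tp + fp)   # precision
    rec = tp / (tp + fn)    # recall
    return (1 + β^2) * prec * rec / (β^2 * prec + rec)
end

f1 = fscore_sketch(8, 2, 2)
f2 = fscore_sketch(6, 2, 6; β = 2)
```

With `β = 1` precision and recall are weighted equally; larger `β` shifts the emphasis towards recall.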

`MLJBase.false_discovery_rate` (Constant)

`false_discovery_rate`

false discovery rate; aliases: `false_discovery_rate`, `falsediscovery_rate`, `fdr`.

`false_discovery_rate(ŷ, y)`

False discovery rate for observations `ŷ` and ground truth `y`. Assigns `false` to first element of `levels(y)`. To reverse roles, use `FalseDiscoveryRate(rev=true)` instead of `false_discovery_rate`.

For more information, run `info(false_discovery_rate)`.

`MLJBase.false_negative` (Constant)

`false_negative`

Number of false negatives; aliases: `false_negative`, `falsenegative`.

`false_negative(ŷ, y)`

Number of false negatives for observations `ŷ` and ground truth `y`. Assigns `false` to first element of `levels(y)`. To reverse roles, use `FalseNegative(rev=true)` instead of `false_negative`.

For more information, run `info(false_negative)`.

`MLJBase.false_negative_rate` (Constant)

`false_negative_rate`

false negative rate; aliases: `false_negative_rate`, `falsenegative_rate`, `fnr`, `miss_rate`.

`false_negative_rate(ŷ, y)`

False negative rate for observations `ŷ` and ground truth `y`. Assigns `false` to first element of `levels(y)`. To reverse roles, use `FalseNegativeRate(rev=true)` instead of `false_negative_rate`.

For more information, run `info(false_negative_rate)`.

`MLJBase.false_positive` (Constant)

`false_positive`

Number of false positives; aliases: `false_positive`, `falsepositive`.

`false_positive(ŷ, y)`

Number of false positives for observations `ŷ` and ground truth `y`. Assigns `false` to first element of `levels(y)`. To reverse roles, use `FalsePositive(rev=true)` instead of `false_positive`.

For more information, run `info(false_positive)`.

`MLJBase.false_positive_rate` (Constant)

`false_positive_rate`

false positive rate; aliases: `false_positive_rate`, `falsepositive_rate`, `fpr`, `fallout`.

`false_positive_rate(ŷ, y)`

False positive rate for observations `ŷ` and ground truth `y`. Assigns `false` to first element of `levels(y)`. To reverse roles, use `FalsePositiveRate(rev=true)` instead of `false_positive_rate`.

For more information, run `info(false_positive_rate)`.

`MLJBase.l1` (Constant)

```
l1(ŷ, y)
l1(ŷ, y, w)
```

L1 per-observation loss.

For more information, run `info(l1)`.

`MLJBase.l2` (Constant)

```
l2(ŷ, y)
l2(ŷ, y, w)
```

L2 per-observation loss.

For more information, run `info(l2)`.

`MLJBase.mae` (Constant)

```
mae(ŷ, y)
mae(ŷ, y, w)
```

Mean absolute error.

$\text{MAE} = n^{-1}∑ᵢ|yᵢ-ŷᵢ|$ or $\text{MAE} = n^{-1}∑ᵢwᵢ|yᵢ-ŷᵢ|$

For more information, run `info(mae)`.

`MLJBase.matthews_correlation` (Constant)

`matthews_correlation`

Matthews' correlation; aliases: `matthews_correlation`, `mcc`.

```
matthews_correlation(ŷ, y)
matthews_correlation(conf_mat)
```

Return Matthews' correlation coefficient corresponding to the point prediction `ŷ`, given true observations `y`. This metric is invariant to class labelling and can be used for multiclass classification.

For more information, run `info(matthews_correlation)`.
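In the binary case, Matthews' correlation coefficient has a well-known closed form in the four confusion counts. The sketch below evaluates that form; `mcc_sketch` is a hypothetical helper, it covers only the two-class case (the built-in measure also handles multiclass targets), and it does not guard against a zero denominator.

```julia
# Binary Matthews correlation coefficient from the four confusion counts.
function mcc_sketch(tp, tn, fp, fn)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator
end

all_correct = mcc_sketch(5, 5, 0, 0)
all_wrong = mcc_sketch(0, 0, 5, 5)
uninformative = mcc_sketch(2, 2, 2, 2)
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right).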

`MLJBase.misclassification_rate` (Constant)

`misclassification_rate`

misclassification rate; aliases: `misclassification_rate`, `mcr`.

```
misclassification_rate(ŷ, y)
misclassification_rate(ŷ, y, w)
misclassification_rate(conf_mat)
```

Returns the rate of misclassification of the (point) predictions `ŷ`, given true observations `y`, optionally weighted by the weights `w`. All three arguments must be abstract vectors of the same length. A confusion matrix can also be passed as argument. This metric is invariant to class labelling and can be used for multiclass classification.

For more information, run `info(misclassification_rate)`.

`MLJBase.negative_predictive_value` (Constant)

`negative_predictive_value`

negative predictive value; aliases: `negative_predictive_value`, `negativepredictive_value`, `npv`.

`negative_predictive_value(ŷ, y)`

Negative predictive value for observations `ŷ` and ground truth `y`. Assigns `false` to first element of `levels(y)`. To reverse roles, use `NPV(rev=true)` instead of `negative_predictive_value`.

For more information, run `info(negative_predictive_value)`.

`MLJBase.positive_predictive_value` (Constant)

`positive_predictive_value`

positive predictive value (aka precision); aliases: `positive_predictive_value`, `ppv`, `Precision()`, `positivepredictive_value`.

`positive_predictive_value(ŷ, y)`

Positive predictive value for observations `ŷ` and ground truth `y`. Assigns `false` to first element of `levels(y)`. To reverse roles, use `Precision(rev=true)` instead of `positive_predictive_value`.

For more information, run `info(positive_predictive_value)`.

`MLJBase.rms` (Constant)

```
rms(ŷ, y)
rms(ŷ, y, w)
```

Root mean squared error:

$\text{RMS} = \sqrt{n^{-1}∑ᵢ|yᵢ-ŷᵢ|^2}$ or $\text{RMS} = \sqrt{\frac{∑ᵢwᵢ|yᵢ-ŷᵢ|^2}{∑ᵢwᵢ}}$

For more information, run `info(rms)`.
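Both forms of the RMS formula translate almost literally into plain Julia. The sketch below is for checking numbers by hand and is not MLJ's implementation; `rms_sketch` is a hypothetical name.

```julia
# Root mean squared error, unweighted and weighted, following the two formulas above.
rms_sketch(ŷ, y) = sqrt(sum(abs2, y .- ŷ) / length(y))
rms_sketch(ŷ, y, w) = sqrt(sum(w .* abs2.(y .- ŷ)) / sum(w))

y = [1, 2, 3, 4]
ŷ = [2, 3, 3, 3]
w = [1, 2, 2, 1]
unweighted = rms_sketch(ŷ, y)
weighted = rms_sketch(ŷ, y, w)
```

The unweighted value matches the `rms(ŷ, y)` output in the first example of this page, and with constant weights the two forms coincide.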

`MLJBase.rmsl` (Constant)

`rmsl(ŷ, y)`

Root mean squared logarithmic error:

$\text{RMSL} = \sqrt{n^{-1}∑ᵢ\log\left({yᵢ \over ŷᵢ}\right)^2}$

For more information, run `info(rmsl)`.

See also `rmslp1`.

`MLJBase.rmslp1` (Constant)

`rmslp1(ŷ, y)`

Root mean squared logarithmic error with an offset of 1:

$\text{RMSLP1} = \sqrt{n^{-1}∑ᵢ\log\left({yᵢ + 1 \over ŷᵢ + 1}\right)^2}$

For more information, run `info(rmslp1)`.

See also `rmsl`.

`MLJBase.rmsp` (Constant)

`rmsp(ŷ, y)`

Root mean squared proportional loss:

$\text{RMSP} = \sqrt{m^{-1}∑ᵢ \left({yᵢ-ŷᵢ \over yᵢ}\right)^2}$

where the sum is over indices such that `yᵢ ≠ 0` and `m` is the number of such indices.

For more information, run `info(rmsp)`.
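Taking the names of the last three measures literally (the root of a mean of squared terms), they can be sketched in plain Julia as follows. The `_sketch` functions are hypothetical helpers written for illustration under that reading, not MLJ's implementations.

```julia
# Root mean squared logarithmic error.
rmsl_sketch(ŷ, y) = sqrt(sum(abs2, log.(y ./ ŷ)) / length(y))

# The same with an offset of 1, which tolerates zeros in `y` or `ŷ`.
rmslp1_sketch(ŷ, y) = sqrt(sum(abs2, log.((y .+ 1) ./ (ŷ .+ 1))) / length(y))

# Root mean squared proportional loss, skipping observations with yᵢ = 0.
function rmsp_sketch(ŷ, y)
    keep = y .!= 0
    return sqrt(sum(abs2, (y[keep] .- ŷ[keep]) ./ y[keep]) / count(keep))
end

log_error = rmsl_sketch([1.0, 1.0], [exp(1), exp(-1)])
proportional_error = rmsp_sketch([1.0, 5.0, 3.0], [2.0, 0.0, 4.0])
```

Squaring matters: in the first call the two log-ratios are +1 and -1, which would cancel in a plain average but give an error of 1 once squared. In the second call the middle observation is dropped because its target is zero.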

`MLJBase.true_negative` (Constant)

`true_negative`

Number of true negatives; aliases: `true_negative`, `truenegative`.

`true_negative(ŷ, y)`

Number of true negatives for observations `ŷ` and ground truth `y`. Assigns `false` to first element of `levels(y)`. To reverse roles, use `TrueNegative(rev=true)` instead of `true_negative`.

For more information, run `info(true_negative)`.

`MLJBase.true_negative_rate` (Constant)

`true_negative_rate`

true negative rate; aliases: `true_negative_rate`, `truenegative_rate`, `tnr`, `specificity`, `selectivity`.

`true_negative_rate(ŷ, y)`

True negative rate for observations `ŷ` and ground truth `y`. Assigns `false` to first element of `levels(y)`. To reverse roles, use `TrueNegativeRate(rev=true)` instead of `true_negative_rate`.

For more information, run `info(true_negative_rate)`.

`MLJBase.true_positive` (Constant)

`true_positive`

Number of true positives; aliases: `true_positive`, `truepositive`.

`true_positive(ŷ, y)`

Number of true positives for observations `ŷ` and ground truth `y`. Assigns `false` to first element of `levels(y)`. To reverse roles, use `TruePositive(rev=true)` instead of `true_positive`.

For more information, run `info(true_positive)`.

`MLJBase.true_positive_rate` (Constant)

`true_positive_rate`

True positive rate; aliases: `true_positive_rate`, `truepositive_rate`, `tpr`, `sensitivity`, `recall`, `hit_rate`.

`true_positive_rate(ŷ, y)`

True positive rate for observations `ŷ` and ground truth `y`. Assigns `false` to first element of `levels(y)`. To reverse roles, use `TruePositiveRate(rev=true)` instead of `true_positive_rate`.

For more information, run `info(true_positive_rate)`.
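The count-based and rate-based measures listed above are all functions of the same four confusion counts. The sketch below makes the relationships explicit using the standard textbook definitions; `confusion_counts_sketch` and `rates_sketch` are hypothetical helpers working on plain vectors, with the positive class passed explicitly rather than inferred from `levels(y)`.

```julia
# Count the four outcomes of a binary prediction, treating `positive` as "true".
function confusion_counts_sketch(ŷ, y; positive)
    tp = count((ŷ .== positive) .& (y .== positive))
    fp = count((ŷ .== positive) .& (y .!= positive))
    fn = count((ŷ .!= positive) .& (y .== positive))
    tn = count((ŷ .!= positive) .& (y .!= positive))
    return (tp = tp, fp = fp, fn = fn, tn = tn)
end

# Derived rates, named after the built-in measures listed above.
function rates_sketch(c)
    return (
        true_positive_rate = c.tp / (c.tp + c.fn),
        true_negative_rate = c.tn / (c.tn + c.fp),
        false_positive_rate = c.fp / (c.fp + c.tn),
        false_negative_rate = c.fn / (c.fn + c.tp),
        positive_predictive_value = c.tp / (c.tp + c.fp),
        negative_predictive_value = c.tn / (c.tn + c.fn),
        false_discovery_rate = c.fp / (c.fp + c.tp),
    )
end

y = ["n", "y", "y", "n", "y"]
ŷ = ["y", "y", "n", "n", "y"]
counts = confusion_counts_sketch(ŷ, y; positive = "y")
rates = rates_sketch(counts)
```

Several of the rates are complements of one another, for example the true and false positive rates of the negative class, which is why reversing the roles of the two classes matters.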

## List of LossFunctions.jl measures

`DWDMarginLoss()`, `ExpLoss()`, `L1HingeLoss()`, `L2HingeLoss()`, `L2MarginLoss()`, `LogitMarginLoss()`, `ModifiedHuberLoss()`, `PerceptronLoss()`, `ScaledMarginLoss()`, `SigmoidLoss()`, `SmoothedL1HingeLoss()`, `ZeroOneLoss()`, `HuberLoss()`, `L1EpsilonInsLoss()`, `L2EpsilonInsLoss()`, `LPDistLoss()`, `LogitDistLoss()`, `PeriodicLoss()`, `QuantileLoss()`, `ScaledDistanceLoss()`.

## Other performance related tools

`MLJBase.confusion_matrix` (Function)

`confusion_matrix(ŷ, y; rev=false)`

Computes the confusion matrix given a predicted `ŷ` with categorical elements and the actual `y`. Rows are the predicted class, columns the ground truth. The ordering follows that of `levels(y)`.

**Keywords**

- `rev=false`: in the binary case, this keyword allows swapping the ordering of classes.
- `perm=[]`: in the general case, this keyword allows specifying a permutation re-ordering the classes.
- `warn=true`: whether to show a warning in case `y` does not have scientific type `OrderedFactor{2}` (see note below).

**Note**

To decrease the risk of unexpected errors, if `y` does not have scientific type `OrderedFactor{2}` (and so does not have a "natural ordering" negative-positive), a warning is shown indicating the current order unless the user explicitly specifies either `rev` or `perm`, in which case it's assumed the user is aware of the class ordering.

The `confusion_matrix` is a measure (although neither a score nor a loss) and so may be specified as such in calls to `evaluate`, `evaluate!`, although not in `TunedModel`s. In this case, however, there is no way to specify an ordering different from `levels(y)`, where `y` is the target.
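The layout convention (rows for the predicted class, columns for the ground truth, both in a fixed class order) is easy to get backwards, so a small sketch may help. `confusion_matrix_sketch` is a hypothetical plain-Julia helper; the explicit `classes` argument stands in for `levels(y)`, and the function returns a bare integer matrix rather than MLJ's confusion matrix object.

```julia
# Rows index the predicted class, columns the true class, both ordered by `classes`.
function confusion_matrix_sketch(ŷ, y, classes)
    index = Dict(c => i for (i, c) in enumerate(classes))
    cm = zeros(Int, length(classes), length(classes))
    for (p, t) in zip(ŷ, y)
        cm[index[p], index[t]] += 1
    end
    return cm
end

y = ["n", "y", "y", "n", "y"]
ŷ = ["y", "y", "n", "n", "y"]
cm = confusion_matrix_sketch(ŷ, y, ["n", "y"])
```

Passing the classes in a different order permutes rows and columns together, which is the effect the `rev` and `perm` keywords control.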

`MLJBase.roc_curve` (Function)

`fprs, tprs, ts = roc_curve(ŷ, y) = roc(ŷ, y)`

Return the ROC curve for a two-class probabilistic prediction `ŷ` given the ground truth `y`. The false positive rates and true positive rates over a range of thresholds `ts` are returned. Note that if there are `k` unique scores, there are correspondingly `k` thresholds and `k+1` "bins" over which the FPR and TPR are constant:

- `[0.0 - thresh[1]]`
- `[thresh[1] - thresh[2]]`
- ...
- `[thresh[k] - 1]`

Consequently, `tprs` and `fprs` are of length `k+1` if `ts` is of length `k`.

To draw the curve using your favorite plotting backend, do `plot(fprs, tprs)`.
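The relationship between `k` thresholds and `k+1` points on the curve can be seen in a small plain-Julia sketch. `roc_sketch` is a hypothetical helper taking raw positive-class scores and a Boolean mask; the choice of a strict `>` comparison and the ordering of the returned points are assumptions made here and may not match `roc_curve`.

```julia
# `scores` are predicted probabilities of the positive class.
function roc_sketch(scores::AbstractVector{<:Real}, is_positive::AbstractVector{Bool})
    ts = sort(unique(scores))                  # k thresholds
    is_negative = .!is_positive
    n_pos = count(is_positive)
    n_neg = count(is_negative)
    cutoffs = vcat(-Inf, ts)                   # one cutoff per bin, k + 1 in total
    # Predict "positive" whenever the score exceeds the cutoff.
    tprs = [count((scores .> c) .& is_positive) / n_pos for c in cutoffs]
    fprs = [count((scores .> c) .& is_negative) / n_neg for c in cutoffs]
    return fprs, tprs, ts
end

fprs, tprs, ts = roc_sketch([0.1, 0.4, 0.35, 0.8], [false, false, true, true])
```

In this sketch the first point corresponds to predicting everything positive and the last to predicting nothing positive, so the curve runs from (1, 1) down to (0, 0).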