# Performance Measures

In MLJ, loss functions, scoring rules, sensitivities, and so on are collectively referred to as *measures*. These include re-exported loss functions from the LossFunctions.jl library, overloaded to behave the same way as the built-in measures.

To list all measures, run `measures()`. Further measures for probabilistic predictors, such as proper scoring rules, and for constructing multi-target product measures, are planned. If you'd like to see a measure added to MLJ, post a comment here.

*Note for developers:* The measures interface and the built-in measures described here are defined in MLJBase, but will ultimately live in a separate package.

## Using built-in measures

These measures all have the common calling syntax

`measure(ŷ, y)`

or

`measure(ŷ, y, w)`

where `y` iterates over observations of some target variable, and `ŷ` iterates over predictions (`Distribution` or `Sampler` objects in the probabilistic case). Here `w` is an optional vector of sample weights, or a dictionary of class weights, when these are supported by the measure.

```
julia> using MLJ
julia> y = [1, 2, 3, 4];
julia> ŷ = [2, 3, 3, 3];
julia> w = [1, 2, 2, 1];
julia> rms(ŷ, y) # reports an aggregate loss
0.8660254037844386
julia> l2(ŷ, y, w) # reports per observation losses
4-element Vector{Int64}:
1
2
0
1
julia> y = coerce(["male", "female", "female"], Multiclass)
3-element CategoricalArray{String,1,UInt32}:
"male"
"female"
"female"
julia> d = UnivariateFinite(["male", "female"], [0.55, 0.45], pool=y);
julia> ŷ = [d, d, d];
julia> log_loss(ŷ, y)
3-element Vector{Float64}:
0.7985076962177716
0.5978370007556204
0.5978370007556204
```

The measures `rms`, `l2` and `log_loss` illustrated here are actually instances of measure *types*. For example, `l2 = LPLoss(p=2)` and `log_loss = LogLoss() = LogLoss(tol=eps())`. Common aliases are provided:

```
julia> cross_entropy
LogLoss(tol = 2.220446049250313e-16) @449
```
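The `tol` parameter can be read as a guard against taking the logarithm of zero. The following plain-Julia sketch illustrates the idea on hypothetical data. It assumes that the probability assigned to the observed class is clamped to `[tol, 1 - tol]` before the logarithm is taken; `toy_log_loss` is an illustrative name and not part of MLJ.

```julia
# Toy per-observation log loss; NOT the MLJ implementation.
# Assumption: the probability of the observed class is clamped to [tol, 1 - tol].
toy_log_loss(p_observed; tol=eps()) = -log(clamp(p_observed, tol, 1 - tol))

# Hypothetical predicted probabilities, one Dict per observation
predictions = [Dict("yes" => 0.8, "no" => 0.2),
               Dict("yes" => 0.3, "no" => 0.7),
               Dict("yes" => 1.0, "no" => 0.0)]
observed = ["yes", "no", "no"]

# One loss per observation; the last one stays finite thanks to the clamp
losses = [toy_log_loss(d[c]) for (d, c) in zip(predictions, observed)]
```

Without the clamp, the third observation (probability zero for the observed class) would give an infinite loss.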

## Traits and custom measures

Notice that `l1` reports per-sample evaluations, while `rms` only reports an aggregated result. This and other behavior can be gleaned from measure *traits* which are summarized by the `info` method:

```
julia> info(l1)
`LPLoss` - lp loss type with instances `l1`, `l2`.
(name = "LPLoss",
instances = ["l1", "l2"],
human_name = "lp loss",
target_scitype = Union{AbstractArray{var"#s1071", N} where {var"#s1071"<:Union{Missing, Continuous}, N}, AbstractArray{var"#s1070", N} where {var"#s1070"<:Union{Missing, Count}, N}},
supports_weights = true,
supports_class_weights = false,
prediction_type = :deterministic,
orientation = :loss,
reports_each_observation = true,
aggregation = StatisticalTraits.Mean(),
is_feature_dependent = false,
docstring = "`LPLoss` - lp loss type with instances `l1`, `l2`. ",
distribution_type = Unknown,)
```
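The distinction between per-observation and aggregated reporting can be made concrete without MLJ. The sketch below reuses the vectors from the earlier example. The names `per_observation_l2` and `aggregate` are hypothetical, and the assumption that a sample weight simply multiplies each squared error is inferred from the `l2(ŷ, y, w)` output shown earlier.

```julia
using Statistics  # for mean

y = [1, 2, 3, 4]
ŷ = [2, 3, 3, 3]
w = [1, 2, 2, 1]

# One loss per observation, optionally scaled by a sample weight
per_observation_l2(ŷ, y) = (ŷ .- y) .^ 2
per_observation_l2(ŷ, y, w) = w .* (ŷ .- y) .^ 2

# Collapse per-observation losses to a single number (mean aggregation)
aggregate(losses) = mean(losses)

unweighted = per_observation_l2(ŷ, y)
weighted = per_observation_l2(ŷ, y, w)
overall = aggregate(unweighted)
```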

Query the doc-string for a measure using the name of its type:

```
julia> rms
RootMeanSquaredError() @852
julia> @doc RootMeanSquaredError # same as `?RootMeanSquaredError`
MLJBase.RootMeanSquaredError
A measure type for root mean squared error, which includes the instance(s):
rms, rmse, root_mean_squared_error.
RootMeanSquaredError()(ŷ, y)
RootMeanSquaredError()(ŷ, y, w)
Evaluate the root mean squared error on predictions ŷ, given ground truth
observations y. Optionally specify per-sample weights, w.
\text{root mean squared error} = \sqrt{n^{-1}∑ᵢ|yᵢ-ŷᵢ|^2} or \text{root
mean squared error} = \sqrt{\frac{∑ᵢwᵢ|yᵢ-ŷᵢ|^2}{∑ᵢwᵢ}}
Requires scitype(y) to be a subtype of Union{AbstractArray{var"#s1071", N}
where {var"#s1071"<:Union{Missing, ScientificTypesBase.Continuous}, N},
AbstractArray{var"#s1070", N} where {var"#s1070"<:Union{Missing,
ScientificTypesBase.Count}, N}}; ŷ must be an array of deterministic
predictions.
For more information, run info(RootMeanSquaredError).
```
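The two formulas in the doc-string can be checked by hand. The sketch below is plain Julia rather than MLJ code, and `toy_rms` is a hypothetical name used only for illustration.

```julia
# Root mean squared error, following the two formulas in the doc-string
toy_rms(ŷ, y) = sqrt(sum(abs2, y .- ŷ) / length(y))
toy_rms(ŷ, y, w) = sqrt(sum(w .* abs2.(y .- ŷ)) / sum(w))

y = [1, 2, 3, 4]
ŷ = [2, 3, 3, 3]
w = [1, 2, 2, 1]

unweighted_rms = toy_rms(ŷ, y)     # sqrt(3/4)
weighted_rms = toy_rms(ŷ, y, w)    # sqrt(4/6)
```

The unweighted value agrees with the `rms(ŷ, y)` result reported in the first example on this page.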

Use `measures()` to list all measures, and `measures(conditions...)` to search for measures with given traits (as you would query models). The trait `instances` lists the actual callable instances of a given measure type (typically aliases for the default instance).

`MLJBase.measures` - Method

`measures()`

List all measures as named-tuples keyed on measure traits.

`measures(filters...)`

List all measures `m` for which every filter in `filters` returns `true`.

`measures(needle::Union{AbstractString,Regex})`

List all measures with `needle` in a measure's `name` or `docstring`.

**Example**

Find all classification measures supporting sample weights:

```
measures(m -> m.target_scitype <: AbstractVector{<:Finite} &&
m.supports_weights)
```

Find all measures in the `rms` family:

`measures("rms")`
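Conceptually, both forms of query are filters over a list of trait records. The sketch below mimics this with a hypothetical three-entry catalogue of named tuples. `toy_catalogue` and `toy_measures` are illustrative names and the entries are abbreviated, so this is not a description of how MLJ stores or searches its measures.

```julia
# Hypothetical, abbreviated trait records (the real ones carry more traits)
toy_catalogue = [
    (name="LPLoss", orientation=:loss, supports_weights=true,
     docstring="lp loss type with instances l1, l2"),
    (name="RootMeanSquaredError", orientation=:loss, supports_weights=true,
     docstring="root mean squared error with instances rms, rmse"),
    (name="Accuracy", orientation=:score, supports_weights=true,
     docstring="accuracy with instance accuracy"),
]

# Keep the records for which every predicate returns true
toy_measures(conditions::Function...) =
    filter(m -> all(c -> c(m), conditions), toy_catalogue)

# Keep the records whose name or docstring contains `needle`
toy_measures(needle::Union{AbstractString,Regex}) =
    filter(m -> occursin(needle, m.name) || occursin(needle, m.docstring),
           toy_catalogue)

losses = toy_measures(m -> m.orientation == :loss, m -> m.supports_weights)
rms_family = toy_measures("rms")
```

Called with no arguments, `toy_measures()` returns the whole catalogue, since an empty set of conditions is trivially satisfied.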

A user-defined measure in MLJ can be passed to the `evaluate!` method, and elsewhere in MLJ, provided it is a function or callable object conforming to the above syntactic conventions. By default, a custom measure is understood to:

- be a loss function (rather than a score)
- report an aggregated value (rather than per-sample evaluations)
- be feature-independent

To override this behaviour one simply overloads the appropriate trait, as shown in the following examples:

```
julia> y = [1, 2, 3, 4];
julia> ŷ = [2, 3, 3, 3];
julia> w = [1, 2, 2, 1];
julia> my_loss(ŷ, y) = maximum((ŷ - y).^2);
julia> my_loss(ŷ, y)
1
julia> my_per_sample_loss(ŷ, y) = abs.(ŷ - y);
julia> MLJ.reports_each_observation(::typeof(my_per_sample_loss)) = true;
julia> my_per_sample_loss(ŷ, y)
4-element Vector{Int64}:
1
1
0
1
julia> my_weighted_score(ŷ, y) = 1/mean(abs.(ŷ - y));
julia> my_weighted_score(ŷ, y, w) = 1/mean(abs.((ŷ - y).^w));
julia> MLJ.supports_weights(::typeof(my_weighted_score)) = true;
julia> MLJ.orientation(::typeof(my_weighted_score)) = :score;
julia> my_weighted_score(ŷ, y)
1.3333333333333333
julia> X = (x=rand(4), penalty=[1, 2, 3, 4]);
julia> my_feature_dependent_loss(ŷ, X, y) = sum(abs.(ŷ - y) .* X.penalty)/sum(X.penalty);
julia> MLJ.is_feature_dependent(::typeof(my_feature_dependent_loss)) = true;
julia> my_feature_dependent_loss(ŷ, X, y)
0.7
```

The possible signatures for custom measures are: `measure(ŷ, y)`, `measure(ŷ, y, w)`, `measure(ŷ, X, y)` and `measure(ŷ, X, y, w)`, each measure implementing one non-weighted version, and possibly a second weighted version.

*Implementation detail:* Internally, every measure is evaluated using the syntax

`MLJ.value(measure, ŷ, X, y, w)`

and the traits determine what can be ignored and how `measure` is actually called. If `w=nothing` then the non-weighted form of `measure` is dispatched.
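The dispatch logic just described can be sketched in a few lines. The code below is a toy re-implementation under stated assumptions, not the MLJBase source: `toy_value` and the two trait functions are hypothetical stand-ins defined locally, with defaults mirroring those listed for custom measures.

```julia
# Hypothetical trait defaults (a custom measure ignores weights and features)
toy_supports_weights(::Any) = false
toy_is_feature_dependent(::Any) = false

# Decide from the traits which arguments the measure actually receives
function toy_value(measure, ŷ, X, y, w)
    args = toy_is_feature_dependent(measure) ? (ŷ, X, y) : (ŷ, y)
    if w === nothing || !toy_supports_weights(measure)
        return measure(args...)          # non-weighted form
    end
    return measure(args..., w)           # weighted form
end

# A plain loss: neither weights nor features are used
max_sq_loss(ŷ, y) = maximum((ŷ .- y) .^ 2)

# A loss with a weighted variant, declared through the trait
weighted_mae(ŷ, y) = sum(abs.(ŷ .- y)) / length(y)
weighted_mae(ŷ, y, w) = sum(w .* abs.(ŷ .- y)) / sum(w)
toy_supports_weights(::typeof(weighted_mae)) = true

# A loss that needs the features, declared through the trait
penalised_loss(ŷ, X, y) = sum(abs.(ŷ .- y) .* X.penalty) / sum(X.penalty)
toy_is_feature_dependent(::typeof(penalised_loss)) = true

y = [1, 2, 3, 4]
ŷ = [2, 3, 3, 3]
w = [1, 2, 2, 1]
X = (x=rand(4), penalty=[1, 2, 3, 4])

a = toy_value(max_sq_loss, ŷ, X, y, w)          # w and X are ignored
b = toy_value(weighted_mae, ŷ, X, y, nothing)   # non-weighted form
c = toy_value(weighted_mae, ŷ, X, y, w)         # weighted form
d = toy_value(penalised_loss, ŷ, X, y, nothing) # X is passed through
```

The point of the design is that callers can always use one five-argument call, while each measure only implements the signatures it needs.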

## Using measures from LossFunctions.jl

The LossFunctions.jl package includes "distance loss" functions for `Continuous` targets, and "marginal loss" functions for `Finite{2}` (binary) targets. While the LossFunctions.jl interface differs from the present one (for example, binary observations must be +1 or -1), MLJ has overloaded instances of the LossFunctions.jl types to behave the same as the built-in types.

Note that the "distance losses" in the package apply to deterministic predictions, while the "marginal losses" apply to probabilistic predictions.

## List of measures

All measures listed below have a doc-string associated with the measure's *type*. So, for example, do `?LPLoss` not `?l2`.

```
using MLJ, DataFrames

ms = measures()
types = map(m -> m.name, ms)            # name of each measure type
instances = map(m -> m.instances, ms)   # callable instances (aliases) of each type
table = (type=types, instances=instances)
DataFrame(table)
```

61 rows × 2 columns

| | type | instances |
|---|---|---|
| | String | Array… |
| 1 | BrierLoss | ["brier_loss"] |
| 2 | BrierScore | ["brier_score"] |
| 3 | LPLoss | ["l1", "l2"] |
| 4 | LogCoshLoss | ["log_cosh", "log_cosh_loss"] |
| 5 | LogLoss | ["log_loss", "cross_entropy"] |
| 6 | LogScore | ["log_score"] |
| 7 | SphericalScore | ["spherical_score"] |
| 8 | Accuracy | ["accuracy"] |
| 9 | AreaUnderCurve | ["area_under_curve", "auc"] |
| 10 | BalancedAccuracy | ["balanced_accuracy", "bacc", "bac"] |
| 11 | ConfusionMatrix | ["confusion_matrix", "confmat"] |
| 12 | FScore | ["f1score"] |
| 13 | FalseDiscoveryRate | ["false_discovery_rate", "falsediscovery_rate", "fdr"] |
| 14 | FalseNegative | ["false_negative", "falsenegative"] |
| 15 | FalseNegativeRate | ["false_negative_rate", "falsenegative_rate", "fnr", "miss_rate"] |
| 16 | FalsePositive | ["false_positive", "falsepositive"] |
| 17 | FalsePositiveRate | ["false_positive_rate", "falsepositive_rate", "fpr", "fallout"] |
| 18 | MatthewsCorrelation | ["matthews_correlation", "mcc"] |
| 19 | MeanAbsoluteError | ["mae", "mav", "mean_absolute_error", "mean_absolute_value"] |
| 20 | MeanAbsoluteProportionalError | ["mape"] |
| 21 | MisclassificationRate | ["misclassification_rate", "mcr"] |
| 22 | MulticlassFScore | ["macro_f1score", "micro_f1score", "multiclass_f1score"] |
| 23 | MulticlassFalseDiscoveryRate | ["multiclass_falsediscovery_rate", "multiclass_fdr"] |
| 24 | MulticlassFalseNegative | ["multiclass_false_negative", "multiclass_falsenegative"] |
| 25 | MulticlassFalseNegativeRate | ["multiclass_false_negative_rate", "multiclass_fnr", "multiclass_miss_rate", "multiclass_falsenegative_rate"] |
| 26 | MulticlassFalsePositive | ["multiclass_false_positive", "multiclass_falsepositive"] |
| 27 | MulticlassFalsePositiveRate | ["multiclass_false_positive_rate", "multiclass_fpr", "multiclass_fallout", "multiclass_falsepositive_rate"] |
| 28 | MulticlassNegativePredictiveValue | ["multiclass_negative_predictive_value", "multiclass_negativepredictive_value", "multiclass_npv"] |
| 29 | MulticlassPrecision | ["multiclass_positive_predictive_value", "multiclass_ppv", "multiclass_positivepredictive_value", "multiclass_recall"] |
| 30 | MulticlassTrueNegative | ["multiclass_true_negative", "multiclass_truenegative"] |
| 31 | MulticlassTrueNegativeRate | ["multiclass_true_negative_rate", "multiclass_tnr", "multiclass_specificity", "multiclass_selectivity", "multiclass_truenegative_rate"] |
| 32 | MulticlassTruePositive | ["multiclass_true_positive", "multiclass_truepositive"] |
| 33 | MulticlassTruePositiveRate | ["multiclass_true_positive_rate", "multiclass_tpr", "multiclass_sensitivity", "multiclass_recall", "multiclass_hit_rate", "multiclass_truepositive_rate"] |
| 34 | NegativePredictiveValue | ["negative_predictive_value", "negativepredictive_value", "npv"] |
| 35 | Precision | ["positive_predictive_value", "ppv", "positivepredictive_value", "precision"] |
| 36 | RootMeanSquaredError | ["rms", "rmse", "root_mean_squared_error"] |
| 37 | RootMeanSquaredLogError | ["rmsl", "rmsle", "root_mean_squared_log_error"] |
| 38 | RootMeanSquaredLogProportionalError | ["rmslp1"] |
| 39 | RootMeanSquaredProportionalError | ["rmsp"] |
| 40 | TrueNegative | ["true_negative", "truenegative"] |
| 41 | TrueNegativeRate | ["true_negative_rate", "truenegative_rate", "tnr", "specificity", "selectivity"] |
| 42 | TruePositive | ["true_positive", "truepositive"] |
| 43 | TruePositiveRate | ["true_positive_rate", "truepositive_rate", "tpr", "sensitivity", "recall", "hit_rate"] |
| 44 | DWDMarginLoss | ["dwd_margin_loss"] |
| 45 | ExpLoss | ["exp_loss"] |
| 46 | L1HingeLoss | ["l1_hinge_loss"] |
| 47 | L2HingeLoss | ["l2_hinge_loss"] |
| 48 | L2MarginLoss | ["l2_margin_loss"] |
| 49 | LogitMarginLoss | ["logit_margin_loss"] |
| 50 | ModifiedHuberLoss | ["modified_huber_loss"] |
| 51 | PerceptronLoss | ["perceptron_loss"] |
| 52 | SigmoidLoss | ["sigmoid_loss"] |
| 53 | SmoothedL1HingeLoss | ["smoothed_l1_hinge_loss"] |
| 54 | ZeroOneLoss | ["zero_one_loss"] |
| 55 | HuberLoss | ["huber_loss"] |
| 56 | L1EpsilonInsLoss | ["l1_epsilon_ins_loss"] |
| 57 | L2EpsilonInsLoss | ["l2_epsilon_ins_loss"] |
| 58 | LPDistLoss | ["lp_dist_loss"] |
| 59 | LogitDistLoss | ["logit_dist_loss"] |
| 60 | PeriodicLoss | ["periodic_loss"] |
| 61 | QuantileLoss | ["quantile_loss"] |

## Other performance related tools

In MLJ one computes a confusion matrix by calling an instance of the `ConfusionMatrix` measure type on the data:

`MLJBase.ConfusionMatrix` - Type

`MLJBase.ConfusionMatrix`

A measure type for confusion matrix, which includes the instance(s): `confusion_matrix`, `confmat`.

`ConfusionMatrix()(ŷ, y)`

Evaluate the default instance of ConfusionMatrix on predictions `ŷ`, given ground truth observations `y`.

If `r` is the return value, then the raw confusion matrix is `r.mat`, whose rows correspond to predictions, and columns to ground truth. The ordering follows that of `levels(y)`.

Use `ConfusionMatrix(perm=[2, 1])` to reverse the class order for binary data. For more than two classes, specify an appropriate permutation, as in `ConfusionMatrix(perm=[2, 3, 1])`.

Requires `scitype(y)` to be a subtype of `AbstractArray{<:OrderedFactor{2}}` (binary classification where choice of "true" affects the measure); `ŷ` must be an array of deterministic predictions.

For more information, run `info(ConfusionMatrix)`.
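To make the row and column convention concrete, here is a minimal plain-Julia sketch. `toy_confmat` is a hypothetical helper, and it assumes that `perm` reorders the classes as `levels[perm]`; MLJ's own implementation may differ in detail.

```julia
# Toy confusion matrix: rows index predictions, columns index ground truth.
# Assumption: `perm` reorders the classes as levels[perm].
function toy_confmat(ŷ, y, levels; perm=collect(1:length(levels)))
    ordered = levels[perm]
    index = Dict(l => i for (i, l) in enumerate(ordered))
    mat = zeros(Int, length(levels), length(levels))
    for (predicted, truth) in zip(ŷ, y)
        mat[index[predicted], index[truth]] += 1
    end
    return mat
end

levels = ["neg", "pos"]
y = ["pos", "neg", "pos", "pos"]   # ground truth
ŷ = ["pos", "pos", "neg", "pos"]   # deterministic predictions

default_order = toy_confmat(ŷ, y, levels)
reversed_order = toy_confmat(ŷ, y, levels; perm=[2, 1])
```

With the default order the entry in row 2, column 1 counts observations predicted `"pos"` whose ground truth is `"neg"`; reversing the order moves that count to row 1, column 2.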

`MLJBase.roc_curve` - Function

`fprs, tprs, ts = roc_curve(ŷ, y) = roc(ŷ, y)`

Return the ROC curve for a two-class probabilistic prediction `ŷ` given the ground truth `y`. The false positive rates `fprs` and true positive rates `tprs` over a range of thresholds `ts` are returned. Note that if there are `k` unique scores, there are correspondingly `k` thresholds and `k+1` "bins" over which the FPR and TPR are constant:

- `[0.0 - thresh[1]]`
- `[thresh[1] - thresh[2]]`
- ...
- `[thresh[k] - 1]`

Consequently, `tprs` and `fprs` are of length `k+1` if `ts` is of length `k`.

To draw the curve using your favorite plotting backend, do `plot(fprs, tprs)`.
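The relationship between `k` thresholds and `k+1` rate values can be illustrated with a small plain-Julia sketch. `toy_roc` is a hypothetical helper working on raw scores for the positive class. Its conventions (thresholds sorted from highest to lowest, an observation counted as positive when its score is at least the cutoff) are assumptions made for this example and need not match the ordering used by `roc_curve`.

```julia
# Toy ROC computation from positive-class scores and Boolean ground truth.
# Assumption: an observation is predicted positive when score >= cutoff.
function toy_roc(scores, truth)
    ts = sort(unique(scores); rev=true)   # k thresholds, highest first
    P = count(truth)                      # number of positives
    N = length(truth) - P                 # number of negatives
    cutoffs = vcat(Inf, ts)               # k + 1 operating points
    tprs = [count(truth .& (scores .>= c)) / P for c in cutoffs]
    fprs = [count(.!truth .& (scores .>= c)) / N for c in cutoffs]
    return fprs, tprs, ts
end

scores = [0.9, 0.8, 0.8, 0.3]             # hypothetical scores
truth = [true, true, false, false]

fprs, tprs, ts = toy_roc(scores, truth)
```

Here there are three unique scores, so `ts` has length 3 while `fprs` and `tprs` have length 4, running from the point (0, 0) to the point (1, 1).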