# Performance Measures

## Quick links

## Introduction

In MLJ loss functions, scoring rules, confusion matrices, sensitivities, etc, are collectively referred to as *measures*. These measures are provided by the package StatisticalMeasures.jl but are immediately available to the MLJ user. Here's a simple example of direct application of the `log_loss`

measures to compute a training loss:

```
using MLJ
X, y = @load_iris
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
tree = DecisionTreeClassifier(max_depth=2)
mach = machine(tree, X, y) |> fit!
yhat = predict(mach, X)
log_loss(yhat, y)
```

`0.143176310291424`

For more examples of direct measure usage, see the StatisticalMeasures.jl tutorial.

A list of all measures, ready to use after running `using MLJ`

or `using StatisticalMeasures`

, is here. Alternatively, call `measures()`

(experimental) to generate a dictionary keyed on available measure constructors, with measure metadata as values.

## Custom measures

Any measure-like object with appropriate calling behavior can be used with MLJ. To quickly build custom measures, we recommend using the package StatisticalMeasuresBase.jl, which provides this tutorial. Note, in particular, that an "atomic" measure can be transformed into a multi-target measure using this package.

## Uses of measures

In MLJ, measures are specified:

- when evaluating model performance using
`evaluate!`

/`evaluate`

; see Evaluating Model Performance - when wrapping models using
`TunedModel`

- see Tuning Models - when wrapping iterative models using
`IteratedModel`

- see Controlling Iterative Models - when generating learning curves using
`learning_curve`

- see Learning Curves

and elsewhere.

## Using LossFunctions.jl

In previous versions of MLJ, measures from LossFunctions.jl were also available. Now measures from that package must be explicitly imported and wrapped, as described here.

## Receiver operator characteristics

A related performance evaluation tool provided by StatisticalMeasures.jl, and hence by MLJ, is the `roc_curve`

method:

`StatisticalMeasures.roc_curve`

— Function`roc_curve(ŷ, y) -> false_positive_rates, true_positive_rates, thresholds`

Return data for plotting the receiver operator characteristic (ROC curve) for a binary classification problem.

Here `ŷ`

is a vector of `UnivariateFinite`

distributions (from CategoricalDistributions.jl) over the two values taken by the ground truth observations `y`

, a `CategoricalVector`

.

If there are `k`

unique probabilities, then there are correspondingly `k`

thresholds and `k+1`

"bins" over which the false positive and true positive rates are constant.:

`[0.0 - thresholds[1]]`

`[thresholds[1] - thresholds[2]]`

- ...
`[thresholds[k] - 1]`

Consequently, `true_positive_rates`

and `false_positive_rates`

have length `k+1`

if `thresholds`

has length `k`

.

To plot the curve using your favorite plotting backend, do something like `plot(false_positive_rates, true_positive_rates)`

.

Core algorithm: `Functions.roc_curve`

See also `AreaUnderCurve`

.

## Migration guide for changes to measures in MLJBase 1.0

Prior to MLJBase.jl 1.0 (respectivey, MLJ.jl version 0.19.6) measures were defined in MLJBase.jl (a dependency of MLJ.jl) but now they are provided by MLJ.jl dependency StatisticalMeasures. Effects on users are detailed below:

### Breaking behavior likely relevant to many users

If

`using MLJBase`

without MLJ, then, in Julia 1.9 or higher,`StatisticalMeasures`

must be explicitly imported to use measures that were previously part of MLJBase. If`using MLJ`

, then all previous measures are still available, with the exception of those corresponding to LossFunctions.jl (see below).All measures return a

*single*aggregated measurement. In other words, measures previously reporting a measurement*per-observation*(previously subtyping`Unaggregated`

) no longer do so. To get per-observation measurements, use the new method`StatisticalMeasures.measurements(measure, ŷ, y[, weights, class_weights])`

.The default measure for regression models (used in

`evaluate/evaluate!`

when`measures`

is unspecified) is changed from`rms`

to`l2=LPLoss(2)`

(mean sum of squares).`MeanAbsoluteError`

has been removed and instead`mae`

is an alias for`LPLoss(p=1)`

.Measures that previously skipped

`NaN`

values will now (at least by default) propagate those values. Missing value behavior is unchanged, except some measures that previously did not support`missing`

now do.Aliases for measure

*types*have been removed. For example`RMSE`

(alias for`RootMeanSquaredError`

) is gone. Aliases for instances, such as`rms`

and`cross_entropy`

persist. The exception is`precision`

, for which`ppv`

can be used in its place. (This is to avoid conflict with`Base.precision`

, which was previously pirated.)`info(measure)`

has been decommissioned; query docstrings or access the new measure traits individually instead. These traits are now provided by StatisticalMeasures.jl and not are not exported. For example, to access the orientation of the measure`rms`

, do`import StatisticalMeasures as SM; SM.orientation(rms)`

.Behavior of the

`measures()`

method, to list all measures and associated traits, has changed. It now returns a dictionary instead of a vector of named tuples;`measures(predicate)`

is decommissioned, but`measures(needle)`

is preserved. (This method, owned by StatisticalMeasures.jl, has some other search options, but is experimental.)Measures that were wraps of losses from LossFunctions.jl are no longer exposed by MLJBase or MLJ. To use such a loss, you must explicitly

`import LossFunctions`

and wrap the loss appropriately. See Using losses from LossFunctions.jl for examples.Some user-defined measures working in previous versions of MLJBase.jl may not work without modification, as they must conform to the new StatisticalMeasuresBase.jl API. See this tutorial on how define new measures.

Measures with a "feature argument"

`X`

, as in`some_measure(ŷ, y, X)`

, are no longer supported. See What is a measure? for allowed signatures in measures.

### Packages implementing the MLJ model interface

The migration of measures is not expected to require any changes to the source code in packges providing implementations of the MLJ model interface (MLJModelInterface.jl) such as MLJDecisionTreeInterface.jl and MLJFlux.jl, and this is confirmed by extensive integration tests. However, some current tests will fail, if they use MLJBase measures. The following should generally suffice to adapt such tests:

Add StatisticalMeasures as test dependency, and add

`using StatisticalMeasures`

to your`runtests.jl`

(and/or included submodules).If measures are qualified, as in

`MLJBase.rms`

, then the qualification must be removed or changed to`StatisticalMeasures.rms`

, etc.Be aware that the default measure used in methods such as

`evaluate!`

, when`measure`

is not specified, is changed from`rms`

to`l2`

for regression models.Be aware of that all measures now report a measurement for every observation, and never an aggregate. See second point above.

### Breaking behavior possibly relevant to some developers

The abstract measure types

`Aggregated`

,`Unaggregated`

,`Measure`

have been decommissioned. (A measure is now defined purely by its calling behavior.)What were previously exported as measure types are now only constructors.

`target_scitype(measure)`

is decommissioned. Related is`StatisticalMeasures.observation_scitype(measure)`

which declares an upper bound on the allowed scitype*of a single observation*.`prediction_type(measure)`

is decommissioned. Instead use`StatisticalMeasures.kind_of_proxy(measure)`

.The trait

`reports_each_observation`

is decommissioned. Related is`StatisticalMeasures.can_report_unaggregated`

; if`false`

the new`measurements`

method simply returns`n`

copies of the aggregated measurement, where`n`

is the number of observations provided, instead of individual observation-dependent measurements.`aggregation(measure)`

has been decommissioned. Instead use`StatisticalMeasures.external_mode_of_aggregation(measure)`

.`instances(measure)`

has been decommissioned; query docstrings for measure aliases, or follow this example:`aliases = measures()[RootMeanSquaredError].aliases`

.`is_feature_dependent(measure)`

has been decommissioned. Measures consuming feature data are not longer supported; see above.`distribution_type(measure)`

has been decommissioned.`docstring(measure)`

has been decommissioned.Behavior of

`aggregate`

has changed.The following traits, previously exported by MLJBase and MLJ, cannot be applied to measures:

`supports_weights`

,`supports_class_weights`

,`orientation`

,`human_name`

. Instead use the traits with these names provided by StatisticalMeausures.jl (they will need to be qualified, as in`import StatisticalMeasures; StatisticalMeasures.orientation(measure)`

).