In MLJ, loss functions, scoring rules, confusion matrices, sensitivities, etc., are collectively referred to as measures. These measures are provided by the package StatisticalMeasures.jl but are immediately available to the MLJ user. Here's a simple example of direct application of the `log_loss` measure to compute a training loss:
```julia
using MLJ
X, y = @load_iris
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
tree = DecisionTreeClassifier(max_depth=2)
mach = machine(tree, X, y) |> fit!
yhat = predict(mach, X)
log_loss(yhat, y)
```
For more examples of direct measure usage, see the StatisticalMeasures.jl tutorial.
A list of all measures, ready to use after running `using MLJ` or `using StatisticalMeasures`, is here. Alternatively, call `measures()` (experimental) to generate a dictionary keyed on available measure constructors, with measure metadata as values.
Any measure-like object with appropriate calling behavior can be used with MLJ. To quickly build custom measures, we recommend using the package StatisticalMeasuresBase.jl, which provides this tutorial. Note, in particular, that an "atomic" measure can be transformed into a multi-target measure using this package.
In MLJ, measures are specified:

- when evaluating model performance using `evaluate`; see Evaluating Model Performance
- when wrapping models using `TunedModel`; see Tuning Models
- when wrapping iterative models using `IteratedModel`; see Controlling Iterative Models
- when generating learning curves using `learning_curve`; see Learning Curves
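For instance, passing measures to `evaluate` looks like the sketch below (assumes the MLJ and DecisionTree packages are installed; the model and data follow the `log_loss` example above):

```julia
using MLJ

X, y = @load_iris
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
tree = DecisionTreeClassifier(max_depth=2)

# specify one or more measures via the `measures` keyword; MLJ picks an
# appropriate prediction operation for each (probabilistic predictions
# for log_loss, modes for accuracy):
e = evaluate(tree, X, y,
             resampling=CV(nfolds=6),
             measures=[log_loss, accuracy])
```

The returned object records one aggregated measurement per measure, in the order given.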
In previous versions of MLJ, measures from LossFunctions.jl were also available. Now measures from that package must be explicitly imported and wrapped, as described here.
A related performance evaluation tool provided by StatisticalMeasures.jl, and hence by MLJ, is the `roc_curve` method:

```julia
roc_curve(ŷ, y) -> false_positive_rates, true_positive_rates, thresholds
```
Return data for plotting the receiver operating characteristic (ROC) curve for a binary classification problem.
Here `ŷ` is a vector of `UnivariateFinite` distributions (from CategoricalDistributions.jl) over the two values taken by the ground truth observations `y`.
If there are `k` unique probabilities, then there are correspondingly `k` thresholds and `k+1` "bins" over which the false positive and true positive rates are constant:

- `[0.0 - thresholds[1]]`
- `[thresholds[1] - thresholds[2]]`
- ...
- `[thresholds[k] - 1]`

Consequently, `false_positive_rates` and `true_positive_rates` have length `k+1`, while `thresholds` has length `k`.
To plot the curve using your favorite plotting backend, do something like `plot(fprs, tprs)`.
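To make the `k` thresholds / `k+1` bins bookkeeping concrete, here is a plain-Julia sketch (an illustration only, not the StatisticalMeasures.jl implementation, which consumes `UnivariateFinite` predictions rather than raw probabilities):

```julia
# Hand-rolled ROC data from raw positive-class probabilities.
function toy_roc(probs, y)
    thresholds = sort(unique(probs))          # k unique probabilities
    P = count(y)                              # number of positives
    N = length(y) - P                         # number of negatives
    fprs = Float64[]
    tprs = Float64[]
    # one (fpr, tpr) point per bin; k thresholds give k + 1 bins:
    for t in [thresholds; nextfloat(1.0)]
        ŷ = probs .>= t                       # predict positive at/above t
        push!(fprs, count(ŷ .& .!y) / N)
        push!(tprs, count(ŷ .& y) / P)
    end
    return fprs, tprs, thresholds
end

probs = [0.1, 0.4, 0.4, 0.8]
y = [false, false, true, true]
fprs, tprs, thresholds = toy_roc(probs, y)
# k = 3 unique probabilities here, so length(thresholds) == 3 and
# length(fprs) == length(tprs) == 4
```

Note the rates are constant within each bin because predictions only change when a threshold crosses one of the `k` unique probabilities.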
Prior to MLJBase.jl 1.0 (respectively, MLJ.jl version 0.19.6) measures were defined in MLJBase.jl (a dependency of MLJ.jl), but now they are provided by the MLJ.jl dependency StatisticalMeasures.jl. Effects on users are detailed below:
- If `using MLJBase` without MLJ, then, in Julia 1.9 or higher, `StatisticalMeasures` must be explicitly imported to use measures that were previously part of MLJBase. If `using MLJ`, then all previous measures are still available, with the exception of those corresponding to LossFunctions.jl (see below).
- All measures return a single aggregated measurement. In other words, measures previously reporting a measurement per observation (previously subtyping `Unaggregated`) no longer do so. To get per-observation measurements, use the new method `StatisticalMeasures.measurements(measure, ŷ, y[, weights, class_weights])`.
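The distinction can be illustrated with a hand-computed squared-error loss in plain Julia (an illustration of the concept only, not the StatisticalMeasures.jl implementation):

```julia
using Statistics  # stdlib

ŷ = [1.0, 2.0, 4.0]
y = [1.0, 3.0, 2.0]

# the per-observation values that `measurements` would report for a
# squared-error loss — one entry per observation:
per_observation = (ŷ .- y) .^ 2

# calling the measure itself now returns only the aggregate:
aggregated = mean(per_observation)
```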
- The default measure for regression models (used in `evaluate`/`evaluate!` when `measures` is unspecified) has changed from `rms` to `l2 = LPLoss(2)` (mean sum of squares).
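The old and new defaults are related by a square root, as this hand-computed base-Julia check illustrates (not using StatisticalMeasures.jl itself):

```julia
using Statistics  # stdlib

ŷ = [2.0, 4.0, 7.0]
y = [1.0, 5.0, 9.0]

l2_value  = mean((ŷ .- y) .^ 2)   # mean sum of squares, the new default
rms_value = sqrt(l2_value)        # root mean squared error, the old default
```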
- `MeanAbsoluteError` has been removed and instead `mae` is an alias for `LPLoss(1)`.
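In other words, `mae` reports the mean absolute deviation, which in plain Julia amounts to (a hand-computed illustration, not the StatisticalMeasures.jl implementation):

```julia
using Statistics  # stdlib

ŷ = [2.0, 4.0, 7.0]
y = [1.0, 5.0, 9.0]

# per-observation absolute errors, mean-aggregated:
mae_value = mean(abs.(ŷ .- y))
```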
- Measures that previously skipped `NaN` values will now (at least by default) propagate those values. Missing value behavior is unchanged, except some measures that previously did not support `missing` now do.
- Aliases for measure types have been removed. For example, `RMS` (an alias for `RootMeanSquaredError`) is gone. Aliases for instances, such as `rms` and `cross_entropy`, persist. The exception is `precision`, for which `ppv` can be used in its place. (This is to avoid conflict with `Base.precision`, which was previously pirated.)
- `info(measure)` has been decommissioned; query docstrings or access the new measure traits individually instead. These traits are now provided by StatisticalMeasures.jl and are not exported. For example, to access the orientation of the measure `rms`, do `import StatisticalMeasures as SM; SM.orientation(rms)`.
- Behavior of the `measures()` method, to list all measures and associated traits, has changed. It now returns a dictionary instead of a vector of named tuples; `measures(predicate)` is decommissioned, but `measures(needle)` is preserved. (This method, owned by StatisticalMeasures.jl, has some other search options, but is experimental.)
- Measures that were wraps of losses from LossFunctions.jl are no longer exposed by MLJBase or MLJ. To use such a loss, you must explicitly `import LossFunctions` and wrap the loss appropriately. See Using losses from LossFunctions.jl for examples.
- Some user-defined measures working in previous versions of MLJBase.jl may not work without modification, as they must conform to the new StatisticalMeasuresBase.jl API. See this tutorial on how to define new measures.
- Measures with a "feature argument" `X`, as in `some_measure(ŷ, y, X)`, are no longer supported. See What is a measure? for allowed signatures in measures.
The migration of measures is not expected to require any changes to the source code in packages providing implementations of the MLJ model interface (MLJModelInterface.jl), such as MLJDecisionTreeInterface.jl and MLJFlux.jl, and this is confirmed by extensive integration tests. However, some current tests will fail if they use MLJBase measures. The following should generally suffice to adapt such tests:
- Add StatisticalMeasures as a test dependency, and add `using StatisticalMeasures` to your `runtests.jl` (and/or included submodules).
- If measures are qualified, as in `MLJBase.rms`, then the qualification must be removed or changed to `StatisticalMeasures.rms`.
- Be aware that the default measure used in methods such as `evaluate!`, when `measure` is not specified, has changed from `rms` to `l2` for regression models.
- Be aware that all measures now report a single aggregated measurement, and never a measurement per observation. See the second point above.
- The abstract measure types `Aggregated`, `Unaggregated`, and `Measure` have been decommissioned. (A measure is now defined purely by its calling behavior.) What were previously exported as measure types are now only constructors.
- `target_scitype(measure)` is decommissioned. Related is `StatisticalMeasures.observation_scitype(measure)`, which declares an upper bound on the allowed scitype of a single observation.
- `prediction_type(measure)` is decommissioned. Instead use `StatisticalMeasures.kind_of_proxy(measure)`.
- `reports_each_observation(measure)` is decommissioned. Related is `StatisticalMeasures.can_report_unaggregated(measure)`; if this is `false`, the `measurements` method simply returns `n` copies of the aggregated measurement, where `n` is the number of observations provided, instead of individual observation-dependent measurements.
- `aggregation(measure)` has been decommissioned. Instead use `StatisticalMeasures.external_aggregation_mode(measure)`.
- `instances(measure)` has been decommissioned; query docstrings for measure aliases, or follow this example: `aliases = measures()[RootMeanSquaredError].aliases`.
- `is_feature_dependent(measure)` has been decommissioned. Measures consuming feature data are no longer supported; see above.
- `distribution_type(measure)` has been decommissioned.
- `docstring(measure)` has been decommissioned.
- The following traits, previously exported by MLJBase and MLJ, can no longer be applied to measures: `supports_weights`, `supports_class_weights`, `orientation`, and `human_name`. Instead use the traits with these names provided by StatisticalMeasures.jl (they will need to be qualified, as in `import StatisticalMeasures; StatisticalMeasures.orientation(measure)`).