# Learning Curves

A *learning curve* in MLJ is a plot of some performance estimate, as a function of some model hyperparameter. This can be useful when tuning a single model hyperparameter, or when deciding how many iterations are required for some iterative model. The `learning_curve` method does not actually generate a plot, but generates the data needed to do so.

To generate learning curves you can bind data to a model by instantiating a machine. You can choose to supply all available data, as performance estimates are computed using a resampling strategy, defaulting to `Holdout(fraction_train=0.7)`.

```julia
using MLJ
X, y = @load_boston;
atom = @load RidgeRegressor pkg=MultivariateStats
ensemble = EnsembleModel(atom=atom, n=1000)
mach = machine(ensemble, X, y)
r_lambda = range(ensemble, :(atom.lambda), lower=10, upper=500, scale=:log10)
curve = MLJ.learning_curve(mach;
                           range=r_lambda,
                           resampling=CV(nfolds=3),
                           measure=mav)

using Plots
plot(curve.parameter_values,
     curve.measurements,
     xlab=curve.parameter_name,
     xscale=curve.parameter_scale,
     ylab="CV estimate of mean absolute error")
```

In the case the `range` hyperparameter is the number of iterations in some iterative model, `learning_curve` will not restart the training from scratch for each new value, unless a non-holdout `resampling` strategy is specified (and provided the model implements an appropriate `update` method). To obtain multiple distinct curves, pass the name of the model's random number generator field, `rng_name`, and specify the random number generators to be used with `rngs=...` (supplying an integer automatically generates that number of generators):

```julia
atom.lambda = 200
r_n = range(ensemble, :n, lower=1, upper=50)
curves = MLJ.learning_curve(mach;
                            range=r_n,
                            verbosity=0,
                            rng_name=:rng,
                            rngs=4)

plot(curves.parameter_values,
     curves.measurements,
     xlab=curves.parameter_name,
     ylab="Holdout estimate of RMS error")
```

## API reference

`MLJTuning.learning_curve` — Function

```julia
curve = learning_curve(mach; resolution=30,
                             resampling=Holdout(),
                             repeats=1,
                             measure=default_measure(machine.model),
                             rows=nothing,
                             weights=nothing,
                             operation=predict,
                             range=nothing,
                             acceleration=default_resource(),
                             acceleration_grid=CPU1(),
                             rngs=nothing,
                             rng_name=nothing)
```

Given a supervised machine `mach`, returns a named tuple of objects suitable for generating a plot of performance estimates, as a function of the single hyperparameter specified in `range`. The tuple `curve` has the following keys: `:parameter_name`, `:parameter_scale`, `:parameter_values`, `:measurements`.

To generate multiple curves for a `model` with a random number generator (RNG) as a hyperparameter, specify the name, `rng_name`, of the (possibly nested) RNG field, and a vector `rngs` of RNGs, one for each curve. Alternatively, set `rngs` to the number of curves desired, in which case RNGs are automatically generated. The individual curve computations can be distributed across multiple processes using `acceleration=CPUProcesses()` or `acceleration=CPUThreads()`. See the second example below for a demonstration.

```julia
X, y = @load_boston;
atom = @load RidgeRegressor pkg=MultivariateStats
ensemble = EnsembleModel(atom=atom, n=1000)
mach = machine(ensemble, X, y)
r_lambda = range(ensemble, :(atom.lambda), lower=10, upper=500, scale=:log10)
curve = learning_curve(mach; range=r_lambda, resampling=CV(), measure=mav)

using Plots
plot(curve.parameter_values,
     curve.measurements,
     xlab=curve.parameter_name,
     xscale=curve.parameter_scale,
     ylab="CV estimate of mean absolute error")
```

If using a `Holdout()` `resampling` strategy (with no shuffling) and if the specified hyperparameter is the number of iterations in some iterative model (and that model has an appropriately overloaded `MLJModelInterface.update` method) then training is not restarted from scratch for each increment of the parameter, i.e., the model is trained progressively.

```julia
atom.lambda = 200
r_n = range(ensemble, :n, lower=1, upper=250)
curves = learning_curve(mach; range=r_n, verbosity=0, rng_name=:rng, rngs=3)
plot!(curves.parameter_values,
      curves.measurements,
      xlab=curves.parameter_name,
      ylab="Holdout estimate of RMS error")
```

```julia
learning_curve(model::Supervised, X, y; kwargs...)
learning_curve(model::Supervised, X, y, w; kwargs...)
```

Plot a learning curve (or curves) directly, without first constructing a machine.
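For instance, a minimal sketch of the machine-free form, assuming the same Boston data and `ensemble` model as in the example above:

```julia
using MLJ
X, y = @load_boston;
atom = @load RidgeRegressor pkg=MultivariateStats
ensemble = EnsembleModel(atom=atom, n=1000)
r_lambda = range(ensemble, :(atom.lambda), lower=10, upper=500, scale=:log10)

# no machine is constructed by the user; data is passed directly
curve = learning_curve(ensemble, X, y;
                       range=r_lambda,
                       resampling=CV(nfolds=3),
                       measure=mav)
```

The keyword options are the same as for the machine form, summarized below.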

**Summary of key-word options**

- `resolution` - number of points generated from `range` (number of model evaluations); default is `30`
- `resampling` - resampling strategy; default is `Holdout(fraction_train=0.7)`
- `repeats` - set to more than `1` for repeated (Monte Carlo) resampling
- `measure` - performance measure (metric); automatically inferred from the model by default, when possible
- `rows` - row indices to which resampling should be restricted; default is all rows
- `weights` - sample weights used by `measure` where supported
- `operation` - operation, such as `predict`, to be used in evaluations. If `prediction_type(mach.model) == :probabilistic` but `prediction_type(measure) == :deterministic`, consider `predict_mode`, `predict_mean` or `predict_median`; default is `predict`
- `range` - object constructed using `range(model, ...)` or `range(type, ...)` representing a one-dimensional hyper-parameter range
- `acceleration` - parallelization option passed to `evaluate!`; an instance of `CPU1`, `CPUProcesses` or `CPUThreads` from ComputationalResources.jl; default is `default_resource()`
- `acceleration_grid` - parallelization option for distributing each performance evaluation
- `rngs` - for specifying random number generator(s) to be passed to the model (see above)
- `rng_name` - name of the model hyper-parameter representing a random number generator (see above); possibly nested
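
As a sketch of the `acceleration` option mentioned above, the multiple-curve computation from the second example can be distributed across threads (assuming Julia was started with more than one thread):

```julia
curves = learning_curve(mach;
                        range=r_n,
                        rng_name=:rng,
                        rngs=4,                     # four distinct curves
                        acceleration=CPUThreads())  # one curve per task
```

Use `acceleration=CPUProcesses()` instead to distribute the curves across worker processes.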