# Ensemble models (2)

Download the notebook, the raw script, or the annotated script for this tutorial (right-click on the link and save).

## Prelims

This tutorial builds upon the previous ensemble tutorial, this time with a home-made Random Forest regressor on the "boston" dataset.

``````using MLJ
using PyPlot
using PrettyPrinting
using StableRNGs
import DataFrames

# Load the (built-in) boston dataset: a table `X` of features and
# a continuous target vector `y` (median house values).
X, y = @load_boston

sch = schema(X)
p = length(sch.names)
n = sch.nrows
@show (n, p)
DataFrames.describe(y)``````
``````(n, p) = (506, 12)
Summary Stats:
Length:         506
Missing Count:  0
Mean:           22.532806
Minimum:        5.000000
1st Quartile:   17.025000
Median:         21.200000
3rd Quartile:   25.000000
Maximum:        50.000000
Type:           Float64
``````

Let's load the decision tree regressor:

``@load DecisionTreeRegressor``
``````DecisionTreeRegressor(
    max_depth = -1,
    min_samples_leaf = 5,
    min_samples_split = 2,
    min_purity_increase = 0.0,
    n_subfeatures = 0,
    post_prune = false,
    merge_purity_threshold = 1.0) @386``````

Let's first check the performance of a single Decision Tree Regressor (DTR for short):

``````tree = machine(DecisionTreeRegressor(), X, y)
e = evaluate!(tree, resampling=Holdout(fraction_train=0.8),
              measure=[rms, rmslp1])
e |> pprint # use PrettyPrinting``````
``````(measure = [rms, rmslp1],
 measurement = [7.054322052311652, 0.32730942581047084],
 per_fold = [[7.054322052311652], [0.32730942581047084]],
 per_observation = [missing, missing])``````

Note that multiple measures can be reported simultaneously.
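The printed fields above suggest the evaluation object can also be queried programmatically; a minimal sketch, using the `e` from the previous cell:

``````# `e.measurement[i]` holds the aggregate result for `e.measure[i]`;
# here the first measure is the rms:
rms_holdout = e.measurement[1]``````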

## Random forest

Let's create an ensemble of DTRs and fix the number of subfeatures to 3 for now.

``````forest = EnsembleModel(atom=DecisionTreeRegressor())
forest.atom.n_subfeatures = 3``````
``3``

(NB: we could have fixed `n_subfeatures` in the DTR constructor too).
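For the record, that equivalent one-liner would look like this:

``````# Equivalent: fix `n_subfeatures` directly in the atom's constructor.
forest = EnsembleModel(atom=DecisionTreeRegressor(n_subfeatures=3))``````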

To get an idea of how many trees are needed, we can follow the evolution of the error (say the `rms`) as the number of trees increases:

``````rng = StableRNG(5123) # for reproducibility
m = machine(forest, X, y)
r = range(forest, :n, lower=10, upper=1000)
curves = learning_curve!(m, resampling=Holdout(fraction_train=0.8, rng=rng),
                         range=r, measure=rms);``````

Let's plot the curve:

``````figure(figsize=(8,6))
plot(curves.parameter_values, curves.measurements)
ylabel("Root Mean Squared error", fontsize=16)
xlabel("Number of trees", fontsize=16)
xticks([10, 250, 500, 750, 1000], fontsize=14)
yticks(fontsize=14)``````

The curve is pretty noisy, but let's just go for 150 trees:

``forest.n = 150;``

### Tuning

As `forest` is a composite model, it has nested hyperparameters:

``params(forest) |> pprint``
``````(atom = (max_depth = -1,
         min_samples_leaf = 5,
         min_samples_split = 2,
         min_purity_increase = 0.0,
         n_subfeatures = 3,
         post_prune = false,
         merge_purity_threshold = 1.0),
 atomic_weights = [],
 bagging_fraction = 0.8,
 rng = Random._GLOBAL_RNG(),
 n = 150,
 acceleration = CPU1{Nothing}(nothing),
 out_of_bag_measure = [])``````
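As an aside, the `out_of_bag_measure` field listed above suggests the ensemble can also report out-of-bag error estimates. Here is a minimal sketch of how one might request that; the exact contents of the resulting report are an assumption, so inspect `report(mach)` to find them:

``````# Sketch: request an out-of-bag rms estimate, working on a copy so the
# `forest` used in the rest of this tutorial is left untouched.
oob_forest = deepcopy(forest)
oob_forest.out_of_bag_measure = [rms]
mach = machine(oob_forest, X, y)
fit!(mach)
report(mach)  # look for the out-of-bag measurement here``````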

Let's define a range for the number of subfeatures and for the bagging fraction:

``````r_sf = range(forest, :(atom.n_subfeatures), lower=1, upper=12)
r_bf = range(forest, :bagging_fraction, lower=0.4, upper=1.0);``````
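As a side note, you can preview which values a grid of a given resolution would actually try by materializing a range with `iterator`; for instance, with the resolution of 3 used below:

``````# Three values per range, evenly spread between `lower` and `upper`:
iterator(r_sf, 3)  # sub-feature counts spread over [1, 12]
iterator(r_bf, 3)  # bagging fractions, e.g. [0.4, 0.7, 1.0]``````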

And build a tuned model as usual, which we fit on an 80/20 split. We use a low-resolution grid here to keep this tutorial fast, but you could of course use a finer grid.

``````tuned_forest = TunedModel(model=forest,
                          tuning=Grid(resolution=3),
                          resampling=CV(nfolds=6, rng=StableRNG(32)),
                          ranges=[r_sf, r_bf],
                          measure=rms)
m = machine(tuned_forest, X, y)
e = evaluate!(m, resampling=Holdout(fraction_train=0.8),
              measure=[rms, rmslp1])
e |> pprint``````
``````(measure = [rms, rmslp1],
 measurement = [3.9331681188967704, 0.25013647146434703],
 per_fold = [[3.9331681188967704], [0.25013647146434703]],
 per_observation = [missing, missing])``````

### Reporting

Again, you could show a 2D heatmap of the error as a function of the two hyperparameters:

``````r = report(m)

figure(figsize=(8,6))

res = r.plotting

vals_sf = res.parameter_values[:, 1]
vals_bf = res.parameter_values[:, 2]

tricontourf(vals_sf, vals_bf, res.measurements)
xticks(1:3:12, fontsize=12)
xlabel("Number of sub-features", fontsize=14)
yticks(0.4:0.2:1, fontsize=12)
ylabel("Bagging fraction", fontsize=14)`````` Even though we've only done a very rough search, it seems that around 7 sub-features and a bagging fraction of around `0.75` work well.

Now that the machine `m` is trained, you can use it for predictions (implicitly, this will use the best model). For instance, we could look at predictions on the whole dataset:

``````ŷ = predict(m, X)
rms(ŷ, y)``````
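
To see which hyperparameters the search actually settled on, you can also query the fitted parameters of the tuned machine; a minimal sketch, assuming the trained machine `m` from above:

``````# A tuned machine wraps the best model found during the grid search:
best = fitted_params(m).best_model
@show best.atom.n_subfeatures best.bagging_fraction``````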