# Ensemble models (2)

*Download the notebook, the raw script, or the annotated script for this tutorial (right-click on the link and save).*

## Prelims

This tutorial builds upon the previous ensemble tutorial with a home-made Random Forest regressor on the "boston" dataset.

```
using MLJ
using PyPlot
using PrettyPrinting
using StableRNGs
import DataFrames
X, y = @load_boston
sch = schema(X)
p = length(sch.names)
n = sch.nrows
@show (n, p)
DataFrames.describe(y)
```

```
(n, p) = (506, 12)
Summary Stats:
Length: 506
Missing Count: 0
Mean: 22.532806
Minimum: 5.000000
1st Quartile: 17.025000
Median: 21.200000
3rd Quartile: 25.000000
Maximum: 50.000000
Type: Float64
```

Let's load the decision tree regressor

`@load DecisionTreeRegressor`

```
DecisionTreeRegressor(
    max_depth = -1,
    min_samples_leaf = 5,
    min_samples_split = 2,
    min_purity_increase = 0.0,
    n_subfeatures = 0,
    post_prune = false,
    merge_purity_threshold = 1.0) @386
```

Let's first check the performance of a single Decision Tree Regressor (DTR for short):

```
tree = machine(DecisionTreeRegressor(), X, y)
e = evaluate!(tree, resampling=Holdout(fraction_train=0.8),
              measure=[rms, rmslp1])
e |> pprint  # use PrettyPrinting
```

```
(measure = [rms, rmslp1],
 measurement = [7.054322052311652, 0.32730942581047084],
 per_fold = [[7.054322052311652], [0.32730942581047084]],
 per_observation = [missing, missing])
```

Note that multiple measures can be reported simultaneously.
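For instance, you could extend the same call with a third measure such as the mean absolute error (a sketch; this assumes the `mae` measure exported by MLJ):

```
# same holdout evaluation as above, with one extra measure appended
e = evaluate!(tree, resampling=Holdout(fraction_train=0.8),
              measure=[rms, rmslp1, mae])
```

Each entry of `e.measurement` then corresponds position-wise to an entry of `e.measure`.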

## Random forest

Let's create an ensemble of DTRs and fix the number of subfeatures to 3 for now.

```
forest = EnsembleModel(atom=DecisionTreeRegressor())
forest.atom.n_subfeatures = 3
```

`3`

(**NB**: we could have fixed `n_subfeatures` in the DTR constructor too.)
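In other words, the two-step version above is equivalent to passing the keyword directly at construction (a minimal sketch):

```
# set the number of subfeatures when constructing the atomic model
forest = EnsembleModel(atom=DecisionTreeRegressor(n_subfeatures=3))
```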

To get an idea of how many trees are needed, we can track the error (say, the `rms`) for an increasing number of trees over several sampling rounds.

```
rng = StableRNG(5123)  # for reproducibility
m = machine(forest, X, y)
r = range(forest, :n, lower=10, upper=1000)
curves = learning_curve!(m, resampling=Holdout(fraction_train=0.8, rng=rng),
                         range=r, measure=rms);
```

Let's plot the curves:

```
figure(figsize=(8,6))
plot(curves.parameter_values, curves.measurements)
ylabel("Root Mean Squared error", fontsize=16)
xlabel("Number of trees", fontsize=16)
xticks([10, 250, 500, 750, 1000], fontsize=14)
yticks(fontsize=14)
```

The curve is pretty noisy but let's just go for 150 trees:

`forest.n = 150;`

### Tuning

As `forest` is a composite model, it has nested hyperparameters:

`params(forest) |> pprint`

```
(atom = (max_depth = -1,
         min_samples_leaf = 5,
         min_samples_split = 2,
         min_purity_increase = 0.0,
         n_subfeatures = 3,
         post_prune = false,
         merge_purity_threshold = 1.0),
 atomic_weights = [],
 bagging_fraction = 0.8,
 rng = Random._GLOBAL_RNG(),
 n = 150,
 acceleration = CPU1{Nothing}(nothing),
 out_of_bag_measure = [])
```
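These nested hyperparameters can be read or set individually with dot syntax, descending through the composite model (a minimal sketch):

```
forest.atom.min_samples_leaf = 5   # a hyperparameter of the atomic model
forest.bagging_fraction = 0.8      # a hyperparameter of the ensemble itself
```

The same dotted paths (e.g. `:(atom.n_subfeatures)`) are what we use below to define tuning ranges.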

Let's define a range for the number of subfeatures and for the bagging fraction:

```
r_sf = range(forest, :(atom.n_subfeatures), lower=1, upper=12)
r_bf = range(forest, :bagging_fraction, lower=0.4, upper=1.0);
```

And build a tuned model as usual, which we fit on an 80/20 split. We use a low-resolution grid here to keep this tutorial fast, but you could of course use a finer grid.

```
tuned_forest = TunedModel(model=forest,
                          tuning=Grid(resolution=3),
                          resampling=CV(nfolds=6, rng=StableRNG(32)),
                          ranges=[r_sf, r_bf],
                          measure=rms)
m = machine(tuned_forest, X, y)
e = evaluate!(m, resampling=Holdout(fraction_train=0.8),
              measure=[rms, rmslp1])
e |> pprint
```

```
(measure = [rms, rmslp1],
 measurement = [3.9331681188967704, 0.25013647146434703],
 per_fold = [[3.9331681188967704], [0.25013647146434703]],
 per_observation = [missing, missing])
```

### Reporting

Again, you could show a 2D heatmap of the hyperparameters:

```
r = report(m)
figure(figsize=(8,6))
res = r.plotting
vals_sf = res.parameter_values[:, 1]
vals_bf = res.parameter_values[:, 2]
tricontourf(vals_sf, vals_bf, res.measurements)
xticks(1:3:12, fontsize=12)
xlabel("Number of sub-features", fontsize=14)
yticks(0.4:0.2:1, fontsize=12)
ylabel("Bagging fraction", fontsize=14)
```

Even though we've only done a very rough search, it seems that around 7 sub-features and a bagging fraction of around `0.75` work well.
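You can also query the fitted machine for the winning hyperparameters directly, rather than reading them off the plot (a sketch; `fitted_params(m).best_model` is MLJ's accessor for the best model found by a `TunedModel`):

```
best = fitted_params(m).best_model
@show best.atom.n_subfeatures
@show best.bagging_fraction
```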

Now that the machine `m` is trained, you can use it for predictions (implicitly, this will use the best model). For instance, we could look at predictions on the whole dataset:

```
ŷ = predict(m, X)
rms(ŷ, y)
```