Ensemble models - extended tutorial

Download the notebook, the raw script, or the annotated script for this tutorial (right-click on the link and save).

Simple example of a homogeneous ensemble using learning networks

In this simple example no bagging is used, so every atomic model is trained on the same data and learns the same parameters, unless the atomic model's training algorithm involves randomness, e.g., DecisionTree with random subsampling of features at nodes.

Note that MLJ has a built-in model wrapper called EnsembleModel for creating bagged ensembles in a few lines of code; a sketch of its use follows.
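For orientation only, here is a minimal sketch of the wrapper (not used in the rest of this tutorial). It assumes atom, X and y are defined as they are later in the tutorial; note also that the keyword naming the atomic model is model in recent MLJ releases (atom in older ones):

forest = EnsembleModel(model=atom, n=100) # bagged ensemble of 100 atomic models
mach = machine(forest, X, y)              # bind to data as usual
fit!(mach)                                # trains all ensemble members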

Definition of composite model type

using MLJ
using PyPlot
import Statistics

learning network (composite model spec):

Xs = source()
ys = source()

atom = @load DecisionTreeRegressor pkg=DecisionTree # disambiguate among packages
atom.n_subfeatures = 4 # to ensure diversity among trained atomic models

machines = (machine(atom, Xs, ys) for i in 1:100) # lazy generator of 100 machines

overload mean for nodes:

Statistics.mean(v...) = mean(v) # reduce varargs to the mean of their collection
Statistics.mean(v::AbstractVector{<:AbstractNode}) = node(mean, v...) # make mean lazy on nodes

yhat = mean([predict(m, Xs) for m in machines]); # node averaging all 100 predictions
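Had the source nodes been created with data attached, as in Xs = source(X) and ys = source(y), the network could already be trained and evaluated directly, which makes a useful sanity check (a sketch, not run here):

fit!(yhat) # trains all 100 machines in the network
yhat()     # the averaged prediction on the training data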

new composite model type and instance. We first wrap the network in a machine with a Deterministic surrogate model, which binds no data but records which node to call for predictions:

surrogate = Deterministic()
mach = machine(surrogate, Xs, ys; predict=yhat)

@from_network mach begin
    mutable struct OneHundredModels
        atom=atom
    end
end

one_hundred_models = OneHundredModels()
OneHundredModels(
    atom = DecisionTreeRegressor(
        max_depth = -1,
        min_samples_leaf = 5,
        min_samples_split = 2,
        min_purity_increase = 0.0,
        n_subfeatures = 4,
        post_prune = false,
        merge_purity_threshold = 1.0))

Application to data

X, y = @load_boston;
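As a quick sanity check (a sketch; output not shown), the exported composite behaves like any other MLJ model and can be evaluated directly:

evaluate(one_hundred_models, X, y,
         resampling=CV(nfolds=6),
         measure=mav,
         verbosity=0)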

tune the regularization parameter for a single tree:

r = range(atom,
          :min_samples_split,
          lower=2,
          upper=100, scale=:log)

mach = machine(atom, X, y)

curve = learning_curve!(mach,
                        range=r,
                        measure=mav, # mean absolute error
                        resampling=CV(nfolds=9),
                        verbosity=0)

plot(curve.parameter_values, curve.measurements)
xlabel(curve.parameter_name)
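to read off the best value found (a quick sketch; mav is a loss, so smaller is better):

best = curve.parameter_values[argmin(curve.measurements)]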

tune the regularization parameter for all trees in the ensemble simultaneously:

r = range(one_hundred_models,
          :(atom.min_samples_split),
          lower=2,
          upper=100, scale=:log)

mach = machine(one_hundred_models, X, y)

curve = learning_curve!(mach,
                        range=r,
                        measure=mav,
                        resampling=CV(nfolds=9),
                        verbosity=0)

plot(curve.parameter_values, curve.measurements)
xlabel(curve.parameter_name)
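to fix the best value automatically, rather than reading it off a plot, the composite could be wrapped in a self-tuning model, a sketch of which follows (see the TunedModel docstring for the full set of options):

tuned = TunedModel(model=one_hundred_models,
                   tuning=Grid(resolution=20),
                   range=r,
                   measure=mav,
                   resampling=CV(nfolds=9))
mach = machine(tuned, X, y)
fit!(mach) # optimizes atom.min_samples_split internally
fitted_params(mach).best_model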