Model Stacking

In a model stack, as introduced by Wolpert (1992), an adjudicating model learns the best way to combine the predictions of multiple base models. In MLJ, such models are constructed using the Stack constructor. To learn more about stacking, and to see how to construct a stack "by hand" using the Learning Networks described later, see the Data Science in Julia tutorial.

Stack(;metalearner=nothing, resampling=CV(), name1=model1, name2=model2, ...)

Implements the two-layer generalized stack algorithm introduced by Wolpert (1992) and generalized by Van der Laan et al. (2007). Returns an instance of type ProbabilisticStack or DeterministicStack, depending on the prediction type of metalearner.

When training a machine bound to such an instance:

  • The data is split into training/validation sets according to the specified resampling strategy.

  • Each base model model1, model2, ... is trained on each training subset and outputs predictions on the corresponding validation sets. The multi-fold predictions are spliced together into a so-called out-of-sample prediction for each model.

  • The adjudicating model, metalearner, is subsequently trained on the out-of-sample predictions to learn the best combination of base model predictions.

  • Each base model is retrained on all supplied data, so that predictions on new production data can be passed to the adjudicator, which makes the final predictions.
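
The four steps above can be sketched in plain Julia, without MLJ. This is only an illustration of the procedure, not MLJ's implementation: the helpers fit_mean, fit_ols and fit_stack are hypothetical names, the base learners are toy deterministic regressors, and the adjudicator is assumed to be ordinary least squares without an intercept.

```julia
using LinearAlgebra, Random, Statistics

# Toy base learners (hypothetical helpers): each returns a prediction function.
fit_mean(X, y) = (m = mean(y); Xnew -> fill(m, size(Xnew, 1)))
function fit_ols(X, y)
    coefs = [ones(size(X, 1)) X] \ y
    return Xnew -> [ones(size(Xnew, 1)) Xnew] * coefs
end

function fit_stack(base_fitters, X, y; nfolds=5, rng=MersenneTwister(0))
    n = size(X, 1)
    perm = randperm(rng, n)
    # Step 1: split the rows into nfolds validation sets.
    folds = [perm[i:nfolds:n] for i in 1:nfolds]
    # Step 2: one column of out-of-sample predictions per base learner.
    Z = zeros(n, length(base_fitters))
    for val in folds
        train = setdiff(1:n, val)
        for (j, fitter) in enumerate(base_fitters)
            predictor = fitter(X[train, :], y[train])
            Z[val, j] = predictor(X[val, :])
        end
    end
    # Step 3: train the adjudicator on the out-of-sample predictions.
    weights = Z \ y
    # Step 4: retrain every base learner on all supplied data.
    base = [fitter(X, y) for fitter in base_fitters]
    return Xnew -> reduce(hcat, [p(Xnew) for p in base]) * weights
end

# Usage on synthetic data.
rng = MersenneTwister(1)
X = randn(rng, 200, 3)
y = X * [1.0, -2.0, 0.5] .+ 0.1 .* randn(rng, 200)
predict_stack = fit_stack([fit_mean, fit_ols], X, y)
yhat = predict_stack(X)
```

Because the adjudicator only ever sees predictions made on rows a base learner was not trained on, it cannot simply reward the base learner that overfits the most.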


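The constructor takes the following keyword arguments:

  • metalearner::Supervised: The adjudicating model. It combines the base-model predictions by optimizing its own training criterion; for instance, a linear regression model such as LinearRegressor minimizes the squared error.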

  • resampling: The resampling strategy used to prepare out-of-sample predictions of the base learners. It can be a user-defined strategy, the only caveat being that it must have an nfolds attribute.

  • name1=model1, name2=model2, ...: The Supervised model instances to be used as base learners. The provided names become properties of the created instance, which allows access to each base learner's hyperparameters.
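
As a sketch of how these arguments fit together, the snippet below builds a small stack with a three-fold CV strategy, then reads and mutates a base learner's hyperparameter through the name it was given. The names tree_2 and tree_4 are arbitrary labels chosen for illustration, and the snippet assumes the DecisionTree and MLJLinearModels interface packages are installed.

```julia
using MLJ

DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree verbosity=0
LinearRegressor = @load LinearRegressor pkg=MLJLinearModels verbosity=0

# tree_2 and tree_4 are illustrative names; any valid identifiers would do.
stack = Stack(; metalearner=LinearRegressor(),
                resampling=CV(nfolds=3),
                tree_2=DecisionTreeRegressor(max_depth=2),
                tree_4=DecisionTreeRegressor(max_depth=4))

stack.resampling.nfolds      # the attribute every resampling strategy must provide
stack.tree_2.max_depth       # base learners are reachable by their names
stack.tree_2.max_depth = 3   # mutate a hyperparameter in place

# The metalearner here is deterministic, so the stack is too.
stack isa Deterministic
```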


The following code defines a DeterministicStack instance for learning a Continuous target, and demonstrates that:

  • Base models can be Probabilistic models even if the stack itself is Deterministic (predict_mean is applied in such cases).

  • As an alternative to hyperparameter optimization, one can stack multiple copies of a given model, varying the hyperparameter used in each copy.

using MLJ

DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
EvoTreeRegressor = @load EvoTreeRegressor
XGBoostRegressor = @load XGBoostRegressor
KNNRegressor = @load KNNRegressor pkg=NearestNeighborModels
LinearRegressor = @load LinearRegressor pkg=MLJLinearModels

X, y = make_regression(500, 5)

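stack = Stack(;metalearner=LinearRegressor(),
                resampling=CV(),
                # Base learners: an assumed completion of the truncated listing,
                # using the models loaded above plus MLJ's built-in ConstantRegressor
                # (a Probabilistic model).
                constant=ConstantRegressor(),
                tree_2=DecisionTreeRegressor(max_depth=2),
                tree_3=DecisionTreeRegressor(max_depth=3),
                evo=EvoTreeRegressor(),
                knn=KNNRegressor(),
                xgb=XGBoostRegressor())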

mach = machine(stack, X, y)
evaluate!(mach; resampling=Holdout(), measure=rmse)
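
To obtain predictions on held-back rows rather than a performance estimate, a machine bound to a stack can be fit and queried like any other MLJ machine. The following is a minimal, self-contained sketch; small_stack, tree_2 and tree_4 are illustrative names, the data is synthetic, and the DecisionTree and MLJLinearModels interface packages are assumed to be installed.

```julia
using MLJ

DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree verbosity=0
LinearRegressor = @load LinearRegressor pkg=MLJLinearModels verbosity=0

X, y = make_regression(200, 3; rng=1)
train, test = partition(eachindex(y), 0.8; shuffle=true, rng=1)

# Two copies of the same model with different depths (illustrative names).
small_stack = Stack(; metalearner=LinearRegressor(),
                      resampling=CV(nfolds=3),
                      tree_2=DecisionTreeRegressor(max_depth=2),
                      tree_4=DecisionTreeRegressor(max_depth=4))

mach = machine(small_stack, X, y)
fit!(mach; rows=train, verbosity=0)   # runs the internal CV, then retrains base models
yhat = predict(mach; rows=test)       # base predictions are passed to the metalearner
holdout_rmse = rmse(yhat, y[test])
```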