In a model stack, as introduced by Wolpert (1992), an adjudicating model learns the best way to combine the predictions of multiple base models. In MLJ, such models are constructed using the Stack constructor. To learn more about stacking and to see how to construct a stack "by hand" using the Learning Networks described later, see this Data Science in Julia tutorial.
Stack(;metalearner=nothing, resampling=CV(), name1=model1, name2=model2, ...)
Implements the two-layer generalized stack algorithm introduced by Wolpert (1992) and generalized by Van der Laan et al (2007). Returns an instance of type ProbabilisticStack or DeterministicStack, depending on the prediction type of metalearner.
When training a machine bound to such an instance:
The data is split into training/validation sets according to the specified resampling strategy.
Each base model model1, model2, ... is trained on each training subset and outputs predictions on the corresponding validation sets. The multi-fold predictions are spliced together into a so-called out-of-sample prediction for each model.
The adjudicating model, metalearner, is subsequently trained on the out-of-sample predictions to learn the best combination of base model predictions.
Each base model is retrained on all supplied data, so that new production data can be passed through the base models to the adjudicator for making new predictions.
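The four training steps above can be sketched in plain Julia, without MLJ. This is only an illustration under stated assumptions, not how MLJ implements Stack: the helpers fit_mean, fit_ols and fit_stack are hypothetical names, the two base learners are a constant predictor and ordinary least squares, and the metalearner is a least-squares fit without intercept.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical base learners: each fitter returns a closure that predicts on new rows.
function fit_mean(X, y)
    m = mean(y)
    return Xnew -> fill(m, size(Xnew, 1))
end

function fit_ols(X, y)
    β = [ones(size(X, 1)) X] \ y                   # least squares with an intercept
    return Xnew -> [ones(size(Xnew, 1)) Xnew] * β
end

# Hypothetical stack trainer following the four steps in the text.
function fit_stack(fitters, X, y; nfolds=3, rng=MersenneTwister(0))
    n = size(X, 1)
    perm = randperm(rng, n)
    folds = [perm[k:nfolds:n] for k in 1:nfolds]   # step 1: disjoint validation sets
    Z = zeros(n, length(fitters))                  # out-of-sample predictions, one column per base model
    for val in folds
        train = setdiff(1:n, val)
        for (j, fitter) in enumerate(fitters)
            predictor = fitter(X[train, :], y[train])  # step 2: train on the training subset
            Z[val, j] = predictor(X[val, :])           # and predict on the validation rows
        end
    end
    w = Z \ y                                      # step 3: metalearner (assumed: least-squares weights)
    final = [fitter(X, y) for fitter in fitters]   # step 4: retrain every base model on all data
    return Xnew -> hcat((p(Xnew) for p in final)...) * w
end

# Usage on synthetic data
rng = MersenneTwister(1)
X = randn(rng, 200, 3)
y = 2.0 .+ X * [1.0, -2.0, 0.5] .+ 0.1 .* randn(rng, 200)
predict_stack = fit_stack([fit_mean, fit_ols], X, y)
yhat = predict_stack(X)
println("stack RMSE: ", sqrt(mean(abs2, yhat .- y)))
```

Because the metalearner only ever sees predictions made on rows that the corresponding base model did not train on, its weights reflect out-of-sample performance rather than training fit.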
metalearner::Supervised: The model that will optimize the desired criterion based on its internals. For instance, a LinearRegression model will optimize the squared error.
resampling: The resampling strategy used to prepare out-of-sample predictions of the base learners. It can be a user-defined strategy, the only caveat being that it should have a
name1=model1, name2=model2, ...: the Supervised model instances to be used as base learners. The provided names become properties of the instance created to allow hyper-parameter access.
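As a minimal sketch of the property access described above, the following builds a small stack from models that ship with MLJ, so no extra model packages need loading. The choice of DeterministicConstantRegressor as metalearner is only for convenience and is not a sensible modelling choice; the name constant is an arbitrary label chosen for this illustration.

```julia
using MLJ

stack = Stack(;
    metalearner=DeterministicConstantRegressor(),
    resampling=CV(nfolds=3),
    constant=ConstantRegressor(),
)

println(stack.constant)           # the base model registered under the name constant
println(stack.resampling.nfolds)  # nested access into the resampling strategy
stack.resampling = CV(nfolds=5)   # replace a hyper-parameter of the stack
```

The same dotted access is what allows nested hyper-parameters of a stack to be referenced by name, for example when defining ranges for tuning.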
The following code defines a DeterministicStack instance for learning a Continuous target, and demonstrates that:
Base models can be Probabilistic models even if the stack itself is Deterministic (predict_mean is applied in such cases).
As an alternative to hyperparameter optimization, one can stack multiple copies of a given model, varying the hyper-parameter used in each copy.
using MLJ

DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
EvoTreeRegressor = @load EvoTreeRegressor
XGBoostRegressor = @load XGBoostRegressor
KNNRegressor = @load KNNRegressor pkg=NearestNeighborModels
LinearRegressor = @load LinearRegressor pkg=MLJLinearModels

X, y = make_regression(500, 5)

stack = Stack(;metalearner=LinearRegressor(),
              resampling=CV(),
              constant=ConstantRegressor(),
              tree_2=DecisionTreeRegressor(max_depth=2),
              tree_3=DecisionTreeRegressor(max_depth=3),
              evo=EvoTreeRegressor(),
              knn=KNNRegressor(),
              xgb=XGBoostRegressor())

mach = machine(stack, X, y)
evaluate!(mach; resampling=Holdout(), measure=rmse)
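To illustrate how a Probabilistic base model can feed a deterministic stack, here is a small sketch of what predict_mean does for a single probabilistic model, using the built-in ConstantRegressor. The variable names Xd, yd and cmach are chosen for this illustration only, and the snippet shows the operation in isolation rather than the stack's internal code path.

```julia
using MLJ, Statistics

Xd, yd = make_regression(50, 2)
cmach = machine(ConstantRegressor(), Xd, yd)
fit!(cmach, verbosity=0)

yhat = predict(cmach, Xd)        # probabilistic predictions: one distribution per row
ymean = predict_mean(cmach, Xd)  # point predictions: the mean of each distribution
```

The point predictions in ymean are the kind of input a deterministic metalearner can consume, which is why probabilistic base models such as ConstantRegressor can sit alongside deterministic ones in the same stack.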