# Model Stacking

In a model stack, as introduced by Wolpert (1992), an adjudicating model learns the best way to combine the predictions of multiple base models. In MLJ, such models are constructed using the `Stack` constructor. To learn more about stacking, and to see how to construct a stack "by hand" using the Learning Networks described later, see this Data Science in Julia tutorial.

`MLJBase.Stack` - Type

`Stack(; metalearner=nothing, resampling=CV(), name1=model1, name2=model2, ...)`

Implements the two-layer generalized stack algorithm introduced by Wolpert (1992) and generalized by Van der Laan et al. (2007). Returns an instance of type `ProbabilisticStack` or `DeterministicStack`, depending on the prediction type of `metalearner`.

When training a machine bound to such an instance:

1. The data is split into training/validation sets according to the specified `resampling` strategy.
2. Each base model `model1`, `model2`, ... is trained on each training subset and outputs predictions on the corresponding validation sets. The multi-fold predictions are spliced together into a so-called out-of-sample prediction for each model.
3. The adjudicating model, `metalearner`, is subsequently trained on the out-of-sample predictions to learn the best combination of base model predictions.
4. Each base model is retrained on all supplied data, so that new production data can be passed through the base models to the adjudicator when making new predictions.
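The training procedure described above can be illustrated without any MLJ machinery. What follows is a minimal sketch in plain Julia, not MLJ's actual implementation: the helpers `fit_mean`, `fit_ols`, `cv_pairs` and `fit_stack` are hypothetical names introduced only for illustration, the "base models" are a constant predictor and ordinary least squares, and the adjudicator is assumed to be a least-squares fit without intercept.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical base learners: each `fit_*` returns a closure that predicts on new rows.
function fit_mean(X, y)
    m = mean(y)
    return Xnew -> fill(m, size(Xnew, 1))
end

function fit_ols(X, y)
    A = hcat(ones(size(X, 1)), X)      # design matrix with intercept column
    β = A \ y                          # least-squares coefficients
    return Xnew -> hcat(ones(size(Xnew, 1)), Xnew) * β
end

# Split the row indices 1:n into `nfolds` (train, validation) pairs.
function cv_pairs(n, nfolds)
    folds = [collect(k:nfolds:n) for k in 1:nfolds]
    return [(setdiff(1:n, val), val) for val in folds]
end

function fit_stack(fitters, X, y; nfolds=6)
    n = size(X, 1)
    Z = zeros(n, length(fitters))      # out-of-sample predictions, one column per base model
    for (train, val) in cv_pairs(n, nfolds)
        for (j, fitter) in enumerate(fitters)
            predictor = fitter(X[train, :], y[train])   # train on the training subset
            Z[val, j] = predictor(X[val, :])            # predict on the validation subset
        end
    end
    w = Z \ y                                           # adjudicator: least squares on Z
    predictors = [fitter(X, y) for fitter in fitters]   # retrain base models on all data
    return Xnew -> hcat([p(Xnew) for p in predictors]...) * w
end

Random.seed!(0)
X = randn(200, 3)
y = X * [1.0, -2.0, 0.5] .+ 0.1 .* randn(200)

stack_predict = fit_stack([fit_mean, fit_ols], X, y)
yhat = stack_predict(X)
```

Because the adjudicator only ever sees predictions made on rows the base models were not trained on, its weights reflect out-of-sample performance rather than how well each base model memorizes the training data.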

**Arguments**

- `metalearner::Supervised`: The adjudicating model, which combines the base model predictions by optimizing its own built-in criterion. For instance, a linear regression model minimizes the squared error.
- `resampling`: The resampling strategy used to prepare out-of-sample predictions of the base learners. It can be a user-defined strategy, the only caveat being that it should have an `nfolds` attribute.
- `name1=model1, name2=model2, ...`: The `Supervised` model instances to be used as base learners. The provided names become properties of the created instance, to allow hyper-parameter access.

**Example**

The following code defines a `DeterministicStack` instance for learning a `Continuous` target, and demonstrates that:

- Base models can be `Probabilistic` models even if the stack itself is `Deterministic` (`predict_mean` is applied in such cases).
- As an alternative to hyperparameter optimization, one can stack multiple copies of a given model, mutating the hyper-parameter used in each copy.

```
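using MLJ

# Load the model types (each providing package must be installed in the active environment)
DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
EvoTreeRegressor = @load EvoTreeRegressor
XGBoostRegressor = @load XGBoostRegressor
KNNRegressor = @load KNNRegressor pkg=NearestNeighborModels
LinearRegressor = @load LinearRegressor pkg=MLJLinearModels

# Synthetic regression data: 500 observations, 5 features
X, y = make_regression(500, 5)

# A deterministic metalearner makes this a `DeterministicStack`;
# `tree_2` and `tree_3` are copies of one model differing only in `max_depth`
stack = Stack(; metalearner=LinearRegressor(),
                resampling=CV(),
                constant=ConstantRegressor(),
                tree_2=DecisionTreeRegressor(max_depth=2),
                tree_3=DecisionTreeRegressor(max_depth=3),
                evo=EvoTreeRegressor(),
                knn=KNNRegressor(),
                xgb=XGBoostRegressor())

# Bind the stack to the data and estimate its performance on a holdout set
mach = machine(stack, X, y)
evaluate!(mach; resampling=Holdout(), measure=rmse)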
```