# Composing Models

MLJ has a flexible interface for composing multiple machine learning elements to form a *learning network*, whose complexity can extend beyond the "pipelines" of other machine learning toolboxes. While these learning networks can be applied directly to learning tasks, they are more commonly used to specify new re-usable, stand-alone composite model types that behave like any other model type. The main novelty of composite models is that they include other models as hyper-parameters.

MLJ also provides dedicated syntax for the most common composition use-cases, which are described first below.

A description of the general framework begins at Learning Networks below.

## Linear Pipelines

In MLJ a *pipeline* is a composite model in which models are chained together in a linear (non-branching) chain. Pipelines can include learned or static target transformations, if one of the models is supervised.

To illustrate basic construction of a pipeline, consider the following toy data:

```
using MLJ

X = (age = [23, 45, 34, 25, 67],
     gender = categorical(['m', 'm', 'f', 'm', 'f']));
height = [67.0, 81.5, 55.6, 90.0, 61.1]
```

The code below defines a new model type, and an *instance* of that type called `pipe`, for performing the following operations:

- standardize the target variable `:height` to have mean zero and standard deviation one
- coerce the `:age` field to have `Continuous` scitype
- one-hot encode the categorical feature `:gender`
- train a K-nearest neighbor model on the transformed inputs and transformed target
- restore the predictions of the KNN model to the original `:height` scale (i.e., invert the standardization)

```
KNNRegressor = @load KNNRegressor
pipe = @pipeline(X -> coerce(X, :age=>Continuous),
                 OneHotEncoder,
                 KNNRegressor(K=3),
                 target = Standardizer())
```

```
Pipeline326(
    one_hot_encoder = OneHotEncoder(
            features = Symbol[],
            drop_last = false,
            ordered_factor = true,
            ignore = false),
    knn_regressor = KNNRegressor(
            K = 3,
            algorithm = :kdtree,
            metric = Distances.Euclidean(0.0),
            leafsize = 10,
            reorder = true,
            weights = :uniform),
    target = Standardizer(
            features = Symbol[],
            ignore = false,
            ordered_factor = false,
            count = false)) @287
```

Notice that field names for the composite are automatically generated based on the component model type names. The automatically generated name of the new composite model type (`Pipeline326` in the output above) can be replaced with a user-defined one by specifying, say, `name=MyPipe`. **If you are planning on serializing (saving) a pipeline-machine, you will need to specify a name.**

The new model can be used just like any other non-composite model:

```
pipe.knn_regressor.K = 2
pipe.one_hot_encoder.drop_last = true
evaluate(pipe, X, height, resampling=Holdout(), measure=l2, verbosity=2)
```

```
[ Info: Training Machine{Pipeline406} @959.
[ Info: Training Machine{UnivariateStandardizer} @422.
[ Info: Training Machine{OneHotEncoder} @745.
[ Info: Spawning 1 sub-features to one-hot encode feature :gender.
[ Info: Training Machine{KNNRegressor} @005.
┌───────────┬───────────────┬────────────┐
│ _.measure │ _.measurement │ _.per_fold │
├───────────┼───────────────┼────────────┤
│ l2        │ 55.5          │ [55.5]     │
└───────────┴───────────────┴────────────┘
_.per_observation = [[[55.502499999999934]]]
```

For important details on including target transformations, see below.

`MLJBase.@pipeline` — Macro

`@pipeline model1 model2 ... modelk`

Create an instance of an automatically generated composite model type, in which the specified models are composed in order. This means `model1` receives the input data and its output is passed to `model2`, and so forth. Model types or instances may be specified.

**Important.** By default a new model *type* name is automatically generated. To specify a different name, add a keyword argument such as `name=MyPipeType`. This is necessary if serializing the pipeline; see `MLJ.save`.

At most one of the models may be a supervised model, but this model can appear in any position.

The `@pipeline` macro accepts several keyword arguments discussed further below.

Static (unlearned) transformations - that is, ordinary functions - may also be inserted in the pipeline as shown in the following example:

`@pipeline X->coerce(X, :age=>Continuous) OneHotEncoder ConstantClassifier`

**Target transformation and inverse transformation**

A learned target transformation (such as standardization) can also be specified, using the keyword `target`, provided the transformer provides an `inverse_transform` method:

`@pipeline OneHotEncoder KNNRegressor target=UnivariateTransformer`

A static transformation can be specified instead, but then an `inverse` must also be given:

```
@pipeline(OneHotEncoder, KNNRegressor,
          target = v -> log.(v),
          inverse = v -> exp.(v))
```

*Important.* By default, the target inversion is applied *immediately following* the (unique) supervised model in the pipeline. To apply at the end of the pipeline, specify `invert_last=true`.

**Optional keyword arguments**

- `target=...` - any `Unsupervised` model or `Function`
- `inverse=...` - any `Function` (unspecified if `target` is `Unsupervised`)
- `invert_last` - set to `true` to delay target inversion to the end of the pipeline (default=`false`)
- `prediction_type` - prediction type of the pipeline; possible values: `:deterministic`, `:probabilistic`, `:interval` (default=`:deterministic` if not inferable)
- `operation` - operation applied to the supervised component model, when present; possible values: `predict`, `predict_mean`, `predict_median`, `predict_mode` (default=`predict`)
- `name` - new composite model type name; can be any name not already in the current global namespace (autogenerated by default)
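For example, here is a sketch (assuming the DecisionTree package is installed) of a pipeline that one-hot encodes its input and returns point predictions from a probabilistic classifier, with a user-specified type name:

```
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
pipe = @pipeline(OneHotEncoder,
                 DecisionTreeClassifier(),
                 operation = predict_mode,  # point predictions from a probabilistic model
                 name = PointTreePipe)      # required if serializing the pipeline
```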

See also: `@from_network`

## Homogeneous Ensembles

Although an ensemble of models sharing a common set of hyperparameters can be defined using the learning network API, MLJ's `EnsembleModel` wrapper is preferred, for convenience and best performance.

`MLJEnsembles.EnsembleModel` — Function

```
EnsembleModel(atom=nothing,
              atomic_weights=Float64[],
              bagging_fraction=0.8,
              n=100,
              rng=GLOBAL_RNG,
              acceleration=default_resource(),
              out_of_bag_measure=[])
```

Create a model for training an ensemble of `n` learners, with optional bagging, each with associated model `atom`. Ensembling is useful if `fit!(machine(atom, data...))` does not create identical models on repeated calls (ie, is a stochastic model, such as a decision tree with randomized node selection criteria), or if `bagging_fraction` is set to a value less than 1.0, or both. The constructor fails if no `atom` is specified.

Only atomic models supporting targets with scitype `AbstractVector{<:Finite}` (univariate classifiers) or `AbstractVector{<:Continuous}` (univariate regressors) are supported.

If `rng` is an integer, then `MersenneTwister(rng)` is the random number generator used for bagging. Otherwise some `AbstractRNG` object is expected.

The atomic predictions are weighted according to the vector `atomic_weights` (to allow for external optimization) except in the case that `atom` is a `Deterministic` classifier. Uniform atomic weights are used if `atomic_weights` has zero length.

The ensemble model is `Deterministic` or `Probabilistic`, according to the corresponding supertype of `atom`. In the case of deterministic classifiers (`target_scitype(atom) <: AbstractVector{<:Finite}`), the predictions are majority votes, and for regressors (`target_scitype(atom) <: AbstractVector{<:Continuous}`) they are ordinary averages. Probabilistic predictions are obtained by averaging the atomic probability distribution/mass functions; in particular, for regressors, the ensemble prediction on each input pattern has the type `MixtureModel{VF,VS,D}` from the Distributions.jl package, where `D` is the type of predicted distribution for `atom`.

The `acceleration` keyword argument is used to specify the compute resource (a subtype of `ComputationalResources.AbstractResource`) that will be used to accelerate/parallelize ensemble fitting.

If a single measure or non-empty vector of measures is specified by `out_of_bag_measure`, then out-of-bag estimates of performance are written to the training report (call `report` on the trained machine wrapping the ensemble model).

*Important:* If sample weights `w` (as opposed to atomic weights) are specified when constructing a machine for the ensemble model, as in `mach = machine(ensemble_model, X, y, w)`, then `w` is used by any measures specified in `out_of_bag_measure` that support sample weights.
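For illustration, here is a sketch (assuming the DecisionTree package is installed) that bags one hundred decision trees and requests an out-of-bag estimate of performance:

```
using MLJ
DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree

X, y = make_regression(200, 3)

# each tree is trained on a random 80% sample of the rows:
forest = EnsembleModel(atom=DecisionTreeRegressor(),
                       n=100,
                       bagging_fraction=0.8,
                       rng=123,
                       out_of_bag_measure=[rmse])

mach = machine(forest, X, y)
fit!(mach)
report(mach)   # includes the out-of-bag estimate of rmse
```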

## Model Stacking

In a model stack, as introduced by Wolpert (1992), an adjudicating model learns the best way to combine the predictions of multiple base models. In MLJ, such models are constructed using the `Stack` constructor. To learn more about stacking, and to see how to construct a stack "by hand" using the Learning Networks described later, see this Data Science in Julia tutorial.

`MLJBase.Stack` — Type

`Stack(;metalearner=nothing, resampling=CV(), name1=model1, name2=model2, ...)`

Implements the two-layer generalized stack algorithm introduced by Wolpert (1992) and generalized by Van der Laan et al (2007). Returns an instance of type `ProbabilisticStack` or `DeterministicStack`, depending on the prediction type of `metalearner`.

When training a machine bound to such an instance:

- The data is split into training/validation sets according to the specified `resampling` strategy.
- Each base model `model1`, `model2`, ... is trained on each training subset and outputs predictions on the corresponding validation sets. The multi-fold predictions are spliced together into a so-called out-of-sample prediction for each model.
- The adjudicating model, `metalearner`, is subsequently trained on the out-of-sample predictions to learn the best combination of base model predictions.
- Each base model is retrained on all supplied data, so that new production data can be passed on to the adjudicator for making new predictions.

**Arguments**

- `metalearner::Supervised`: The model that will optimize the desired criterion based on its internals. For instance, a LinearRegression model will optimize the squared error.
- `resampling::Union{CV, StratifiedCV}`: The resampling strategy used to prepare out-of-sample predictions of the base learners.
- `name1=model1, name2=model2, ...`: the `Supervised` model instances to be used as base learners. The provided names become properties of the instance created, to allow hyper-parameter access.

**Example**

The following code defines a `DeterministicStack` instance for learning a `Continuous` target, and demonstrates that:

- Base models can be `Probabilistic` models even if the stack itself is `Deterministic` (`predict_mean` is applied in such cases).
- As an alternative to hyperparameter optimization, one can stack multiple copies of a given model, mutating the hyper-parameter used in each copy.

```
using MLJ

DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
EvoTreeRegressor = @load EvoTreeRegressor
XGBoostRegressor = @load XGBoostRegressor
KNNRegressor = @load KNNRegressor pkg=NearestNeighborModels
LinearRegressor = @load LinearRegressor pkg=MLJLinearModels

X, y = make_regression(500, 5)

stack = Stack(;metalearner=LinearRegressor(),
              resampling=CV(),
              constant=ConstantRegressor(),
              tree_2=DecisionTreeRegressor(max_depth=2),
              tree_3=DecisionTreeRegressor(max_depth=3),
              evo=EvoTreeRegressor(),
              knn=KNNRegressor(),
              xgb=XGBoostRegressor())

mach = machine(stack, X, y)
evaluate!(mach; resampling=Holdout(), measure=rmse)
```

## Learning Networks

Below is a practical guide to the MLJ implementation of learning networks, which have been described more abstractly elsewhere.

Hand-crafting a learning network, as outlined below, is a relatively advanced MLJ feature, assuming familiarity with the basics outlined in Getting Started. The syntax for building a learning network is essentially an extension of the basic syntax but with data containers replaced with nodes of a graph.

It is important to distinguish between *learning networks* and the composite MLJ model types they are used to define.

A *learning network* is a directed acyclic graph whose nodes are objects that can be called to obtain data, either for training a machine, or for use as input to an *operation*. An operation is either:

- *static*, that is, an ordinary function, such as `+`, `log` or `vcat`, or
- *dynamic*, that is, an operation such as `predict` or `transform`, which is dispatched on both data *and* a training outcome attached to some machine.

Since the result of calling a node depends on the outcome of training events (and may involve lazy evaluation) one may think of a node as "dynamic" data, as opposed to the "static" data appearing in an ordinary MLJ workflow.

Different operations can dispatch on the same machine (i.e., can access a common set of learned parameters) and different machines can point to the same model (allowing for hyperparameter coupling).
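As a minimal sketch of the second point (assuming the NearestNeighborModels package is installed): binding two machines to the *same* model instance couples their hyper-parameters, so mutating the model affects both:

```
KNNRegressor = @load KNNRegressor pkg=NearestNeighborModels
knn = KNNRegressor()

Xs = source()
ys = source()

# two machines pointing to the same model (hyper-parameter coupling):
mach1 = machine(knn, Xs, ys)
mach2 = machine(knn, Xs, ys)

knn.K = 7   # subsequent training of either machine uses K=7
```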

By contrast, an *exported learning network* is a learning network exported as a stand-alone, re-usable `Model` object, to which all the MLJ `Model` meta-algorithms can be applied (ensembling, systematic tuning, etc).

By specifying data at the source nodes of a learning network, one can use and test the learning network as it is defined, which is also a good way to understand how learning networks work under the hood. This data, if specified, is ignored in the export process, for the exported composite model, like any other model, is not associated with any data until wrapped in a machine.

In MLJ, learning networks treat the flow of information during training separately from the flow during prediction or transformation.

### Building a simple learning network

The diagram below depicts a learning network which standardizes the input data `X`, learns an optimal Box-Cox transformation for the target `y`, predicts new target values using ridge regression, and then inverse-transforms those predictions to restore them to the original scale. Here we have only dynamic operations, labelled blue; the machines are in green. Notice that two operations both use `stand`, which stores the learned standardization scale parameters. The lower "Training" panel indicates which nodes are used to train each machine, and what model each machine is associated with.

Looking ahead, we note that the new composite model type we will create later will be assigned a single hyperparameter, `regressor`, and the learning network model `RidgeRegressor(lambda=0.1)` will become this parameter's default value. Since model hyperparameters are mutable, this regressor can be changed to a different one (e.g., `HuberRegressor()`).

For testing purposes, we'll use a small synthetic data set:

```
using Statistics
import DataFrames

x1 = rand(300)
x2 = rand(300)
x3 = rand(300)
y = exp.(x1 - x2 - 2x3 + 0.1*rand(300))
X = DataFrames.DataFrame(x1=x1, x2=x2, x3=x3)

# train/test split of the row indices (the output shown below):
train, test = partition(eachindex(y), 0.8)
```

([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 231, 232, 233, 234, 235, 236, 237, 238, 239, 240], [241, 242, 243, 244, 245, 246, 247, 248, 249, 250 … 291, 292, 293, 294, 295, 296, 297, 298, 299, 300])

Step one is to wrap the data in *source nodes*:

```
Xs = source(X)
ys = source(y)
```

Source @830 ⏎ `AbstractVector{Continuous}`

*Note.* One can omit the specification of data at the source nodes (by writing instead `Xs = source()` and `ys = source()`) and still export the resulting network as a stand-alone model using the `@from_network` macro described later; see the example under Static operations on nodes. However, one will be unable to fit or call network nodes, as illustrated below.

The contents of a source node can be recovered by simply calling the node with no arguments:

`ys()[1:2]`

```
2-element Vector{Float64}:
 0.3002059198037532
 0.1655203561741318
```

We label the nodes that we will define according to their outputs in the diagram. Notice that the nodes `z` and `yhat` use the same machine, namely `box`, for different operations.

To construct the `W` node we first need to define the machine `stand` that it will use to transform inputs.

```
stand_model = Standardizer()
stand = machine(stand_model, Xs)
```

```
Machine{Standardizer,…} @609 trained 0 times; caches data
  args:
    1:	Source @322 ⏎ `Table{AbstractVector{Continuous}}`
```

Because `Xs` is a node, instead of concrete data, we can call `transform` on the machine without first training it, and the result is the new node `W`, instead of concrete transformed data:

`W = transform(stand, Xs)`

```
Node{Machine{Standardizer,…}} @681
  args:
    1:	Source @322
  formula:
    transform(
        Machine{Standardizer,…} @609,
        Source @322)
```

To get actual transformed data we *call* the node appropriately, which first requires training the node. Training a node, rather than a machine, triggers training of *all* necessary machines in the network.

```
fit!(W, rows=train)
W()           # transform all data
W(rows=test)  # transform only test data
W(X[3:4,:])   # transform any data, new or old
```

2 rows × 3 columns

|   | x1       | x2        | x3      |
|---|----------|-----------|---------|
|   | Float64  | Float64   | Float64 |
| 1 | 0.914002 | 1.6645    | 1.55165 |
| 2 | -1.29646 | -0.180671 | -1.0314 |

If you like, you can think of `W` (and the other nodes we will define) as "dynamic data": `W` is *data*, in the sense that it can be called ("indexed") on rows, but *dynamic*, in the sense that the result depends on the outcome of training events.

The other nodes of our network are defined similarly:

```
RidgeRegressor = @load RidgeRegressor pkg=MultivariateStats

box_model = UnivariateBoxCoxTransformer() # for making data look normally-distributed
box = machine(box_model, ys)
z = transform(box, ys)

ridge_model = RidgeRegressor(lambda=0.1)
ridge = machine(ridge_model, W, z)
zhat = predict(ridge, W)

yhat = inverse_transform(box, zhat);
```

```
Node{Machine{UnivariateBoxCoxTransformer,…}} @678
  args:
    1:	Node{Machine{RidgeRegressor,…}} @845
  formula:
    inverse_transform(
        Machine{UnivariateBoxCoxTransformer,…} @732,
        predict(
            Machine{RidgeRegressor,…} @722,
            transform(
                Machine{Standardizer,…} @609,
                Source @322)))
```

We are ready to train and evaluate the completed network. Notice that the standardizer, `stand`, is *not* retrained, as MLJ remembers that it was trained earlier:

```
fit!(yhat, rows=train);
rms(y[test], yhat(rows=test)) # evaluate
```

0.02026949613447241

We can change a hyperparameter and retrain:

```
ridge_model.lambda = 0.01
fit!(yhat, rows=train);
```

```
Node{Machine{UnivariateBoxCoxTransformer,…}} @678
  args:
    1:	Node{Machine{RidgeRegressor,…}} @845
  formula:
    inverse_transform(
        Machine{UnivariateBoxCoxTransformer,…} @732,
        predict(
            Machine{RidgeRegressor,…} @722,
            transform(
                Machine{Standardizer,…} @609,
                Source @322)))
```

And re-evaluate:

`rms(y[test], yhat(rows=test))`

0.02018233290784475

**Notable feature.** The machine, `ridge::Machine{RidgeRegressor}`, is retrained, because its underlying model has been mutated. However, since the outcome of this training has no effect on the training inputs of the machines `stand` and `box`, these transformers are left untouched. (During construction, each node and machine in a learning network determines and records all machines on which it depends.) This behavior, which extends to exported learning networks, means we can tune our wrapped regressor (using a holdout set) without re-computing transformations each time the hyperparameter is changed.

### Learning network machines

As we show next, a learning network needs to be exported to create a new stand-alone model type. Instances of that type can be bound with data in a machine, which can then be evaluated, for example. Somewhat paradoxically, one can wrap a learning network in a certain kind of machine, called a *learning network machine*, *before* exporting it, and in fact the export process actually requires us to do so. Since a composite model type does not yet exist, one constructs the machine using a "surrogate" model, whose name indicates the ultimate model supertype (`Deterministic`, `Probabilistic`, `Unsupervised` or `Static`). This surrogate model has no fields.

Continuing with the example above:

```
surrogate = Deterministic()
mach = machine(surrogate, Xs, ys; predict=yhat);
```

```
Machine{DeterministicSurrogate,…} @297 trained 0 times; does not cache data
  args:
    1:	Source @322 ⏎ `Table{AbstractVector{Continuous}}`
    2:	Source @830 ⏎ `AbstractVector{Continuous}`
```

Notice that a keyword argument declares which node is for making predictions, and the arguments `Xs` and `ys` declare which source nodes receive the input and target data. With `mach` constructed in this way, the code

```
fit!(mach)
predict(mach, X[test,:]);
```

```
[ Info: Training Machine{UnivariateBoxCoxTransformer,…} @732.
[ Info: Training Machine{Standardizer,…} @609.
[ Info: Training Machine{RidgeRegressor,…} @722.
```

is equivalent to

```
fit!(yhat)
yhat(X[test,:]);
```

```
[ Info: Not retraining Machine{UnivariateBoxCoxTransformer,…} @732. Use `force=true` to force.
[ Info: Not retraining Machine{Standardizer,…} @609. Use `force=true` to force.
[ Info: Not retraining Machine{RidgeRegressor,…} @722. Use `force=true` to force.
```

While its main purpose is for export (see below), this machine can actually be evaluated:

`evaluate!(mach, resampling=CV(nfolds=3), measure=LPLoss(p=2))`

```
┌────────────────────┬───────────────┬───────────────────────────────┐
│ _.measure          │ _.measurement │ _.per_fold                    │
├────────────────────┼───────────────┼───────────────────────────────┤
│ LPLoss{Int64} @587 │ 0.000314      │ [0.000276, 0.000315, 0.00035] │
└────────────────────┴───────────────┴───────────────────────────────┘
_.per_observation = [[[0.000321, 5.86e-7, ..., 1.42e-6], [1.94e-6, 3.03e-5, ..., 0.000727], [2.62e-7, 1.31e-5, ..., 2.05e-5]]]
_.fitted_params_per_fold = [ … ]
_.report_per_fold = [ … ]
_.train_test_rows = [ … ]
```

For more on constructing learning network machines, see `machine`.

## Exporting a learning network as a stand-alone model

Having satisfied ourselves that our learning network works on the synthetic data, we are ready to export it as a stand-alone model.

### Method I: The @from_network macro

Having defined a learning network machine, `mach`, as above, the following code defines a new model subtype `WrappedRegressor <: Supervised` with a single field `regressor`:

```
@from_network mach begin
    mutable struct WrappedRegressor
        regressor=ridge_model
    end
end
```

Note the declaration of the default value `ridge_model`, *which must refer to an actual model appearing in the learning network*. It can be typed, as in the alternative declaration below, which also declares some traits for the type (as shown by `info(WrappedRegressor)`; see also Trait declarations).

```
@from_network mach begin
    mutable struct WrappedRegressor
        regressor::Deterministic=ridge_model
    end
    input_scitype = Table(Continuous,Finite)
    target_scitype = AbstractVector{<:Continuous}
end
```

We can now create an instance of this type and apply the meta-algorithms that apply to any MLJ model:

```
julia> composite = WrappedRegressor()
WrappedRegressor(
    regressor = RidgeRegressor(
            lambda = 0.01))

X, y = @load_boston;
evaluate(composite, X, y, resampling=CV(), measure=l2, verbosity=0)
```

Since our new type is mutable, we can swap the `RidgeRegressor` out for any other regressor:

```
KNNRegressor = @load KNNRegressor
composite.regressor = KNNRegressor(K=7)

julia> composite
WrappedRegressor(regressor = KNNRegressor(K = 7,
                                          algorithm = :kdtree,
                                          metric = Distances.Euclidean(0.0),
                                          leafsize = 10,
                                          reorder = true,
                                          weights = :uniform,),) @ 2…63
```

### Method II: Finer control (advanced)

This section describes an advanced feature that can be skipped on a first reading.

In Method I above, only models appearing in the network will appear as hyperparameters of the exported composite model. There is a second, more flexible method for exporting the network, which allows finer control over the exported `Model` struct and which also avoids macros. The two steps required are:

- Define a new `mutable struct` model type.
- Wrap the learning network code in a model `fit` method.

Let's start with an elementary illustration using the learning network we just exported with Method I.

The `mutable struct` definition looks like this:

```
mutable struct WrappedRegressor2 <: DeterministicComposite
    regressor
end

# keyword constructor
WrappedRegressor2(; regressor=RidgeRegressor()) = WrappedRegressor2(regressor)
```

The other supertype options are `ProbabilisticComposite`, `IntervalComposite`, `UnsupervisedComposite` and `StaticComposite`.

We now simply cut and paste the code defining the learning network into a model `fit` method (as opposed to a machine `fit!` method):

```
function MLJ.fit(model::WrappedRegressor2, verbosity::Integer, X, y)
    Xs = source(X)
    ys = source(y)

    stand_model = Standardizer()
    stand = machine(stand_model, Xs)
    W = transform(stand, Xs)

    box_model = UnivariateBoxCoxTransformer()
    box = machine(box_model, ys)
    z = transform(box, ys)

    ridge_model = model.regressor        ###
    ridge = machine(ridge_model, W, z)
    zhat = predict(ridge, W)

    yhat = inverse_transform(box, zhat)

    mach = machine(Deterministic(), Xs, ys; predict=yhat)
    return!(mach, model, verbosity)
end
```

This completes the export process.
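As a quick check (assuming the definitions above are in scope), the new type can be instantiated and evaluated like any other model:

```
composite2 = WrappedRegressor2(regressor=RidgeRegressor(lambda=0.1))

X, y = @load_boston;
evaluate(composite2, X, y, resampling=CV(), measure=l2, verbosity=0)
```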

Notes:

- The line marked `###`, where the new exported model's hyperparameter `regressor` is spliced into the network, is the only modification to the previous code.
- After defining the network there is the additional step of constructing and fitting a learning network machine (see above).
- The last call in the function, `return!(mach, model, verbosity)`, calls `fit!` on the learning network machine `mach` and splits it into various pieces, as required by the MLJ model interface. See also the `return!` doc-string.

**Important note.** An MLJ `fit` method is not allowed to mutate its `model` argument.

**What's going on here?** MLJ's machine interface is built atop a more primitive *model* interface, implemented for each algorithm. Each supervised model type (eg, `RidgeRegressor`) requires model `fit` and `predict` methods, which are called by the corresponding machine `fit!` and `predict` methods. We don't need to define a model `predict` method here because MLJ provides a fallback which simply calls `predict` on the learning network machine created in the `fit` method.

#### A composite model coupling component model hyper-parameters

We now give a more complicated example of a composite model which exposes some parameters used in the network that are not simply component models. The model combines a clustering model (e.g., `KMeans()`) for dimension reduction with ridge regression, but has the following "coupling" of the hyper-parameters: the ridge regularization depends on the number of clusters used (with less regularization for a greater number of clusters) and a user-specified "coupling" coefficient `K`.

```
RidgeRegressor = @load RidgeRegressor pkg=MLJLinearModels

mutable struct MyComposite <: DeterministicComposite
    clusterer     # the clustering model (e.g., KMeans())
    ridge_solver  # a ridge regression parameter we want to expose
    K::Float64    # a "coupling" coefficient
end

function MLJ.fit(composite::MyComposite, verbosity, X, y)
    Xs = source(X)
    ys = source(y)

    clusterer = composite.clusterer
    k = clusterer.k
    clustererM = machine(clusterer, Xs)
    Xsmall = transform(clustererM, Xs)

    # the coupling: ridge regularization depends on the number of
    # clusters (and the specified coefficient `K`):
    lambda = exp(-composite.K/k)

    ridge = RidgeRegressor(lambda=lambda, solver=composite.ridge_solver)
    ridgeM = machine(ridge, Xsmall, ys)
    yhat = predict(ridgeM, Xsmall)

    mach = machine(Deterministic(), Xs, ys; predict=yhat)
    return!(mach, composite, verbosity)
end

kmeans = (@load KMeans pkg=Clustering)()
my_composite = MyComposite(kmeans, nothing, 0.5)
```

```
MyComposite(
    clusterer = KMeans(
            k = 3,
            metric = Distances.SqEuclidean(0.0)),
    ridge_solver = nothing,
    K = 0.5) @559
```

`evaluate(my_composite, X, y, measure=MeanAbsoluteError(), verbosity=0)`

```
┌────────────────────────┬───────────────┬──────────────────────────────────────
│ _.measure              │ _.measurement │ _.per_fold                          ⋯
├────────────────────────┼───────────────┼──────────────────────────────────────
│ MeanAbsoluteError @181 │ 0.182         │ [0.135, 0.147, 0.191, 0.211, 0.232, ⋯
└────────────────────────┴───────────────┴──────────────────────────────────────
                                                                1 column omitted
_.per_observation = [missing]
_.fitted_params_per_fold = [ … ]
_.report_per_fold = [ … ]
_.train_test_rows = [ … ]
```

## Static operations on nodes

Continuing to view nodes as "dynamic data", we can, in addition to applying "dynamic" operations like `predict` and `transform` to nodes, overload ordinary "static" (unlearned) operations as well. These operations can be ordinary functions (with possibly multiple arguments) or they could be functions *with parameters*, such as "take a weighted average of two nodes", where the weights are parameters. Here we address the simpler case of ordinary functions. For the parametric case, see "Static transformers" in Transformers and other unsupervised models.

Let us first give a demonstration of operations that work out-of-the-box. These include:

- addition and scalar multiplication
- `exp`, `log`, `vcat`, `hcat`
- tabularization (`MLJ.table`) and matrixification (`MLJ.matrix`)

As a demonstration of some of these, consider the learning network below that: (i) one-hot encodes the input table `X`; (ii) log-transforms the continuous target `y`; (iii) fits specified K-nearest neighbour and ridge regressor models to the data; (iv) computes an average of the individual model predictions; and (v) inverse-transforms (exponentiates) the blended predictions.

Note, in particular, the lines defining `zhat` and `yhat`, which combine several static node operations.

```
RidgeRegressor = @load RidgeRegressor pkg=MultivariateStats
KNNRegressor = @load KNNRegressor

Xs = source()
ys = source()

hot = machine(OneHotEncoder(), Xs)

# W, z, zhat and yhat are nodes in the network:
W = transform(hot, Xs)  # one-hot encode the input
z = log(ys)             # transform the target

model1 = RidgeRegressor(lambda=0.1)
model2 = KNNRegressor(K=7)

mach1 = machine(model1, W, z)
mach2 = machine(model2, W, z)

# average the predictions of the KNN and ridge models:
zhat = 0.5*predict(mach1, W) + 0.5*predict(mach2, W)

# invert the target transformation
yhat = exp(zhat)
```

```
Node{Nothing} @839
  args:
    1:	Node{Nothing} @057
  formula:
    #127(
        +(
            #129(
                predict(
                    Machine{RidgeRegressor,…} @361,
                    transform(
                        Machine{OneHotEncoder,…} @306,
                        Source @241))),
            #129(
                predict(
                    Machine{KNNRegressor,…} @650,
                    transform(
                        Machine{OneHotEncoder,…} @306,
                        Source @241)))))
```

Exporting this learning network as a stand-alone model:

```
@from_network machine(Deterministic(), Xs, ys; predict=yhat) begin
    mutable struct DoubleRegressor
        regressor1=model1
        regressor2=model2
    end
end
```

To deal with operations on nodes not supported out-of-the-box, one can use the `@node` macro. Suppose, in the preceding example, we wanted the geometric mean rather than the arithmetic mean. Then the definition of `zhat` above can be replaced with

```
yhat1 = predict(mach1, W)
yhat2 = predict(mach2, W)
gmean(y1, y2) = sqrt.(y1.*y2)
zhat = @node gmean(yhat1, yhat2)
```

There is also a `node` function, which would achieve the same in this way:

`zhat = node((y1, y2)->sqrt.(y1.*y2), predict(mach1, W), predict(mach2, W))`

### More `node` examples

Here are some examples taken from MLJ source (at work in the example above) for overloading common operations for nodes:

```
Base.log(v::Vector{<:Number}) = log.(v)
Base.log(X::AbstractNode) = node(log, X)
import Base.+
+(y1::AbstractNode, y2::AbstractNode) = node(+, y1, y2)
+(y1, y2::AbstractNode) = node(+, y1, y2)
+(y1::AbstractNode, y2) = node(+, y1, y2)
```

Here `AbstractNode` is the common super-type of `Node` and `Source`.

And a final example, using the `@node` macro to row-shuffle a table:

```
using Random

X = (x1 = [1, 2, 3, 4, 5],
     x2 = [:one, :two, :three, :four, :five])
rows(X) = 1:nrows(X)

Xs = source(X)
rs = @node rows(Xs)
W = @node selectrows(Xs, @node shuffle(rs))

julia> W()
(x1 = [5, 1, 3, 2, 4],
 x2 = Symbol[:five, :one, :three, :two, :four],)
```

## The learning network API

Two new Julia types are part of learning networks: `Source` and `Node`.

Formally, a learning network defines *two* labeled directed acyclic graphs (DAGs) whose nodes are `Node` or `Source` objects, and whose labels are `Machine` objects. We obtain the first DAG from directed edges of the form $N_1 \to N_2$, drawn whenever $N_1$ is an *argument* of $N_2$ (see below). Only this DAG is relevant when calling a node, as discussed in the examples above and below. To form the second DAG (relevant when calling `fit!` on a node) one adds edges for which $N_1$ is a *training argument* of the machine which labels $N_2$. We call the second, larger DAG the *completed learning network* (but note that only edges of the smaller network are explicitly drawn in diagrams, for simplicity).

### Source nodes

Only source nodes reference concrete data. A `Source` object has a single field, `data`.

`MLJBase.source` — Method

`Xs = source(X=nothing)`

Define a learning network `Source` object, wrapping some input data `X`, which can be `nothing` for purposes of exporting the network as a stand-alone model. For training and testing the unexported network, appropriate vectors, tables, or other data containers are expected.

The calling behaviour of a `Source` object is this:

```
Xs() = X
Xs(rows=r) = selectrows(X, r) # eg, X[r,:] for a DataFrame
Xs(Xnew) = Xnew
```
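For example, here is a minimal sketch of the behaviour just listed:

```
X = (x = [10, 20, 30],)
Xs = source(X)

Xs()             # returns (x = [10, 20, 30],)
Xs(rows=1:2)     # returns (x = [10, 20],)
Xs((x = [40],))  # returns (x = [40],)
```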

`MLJBase.rebind!` — Function

`rebind!(s, X)`

Attach new data `X` to an existing source node `s`. Not a public method.

`MLJBase.sources` — Function

`MLJBase.origins` — Function

`origins(N)`

Return a list of all origins of a node `N` accessed by a call `N()`. These are the source nodes of the ancestor graph of `N` if edges corresponding to training arguments are excluded. A `Node` object cannot be called on new data unless it has a unique origin.

Not to be confused with `sources(N)`, which refers to the same graph but without the training edge deletions.
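To illustrate the distinction: in the regression network constructed earlier, the prediction node `yhat` depends on the target source `ys` only through training arguments, so (assuming those definitions are in scope):

```
origins(yhat)   # one-element vector containing the input source, Xs
sources(yhat)   # contains the target source ys as well
```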

### Nodes

The key components of a `Node` are:

- An *operation*, which will either be *static* (a fixed function) or *dynamic* (such as `predict` or `transform`, dispatched on a machine).
- A *machine* on which to dispatch the operation (void if the operation is static). The training arguments of the machine are generally other nodes.
- Upstream connections to other nodes (including source nodes) specified by *arguments* (one for each argument of the operation).

`MLJBase.node` — Type

`N = node(f::Function, args...)`

Defines a `Node` object `N` wrapping a static operation `f` and arguments `args`. Each of the `n` elements of `args` must be a `Node` or `Source` object. The node `N` has the following calling behaviour:

```
N() = f(args[1](), args[2](), ..., args[n]())
N(rows=r) = f(args[1](rows=r), args[2](rows=r), ..., args[n](rows=r))
N(X) = f(args[1](X), args[2](X), ..., args[n](X))
```

`J = node(f, mach::Machine, args...)`

Defines a dynamic `Node` object `J` wrapping a dynamic operation `f` (`predict`, `predict_mean`, `transform`, etc), a nodal machine `mach` and arguments `args`. Its calling behaviour, which depends on the outcome of training `mach` (and, implicitly, on training outcomes affecting its arguments), is this:

```
J() = f(mach, args[1](), args[2](), ..., args[n]())
J(rows=r) = f(mach, args[1](rows=r), args[2](rows=r), ..., args[n](rows=r))
J(X) = f(mach, args[1](X), args[2](X), ..., args[n](X))
```

Generally `n=1` or `n=2` in this latter case.

```
predict(mach, X::AbstractNode, y::AbstractNode)
predict_mean(mach, X::AbstractNode, y::AbstractNode)
predict_median(mach, X::AbstractNode, y::AbstractNode)
predict_mode(mach, X::AbstractNode, y::AbstractNode)
transform(mach, X::AbstractNode)
inverse_transform(mach, X::AbstractNode)
```

Shortcuts for `J = node(predict, mach, X, y)`, etc.

Calling a node is a recursive operation which terminates in the call to a source node (or nodes). Calling nodes on *new* data `X` fails unless the number of such nodes is one.

`MLJBase.@node` — Macro

`@node f(...)`

Construct a new node that applies the function `f` to some combination of nodes, sources and other arguments.

*Important.* An argument not in global scope is assumed to be a node or source.

**Examples**

```
X = source(π)
W = @node sin(X)

julia> W()
1.2246467991473532e-16

X = source(1:10)
Y = @node selectrows(X, 3:4)

julia> Y()
3:4

julia> Y(["one", "two", "three", "four"])
2-element Vector{String}:
 "three"
 "four"

X1 = source(4)
X2 = source(5)
add(a, b, c) = a + b + c
N = @node add(X1, 1, X2)

julia> N()
10
```

See also `node`.

`MLJBase.@from_network` — Macro

```
@from_network mach [mutable] struct NewCompositeModel
    ...
end
```

or

```
@from_network mach begin
    [mutable] struct NewCompositeModel
        ...
    end
    <optional trait declarations>
end
```

Create a new stand-alone model type called `NewCompositeModel`, using the specified learning network machine `mach` as a blueprint.

For more on learning network machines, see `machine`.

**Example**

Consider the following simple learning network for training a decision tree after one-hot encoding the inputs, and forcing the predictions to be point-predictions (rather than probabilistic):

```
Xs = source()
ys = source()
hot = OneHotEncoder()
tree = DecisionTreeClassifier()
W = transform(machine(hot, Xs), Xs)
yhat = predict_mode(machine(tree, W, ys), W)
```

A learning network machine is defined by

`mach = machine(Deterministic(), Xs, ys; predict=yhat)`

To specify a new `Deterministic` composite model type `WrappedTree` we specify the model instances appearing in the network as "default" values in the following decorated struct definition:

```
@from_network mach struct WrappedTree
    encoder=hot
    decision_tree=tree
end
```

and create a new instance with `WrappedTree()`.

To allow the second model component to be replaced by any other probabilistic model we instead make a mutable struct declaration and, if desired, annotate types appropriately. In the following code illustration some model trait declarations have also been added:

```
@from_network mach begin
    mutable struct WrappedTree
        encoder::OneHotEncoder=hot
        classifier::Probabilistic=tree
    end
    input_scitype = Table(Continuous, Finite)
    is_pure_julia = true
end
```

`MLJBase.return!` — Function

`return!(mach::Machine{<:Surrogate}, model, verbosity)`

The last call in custom code defining the `MLJBase.fit` method for a new composite model type. Here `model` is the instance of the new type appearing in the `MLJBase.fit` signature, while `mach` is a learning network machine constructed using `model`. Not relevant when defining composite models using `@pipeline` or `@from_network`.

For usage, see the example given below. Specifically, the call does the following:

- Determines which hyper-parameters of `model` point to model instances in the learning network wrapped by `mach`, for recording in an object called `cache`, for passing onto the MLJ logic that handles smart updating (namely, an `MLJBase.update` fallback for composite models).
- Calls `fit!(mach, verbosity=verbosity)`.
- Moves any data in source nodes of the learning network into `cache` (for data-anonymization purposes).
- Records a copy of `model` in `cache`.
- Returns `cache` and outcomes of training in an appropriate form (specifically, `(mach.fitresult, cache, mach.report)`; see Adding Models for General Use for technical details).

**Example**

The following code defines, "by hand", a new model type `MyComposite` for composing standardization (whitening) with a deterministic regressor:

```
mutable struct MyComposite <: DeterministicComposite
    regressor
end

function MLJBase.fit(model::MyComposite, verbosity, X, y)
    Xs = source(X)
    ys = source(y)

    mach1 = machine(Standardizer(), Xs)
    Xwhite = transform(mach1, Xs)

    mach2 = machine(model.regressor, Xwhite, ys)
    yhat = predict(mach2, Xwhite)

    mach = machine(Deterministic(), Xs, ys; predict=yhat)
    return!(mach, model, verbosity)
end
```