Linear Pipelines

In MLJ a pipeline is a composite model in which component models are chained together in a linear (non-branching) sequence. If one of the models is supervised, a pipeline can also include a learned or static transformation of the target.

To illustrate basic construction of a pipeline, consider the following toy data:

using MLJ
X = (age    = [23, 45, 34, 25, 67],
     gender = categorical(['m', 'm', 'f', 'm', 'f']));
height = [67.0, 81.5, 55.6, 90.0, 61.1]

The code below defines a new model type, and an instance of that type called pipe, for performing the following operations:

  • standardize the target variable :height to have mean zero and standard deviation one
  • coerce the :age field to have Continuous scitype
  • one-hot encode the categorical feature :gender
  • train a K-nearest neighbor model on the transformed inputs and transformed target
  • restore the predictions of the KNN model to the original :height scale (i.e., invert the standardization)
KNNRegressor = @load KNNRegressor
pipe = @pipeline(X -> coerce(X, :age=>Continuous),
                 OneHotEncoder,
                 KNNRegressor(K=3),
                 target = Standardizer())

Pipeline326(
	one_hot_encoder = OneHotEncoder(
			features = Symbol[],
			drop_last = false,
			ordered_factor = true,
			ignore = false),
	knn_regressor = KNNRegressor(
			K = 3,
			algorithm = :kdtree,
			metric = Distances.Euclidean(0.0),
			leafsize = 10,
			reorder = true,
			weights = :uniform),
	target = Standardizer(
			features = Symbol[],
			ignore = false,
			ordered_factor = false,
			count = false)) @287

Notice that the field names of the composite are generated automatically from the component model type names. The automatically generated name of the new composite model type (Pipeline326 above; the numeric suffix is not fixed, which is why the training log below shows Pipeline406) can be replaced with a user-defined one by specifying, say, name=MyPipe. If you plan to serialize (save) a pipeline machine, you will need to specify a name.
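The field names one_hot_encoder and knn_regressor in the output above are snake_case forms of the type names OneHotEncoder and KNNRegressor. The target component appears under the keyword's own name, target. The helper below, snake_case, is hypothetical. It shows one way to do such a conversion, including the run of capitals in "KNN", and is not MLJ's internal code.

```julia
# Hypothetical helper (not MLJ's implementation): turn a CamelCase type
# name into a snake_case field name, keeping acronyms such as "KNN" together.
function snake_case(name::AbstractString)
    chars = collect(name)
    out = Char[]
    for (i, c) in enumerate(chars)
        if i > 1 && isuppercase(c)
            prev = chars[i-1]
            next_is_lower = i < length(chars) && islowercase(chars[i+1])
            # New word: after a lowercase letter, or at the end of an acronym.
            if islowercase(prev) || (isuppercase(prev) && next_is_lower)
                push!(out, '_')
            end
        end
        push!(out, lowercase(c))
    end
    return String(out)
end

snake_case("OneHotEncoder")   # "one_hot_encoder"
snake_case("KNNRegressor")    # "knn_regressor"
```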

The new model can be used just like any other non-composite model:

pipe.knn_regressor.K = 2
pipe.one_hot_encoder.drop_last = true
evaluate(pipe, X, height, resampling=Holdout(), measure=l2, verbosity=2)

[ Info: Training Machine{Pipeline406} @959.
[ Info: Training Machine{UnivariateStandardizer} @422.
[ Info: Training Machine{OneHotEncoder} @745.
[ Info: Spawning 1 sub-features to one-hot encode feature :gender.
[ Info: Training Machine{KNNRegressor} @005.
┌───────────┬───────────────┬────────────┐
│ _.measure │ _.measurement │ _.per_fold │
├───────────┼───────────────┼────────────┤
│ l2        │ 55.5          │ [55.5]     │
└───────────┴───────────────┴────────────┘
_.per_observation = [[[55.502499999999934]]]
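The sketch below performs the five operations listed earlier by hand in plain Julia, using only the Statistics standard library. It rests on several assumptions:

  • a Vector{Char} stands in for the categorical :gender column
  • the KNN step is a brute-force search written for illustration (the hypothetical knn_predict), not MLJ's KNNRegressor
  • predictions are made on the training rows themselves, so the numbers will not match the evaluate output above

```julia
using Statistics  # standard library: mean, std

# The toy data from above; a plain Vector{Char} stands in for the
# categorical :gender column.
X = (age = [23, 45, 34, 25, 67], gender = ['m', 'm', 'f', 'm', 'f'])
height = [67.0, 81.5, 55.6, 90.0, 61.1]

# 1. Standardize the target, remembering μ and σ for the inversion.
μ, σ = mean(height), std(height)
z = (height .- μ) ./ σ

# 2. Coerce :age to floating point (Continuous).
age = Float64.(X.age)

# 3. One-hot encode :gender: one 0/1 column per level (no column dropped).
levels_g = sort(unique(X.gender))                            # ['f', 'm']
onehot = [Float64(g == l) for g in X.gender, l in levels_g]  # 5×2 matrix
features = hcat(age, onehot)                                 # 5×3 matrix

# 4. Hypothetical brute-force KNN regression: average the targets of the
#    K closest training rows (Euclidean distance, uniform weights).
function knn_predict(Xtrain, ytrain, Xnew; K = 3)
    map(eachrow(Xnew)) do x
        d = [sqrt(sum(abs2, x .- r)) for r in eachrow(Xtrain)]
        mean(ytrain[partialsortperm(d, 1:K)])
    end
end
zhat = knn_predict(features, z, features; K = 3)

# 5. Invert the standardization to return to the :height scale.
yhat = zhat .* σ .+ μ
```

Each prediction is an average of three training heights, so it always falls within the observed range of height.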

For important details on including target transformations, see below.

MLJBase.@pipeline (Macro)
@pipeline model1 model2 ... modelk

Create an instance of an automatically generated composite model type, in which the specified models are composed in order. This means that model1 receives the inputs and passes its output to model2, which passes its output to the next model, and so forth. Model types or instances may be specified.
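The ordering rule can be pictured as a left fold over the stages. In the minimal sketch below, run_chain, double and shift are hypothetical names, and every stage is a plain function. Learned stages in a real pipeline are also fitted before they transform.

```julia
# Hypothetical helper: stage 1 receives the input, and each later stage
# receives the output of the stage before it.
run_chain(stages, x) = foldl((acc, stage) -> stage(acc), stages; init = x)

double = v -> 2 .* v
shift  = v -> v .+ 1

run_chain([double, shift], [1.0, 2.0])   # [3.0, 5.0]
run_chain([shift, double], [1.0, 2.0])   # [4.0, 6.0]
```

Swapping the two stages changes the result, which is why the order in which models are listed matters.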

Important. By default a new model type name is automatically generated. To specify a different name add a keyword argument such as name=MyPipeType. This is necessary if serializing the pipeline; see MLJ.save.

At most one of the models may be a supervised model, but this model can appear in any position.

The @pipeline macro accepts several keyword arguments, discussed further below.

Static (unlearned) transformations - that is, ordinary functions - may also be inserted in the pipeline as shown in the following example:

@pipeline X->coerce(X, :age=>Continuous) OneHotEncoder ConstantClassifier
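A static stage learns nothing, so it is simply applied to whatever data reaches it, during training and prediction alike. The hypothetical to_continuous_age below mimics, in plain Julia, the effect of the coerce step above on an integer :age column by converting it to Float64. It is a sketch, not a reimplementation of coerce.

```julia
# Hypothetical static stage: an ordinary function with nothing to fit.
# merge keeps the column order and replaces only the :age column.
to_continuous_age(X::NamedTuple) = merge(X, (age = Float64.(X.age),))

X  = (age = [23, 45, 34, 25, 67], gender = ['m', 'm', 'f', 'm', 'f'])
Xc = to_continuous_age(X)

eltype(Xc.age)   # Float64
```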

Target transformation and inverse transformation

A learned target transformation (such as standardization) can also be specified, using the keyword target, provided the transformer implements an inverse_transform method:

@pipeline OneHotEncoder KNNRegressor target=UnivariateStandardizer
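A target transformation is learned when its parameters come from the training target and are then reused, unchanged, to invert predictions. The sketch below spells out that contract for standardization in plain Julia. StandardizerParams, fit_standardizer, transform and inverse_transform are hypothetical names defined here. In MLJ itself, transform and inverse_transform are called on a trained machine instead.

```julia
using Statistics  # standard library: mean, std

# Hypothetical learned target transformer: the "learning" is just
# recording the mean and standard deviation of the training target.
struct StandardizerParams
    μ::Float64
    σ::Float64
end

fit_standardizer(y::AbstractVector{<:Real}) = StandardizerParams(mean(y), std(y))
transform(p::StandardizerParams, y) = (y .- p.μ) ./ p.σ
inverse_transform(p::StandardizerParams, z) = z .* p.σ .+ p.μ

height = [67.0, 81.5, 55.6, 90.0, 61.1]
p = fit_standardizer(height)        # learned once, from training targets
z = transform(p, height)            # what the supervised model is trained on
inverse_transform(p, z) ≈ height    # true: the learned step can be undone
```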

A static transformation can be specified instead, but then an inverse must also be given:

@pipeline(OneHotEncoder, KNNRegressor,
          target = v -> log.(v),
          inverse = v -> exp.(v))
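A static target/inverse pair should be checked on two counts: the functions must really be mutual inverses on the data, and the inversion changes what a prediction means. In the sketch below, the "model" is a hypothetical mean predictor. Trained on log-heights, its inverted prediction is the geometric mean of the heights rather than the arithmetic mean.

```julia
using Statistics  # standard library: mean

target  = v -> log.(v)
inverse = v -> exp.(v)

height = [67.0, 81.5, 55.6, 90.0, 61.1]

# The pair must undo each other on the data (up to rounding error).
inverse(target(height)) ≈ height    # true

# Hypothetical stand-in for a supervised model: predict the training mean.
z = target(height)
pred = inverse(fill(mean(z), length(z)))

# exp(mean(log.(y))) is the geometric mean, which lies below mean(y)
# unless all targets are equal.
pred[1] ≈ exp(mean(log.(height)))   # true
pred[1] < mean(height)              # true
```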

Important. By default, the target inversion is applied immediately following the (unique) supervised model in the pipeline. To apply at the end of the pipeline, specify invert_last=true.
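The placement of the inversion matters whenever a stage follows the supervised model. In the sketch below, model (a mean predictor) and post (rounding) are hypothetical plain functions, and the log/exp pair plays the role of the target transformation. Composing them in the two orders shows the difference. In the default arrangement the later stage sees original-scale predictions. With invert_last=true it sees transformed-scale predictions.

```julia
using Statistics  # standard library: mean

target  = v -> log.(v)
inverse = v -> exp.(v)
model   = z -> fill(mean(z), length(z))   # hypothetical supervised step
post    = v -> round.(v)                  # hypothetical stage after the model

height = [67.0, 81.5, 55.6, 90.0, 61.1]
z = target(height)

# Default: invert immediately after the supervised model, then continue.
default_order = post(inverse(model(z)))   # rounds about 69.9 to 70.0

# invert_last=true: the later stage acts on the log scale; inversion comes last.
invert_last = inverse(post(model(z)))     # exp(4.0), about 54.6
```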

Optional keyword arguments

  • target=... - any Unsupervised model or Function

  • inverse=... - any Function (leave unspecified if target is Unsupervised)

  • invert_last - set to true to delay target inversion to the end of the pipeline (default=false)

  • prediction_type - prediction type of the pipeline; possible values: :deterministic, :probabilistic, :interval (default=:deterministic if not inferable)

  • operation - operation applied to the supervised component model, when present; possible values: predict, predict_mean, predict_median, predict_mode (default=predict)

  • name - new composite model type name; can be any name not already in the current global namespace (autogenerated by default)

See also: @from_network