Learning networks

Download the notebook, the raw script, or the annotated script for this tutorial (right-click on the link and save).

Preliminary steps

Let's generate a DataFrame with some dummy regression data and load the good old ridge regressor.

using MLJ, StableRNGs
import DataFrames
@load RidgeRegressor pkg=MultivariateStats

rng = StableRNG(551234) # for reproducibility

x1 = rand(rng, 300)
x2 = rand(rng, 300)
x3 = rand(rng, 300)
y = exp.(x1 - x2 - 2x3 + 0.1 * rand(rng, 300))

X = DataFrames.DataFrame(x1=x1, x2=x2, x3=x3)
first(X, 3) |> pretty
┌────────────────────┬─────────────────────┬─────────────────────┐
│ x1                 │ x2                  │ x3                  │
│ Float64            │ Float64             │ Float64             │
│ Continuous         │ Continuous          │ Continuous          │
├────────────────────┼─────────────────────┼─────────────────────┤
│ 0.9840017609992084 │ 0.7714818111684167  │ 0.23209935449185903 │
│ 0.8917954915748527 │ 0.7473993120336746  │ 0.7709140827147394  │
│ 0.8063948246988288 │ 0.01827506280083635 │ 0.07216450827912091 │
└────────────────────┴─────────────────────┴─────────────────────┘

Let's also prepare the train and test split which will be useful later on.

train, test = partition(eachindex(y), 0.8);  # 80% train, 20% test

Defining a learning network

In MLJ, a learning network is a directed acyclic graph (DAG) whose nodes apply trained or untrained operations, such as predict or transform (trained), or +, vcat, etc. (untrained). Learning networks can be seen as pipelines on steroids.

Let's consider the following simple DAG:

Operation DAG

It corresponds to a fairly standard regression workflow: the data is standardized, the target is transformed using a Box-Cox transformation, a ridge regression is applied and the result is converted back by inverting the transform.

Note: actually this DAG is simple enough that it could also have been done with a pipeline.
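For comparison, here is a rough sketch of what the equivalent pipeline might look like, assuming the @pipeline macro available in this version of MLJ (the exact syntax may differ across releases):

# Sketch only (not used in the rest of this tutorial): the same workflow as a pipeline,
# with the target transformation handled by the `target` keyword.
pipe = @pipeline(Standardizer(),
                 RidgeRegressor(lambda=0.1),
                 target=UnivariateBoxCoxTransformer())
mach = machine(pipe, X, y)
fit!(mach, rows=train)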

Sources and nodes

In MLJ a learning network starts at source nodes (wrapping the data X and y) and flows through nodes defining operations/transformations (W, z, ẑ, ŷ). To define the source nodes, use the source function:

Xs = source(X)
ys = source(y)
Source @772 ⏎ `AbstractArray{Continuous,1}`

To define a "trained-operation" node, simply create a machine wrapping a model and another node (the data), and indicate which operation should be performed (e.g. transform):

stand = machine(Standardizer(), Xs)
W = transform(stand, Xs)
Node{Machine{Standardizer}} @240
  args:
    1:	Source @483
  formula:
    transform(
        Machine{Standardizer} @820, 
        Source @483)

You can fit! a trained-operation node at any point; MLJ will fit whatever is needed upstream of that node. In this case, there is just a source node upstream of W, so fitting W will just fit the standardizer:

fit!(W, rows=train);
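
To convince yourself that only the standardizer was trained, you can inspect the machine's learned parameters (the exact content of the output depends on your MLJ version):

fitted_params(stand)   # parameters learned by the standardizer (e.g. per-feature means and stds)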

If you want to get the transformed data, you can then call the node, specifying on which part of the data the operation should be performed:

W()             # transforms all data
W(rows=test)     # transforms only test data
W(X[3:4, :])    # transforms specific data
2×3 DataFrame
│ Row │ x1       │ x2       │ x3        │
│     │ Float64  │ Float64  │ Float64   │
├─────┼──────────┼──────────┼───────────┤
│ 1   │ 0.856967 │ -1.59115 │ -1.48215  │
│ 2   │ -1.06436 │ -1.5056  │ -0.234452 │

Let's now define the other nodes:

box_model = UnivariateBoxCoxTransformer()
box = machine(box_model, ys)
z = transform(box, ys)

ridge_model = RidgeRegressor(lambda=0.1)
ridge = machine(ridge_model, W, z)
ẑ = predict(ridge, W)

ŷ = inverse_transform(box, ẑ)
Node{Machine{UnivariateBoxCoxTransformer}} @232
  args:
    1:	Node{Machine{RidgeRegressor}} @615
  formula:
    inverse_transform(
        Machine{UnivariateBoxCoxTransformer} @763, 
        predict(
            Machine{RidgeRegressor} @400, 
            transform(
                Machine{Standardizer} @820, 
                Source @483)))

Note that we have not yet done any training, but if we now call fit! on ŷ, it will fit all nodes upstream of ŷ that need to be re-trained:

fit!(ŷ, rows=train);

Now that ŷ has been fitted, you can apply the full graph on test data (or any compatible data). For instance, let's get the rms between the ground truth and the predicted values:

rms(y[test], ŷ(rows=test))
0.03360496363407853

Modifying hyperparameters

Hyperparameters can be accessed using the dot syntax as usual. Let's modify the regularisation parameter of the ridge regression:

ridge_model.lambda = 5.0;

Since the node ẑ corresponds to a machine that wraps ridge_model, that node has effectively changed and will be retrained:

fit!(ŷ, rows=train)
rms(y[test], ŷ(rows=test))
0.038342725973612
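
If you want to check which machines actually get retrained here (only the ridge machine and the nodes downstream of it), one option is to raise the verbosity when fitting; this is a sketch relying on the usual verbosity keyword of fit!:

fit!(ŷ, rows=train, verbosity=2)   # higher verbosity prints a message per machine as the network is fitted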

"Arrow" syntax

Important: for this to work, you need to be using Julia ≥ 1.3.

The syntax to define nodes etc. is a bit verbose. MLJ supports a shorter syntax which abstracts away some of the steps. We will refer to it as the "arrow" syntax as it makes use of the |> operator which can be interpreted as "data flow".

Let's start with W and z (the "first layer"):

W = X |> Standardizer()
z = y |> UnivariateBoxCoxTransformer()
Node{Machine{UnivariateBoxCoxTransformer}} @959
  args:
    1:	Source @547
  formula:
    transform(
        Machine{UnivariateBoxCoxTransformer} @477, 
        Source @547)

Note that we feed X and y directly into the models. In the background, MLJ creates source nodes and assumes that the operation is a transform, given that the models are unsupervised.
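
In other words, the definition of W above is roughly equivalent to the explicit construction from the previous section:

# Roughly equivalent explicit form (see the previous section):
Xs = source(X)
W = transform(machine(Standardizer(), Xs), Xs)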

For a node that corresponds to a supervised model, you can feed a tuple where the first element corresponds to the input (here W) and the second to the target (here z); MLJ will assume the operation is predict:

ẑ = (W, z) |> RidgeRegressor(lambda=0.1);

Finally, we need to apply the inverse of the transform encapsulated in the node z; for this:

ŷ = ẑ |> inverse_transform(z);

That's it! You can now fit the network as before:

fit!(ŷ, rows=train)
rms(y[test], ŷ(rows=test))
0.03360496363407853

To manually modify hyperparameters on a node, you can access them like so:

ẑ[:lambda] = 5.0;

Here, remember that ẑ is a node with a machine that wraps a ridge regression with a hyperparameter lambda, so the syntax above is equivalent to

ẑ.machine.model.lambda = 5.0;

which is relevant if you want to tune the hyperparameter using a TunedModel.

fit!(ŷ, rows=train)
rms(y[test], ŷ(rows=test))
0.038342725973612
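
Finally, regarding the TunedModel mentioned above, here is a rough sketch of how tuning lambda could look for the standalone ridge model (tuning the parameter inside the network itself would first require exporting the network as a composite model, which is not covered here):

# Sketch only: tune lambda of a standalone RidgeRegressor with TunedModel.
r = range(ridge_model, :lambda, lower=0.01, upper=10.0, scale=:log10)
tuned_ridge = TunedModel(model=ridge_model, ranges=r,
                         resampling=CV(nfolds=3), measure=rms)
mach = machine(tuned_ridge, X, y)
fit!(mach, rows=train)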