Boston with Flux

Main author: Ayush Shridhar (ayush-1506).

Getting started

import MLJFlux
import MLJ
import DataFrames
import Statistics
import Flux
using Random
using PyPlot

Random.seed!(11)
MersenneTwister(UInt32[0x0000000b]) @ 1002

We start by loading the Boston dataset. Our aim is to implement a neural network regressor that predicts the price of a house from a number of features.

features, targets = MLJ.@load_boston
features = DataFrames.DataFrame(features)
@show size(features)
@show targets[1:3]
first(features, 3) |> MLJ.pretty
size(features) = (506, 12)
targets[1:3] = [24.0, 21.6, 34.7]
┌────────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────┐
│ Crim       │ Zn         │ Indus      │ NOx        │ Rm         │ Age        │ Dis        │ Rad        │ Tax        │ PTRatio    │ Black      │ LStat      │
│ Float64    │ Float64    │ Float64    │ Float64    │ Float64    │ Float64    │ Float64    │ Float64    │ Float64    │ Float64    │ Float64    │ Float64    │
│ Continuous │ Continuous │ Continuous │ Continuous │ Continuous │ Continuous │ Continuous │ Continuous │ Continuous │ Continuous │ Continuous │ Continuous │
├────────────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤
│ 0.00632    │ 18.0       │ 2.31       │ 0.538      │ 6.575      │ 65.2       │ 4.09       │ 1.0        │ 296.0      │ 15.3       │ 396.9      │ 4.98       │
│ 0.02731    │ 0.0        │ 7.07       │ 0.469      │ 6.421      │ 78.9       │ 4.9671     │ 2.0        │ 242.0      │ 17.8       │ 396.9      │ 9.14       │
│ 0.02729    │ 0.0        │ 7.07       │ 0.469      │ 7.185      │ 61.1       │ 4.9671     │ 2.0        │ 242.0      │ 17.8       │ 392.83     │ 4.03       │
└────────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────┘

The next step is to partition the data into training and test sets:

train, test = MLJ.partition(MLJ.eachindex(targets), 0.70, rng=52)
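
To make the split concrete, here is a minimal sketch of a shuffled 70/30 index split using only the Random standard library. The helper name split_indices is hypothetical, and the rounding and shuffling details are assumptions; MLJ.partition may differ in both, so the indices will not match those returned above.

```julia
using Random

# Hypothetical helper (not part of MLJ): shuffle 1:n and split at `fraction`.
function split_indices(n::Integer, fraction::Real; rng=Random.default_rng())
    idx = shuffle(rng, collect(1:n))      # random permutation of the row indices
    ntrain = floor(Int, fraction * n)     # assumed rounding rule
    return idx[1:ntrain], idx[ntrain+1:end]
end

train_idx, test_idx = split_indices(506, 0.70; rng=MersenneTwister(52))
@show length(train_idx) length(test_idx)
```

For the 506 rows of the Boston data, this sketch yields 354 training indices and 152 test indices.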


Let us implement a neural network regressor using Flux.jl. MLJFlux.jl provides an MLJ interface to the Flux.jl deep learning framework. The package provides four essential models: NeuralNetworkRegressor, MultitargetNeuralNetworkRegressor, NeuralNetworkClassifier and ImageClassifier.

At the heart of these models is a neural network, which is specified via the builder hyperparameter. Creating a builder involves two steps.

Step 1: Define a new struct that subtypes MLJFlux.Builder, an abstract type used for dispatch. Here we call it MyNetworkBuilder. It can hold any fields needed to build the network later, in Step 2. Let's use a dense neural network with two hidden layers.

mutable struct MyNetworkBuilder <: MLJFlux.Builder
    n1::Int #Number of cells in the first hidden layer
    n2::Int #Number of cells in the second hidden layer
end

Step 2: Build the neural network from this object by extending the MLJFlux.build function. It takes three arguments: the MyNetworkBuilder object, the input dimension (input_dims) and the output dimension (output_dims).

function MLJFlux.build(model::MyNetworkBuilder, input_dims, output_dims)
    layer1 = Flux.Dense(input_dims, model.n1)
    layer2 = Flux.Dense(model.n1, model.n2)
    layer3 = Flux.Dense(model.n2, output_dims)
    return Flux.Chain(layer1, layer2, layer3)
end
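
Note that Flux.Dense uses the identity activation unless one is given, so the chain above composes three affine maps. As a hedged variant, the sketch below passes Flux.relu to the two hidden layers. MyReluBuilder is a hypothetical name introduced here, and the sketch assumes the same three-argument build signature and Dense(in, out, activation) constructor as the MLJFlux and Flux versions used in this tutorial; newer releases may expect a different signature.

```julia
import MLJFlux
import Flux

# Hypothetical builder (not from the tutorial): same layout, with relu activations.
mutable struct MyReluBuilder <: MLJFlux.Builder
    n1::Int  # number of units in the first hidden layer
    n2::Int  # number of units in the second hidden layer
end

# Assumes the three-argument build signature used in this tutorial.
function MLJFlux.build(model::MyReluBuilder, input_dims, output_dims)
    return Flux.Chain(
        Flux.Dense(input_dims, model.n1, Flux.relu),
        Flux.Dense(model.n1, model.n2, Flux.relu),
        Flux.Dense(model.n2, output_dims))   # linear output layer for regression
end
```

Such a builder could be passed to NeuralNetworkRegressor in the same way as MyNetworkBuilder.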

With both definitions in place, let us create a builder instance:

myregressor = MyNetworkBuilder(20, 10)
MyNetworkBuilder @788

Since predicting Boston house prices is a regression problem, we'll be using NeuralNetworkRegressor here. A NeuralNetworkRegressor object behaves like any other MLJ model: you can wrap it in an MLJ machine and use it as you would any other model.

Let's start by defining our NeuralNetworkRegressor object, which takes myregressor as its builder.

nnregressor = MLJFlux.NeuralNetworkRegressor(builder=myregressor, epochs=10)
NeuralNetworkRegressor(
    builder = MyNetworkBuilder(
            n1 = 20,
            n2 = 10),
    optimiser = Flux.Optimise.ADAM(0.001, (0.9, 0.999), IdDict{Any,Any}()),
    loss = Flux.mse,
    epochs = 10,
    batch_size = 1,
    lambda = 0.0,
    alpha = 0.0,
    optimiser_changes_trigger_retraining = false) @546

Other parameters that NeuralNetworkRegressor takes can be found here: https://github.com/alan-turing-institute/MLJFlux.jl#model-hyperparameters

nnregressor now acts like any other MLJ model. Let's wrap it in an MLJ machine and call fit! and predict.

mach = MLJ.machine(nnregressor, features, targets)
Machine{NeuralNetworkRegressor{MyNetworkBuilder,…}} @593 trained 0 times.
  args: 
    1:	Source @496 ⏎ `Table{AbstractArray{Continuous,1}}`
    2:	Source @992 ⏎ `AbstractArray{Continuous,1}`

Let's fit this on the training set:

MLJ.fit!(mach, rows=train, verbosity=3)
Machine{NeuralNetworkRegressor{MyNetworkBuilder,…}} @593 trained 1 time.
  args: 
    1:	Source @496 ⏎ `Table{AbstractArray{Continuous,1}}`
    2:	Source @992 ⏎ `AbstractArray{Continuous,1}`

As the training log shows, the training loss decreases at each epoch, indicating that the neural network is gradually learning from the training set.

preds = MLJ.predict(mach, features[test, :])

print(preds[1:5])
Float32[29.322287, 26.417507, 24.125174, -2.5468833, 20.77854]
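
To judge these predictions, they can be compared with the held-out targets. The following is a minimal sketch of a root-mean-squared error using the Statistics standard library; the helper rmse is hypothetical (not an MLJ function), and the toy vectors are made up for illustration.

```julia
import Statistics

# Hypothetical helper: root-mean-squared error between predictions and truth.
rmse(yhat, y) = sqrt(Statistics.mean(abs2, yhat .- y))

# Toy illustration with made-up numbers (not tutorial results).
yhat_toy = [1.0, 2.0, 3.0]
y_toy    = [1.0, 4.0, 3.0]
println(rmse(yhat_toy, y_toy))
```

In the session above, the same helper would be called as rmse(preds, targets[test]).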

Now let's retrain our model. Keep in mind that retraining may or may not re-initialize the network parameters. For example, increasing the number of epochs to 15 does not retrain the model from scratch for 15 epochs; it trains for just 5 additional epochs.

nnregressor.epochs = 15

MLJ.fit!(mach, rows=train, verbosity=3)
Machine{NeuralNetworkRegressor{MyNetworkBuilder,…}} @593 trained 2 times.
  args: 
    1:	Source @496 ⏎ `Table{AbstractArray{Continuous,1}}`
    2:	Source @992 ⏎ `AbstractArray{Continuous,1}`

To retrain the model from scratch regardless, pass force=true to fit! (see the documentation for fit! for details).

However, changing parameters such as batch_size will necessarily cause re-training from scratch.

nnregressor.batch_size = 2
MLJ.fit!(mach, rows=train, verbosity=3)
Machine{NeuralNetworkRegressor{MyNetworkBuilder,…}} @593 trained 3 times.
  args: 
    1:	Source @496 ⏎ `Table{AbstractArray{Continuous,1}}`
    2:	Source @992 ⏎ `AbstractArray{Continuous,1}`

Also note that changing the optimiser does not trigger retraining from scratch by default. However, the optimiser_changes_trigger_retraining hyperparameter of NeuralNetworkRegressor can be toggled to change this behaviour. The default allows one to modify the learning rate, for example, after an initial burn-in period.

Inspecting out-of-sample loss as a function of epochs

r = MLJ.range(nnregressor, :epochs, lower=1, upper=30, scale=:log10)
curve = MLJ.learning_curve(nnregressor, features, targets,
                       range=r,
                       resampling=MLJ.Holdout(fraction_train=0.7),
                       measure=MLJ.l2)
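
Because the range uses scale=:log10, the epoch values tried are spread evenly on a logarithmic axis rather than linearly. A rough sketch of how such an integer grid could be generated follows; log_grid is a hypothetical helper, and MLJ's own grid construction may differ in rounding and in the number of points.

```julia
# Hypothetical helper: n points evenly spaced in log10 between lower and upper,
# rounded to integers, with duplicates removed.
function log_grid(lower::Integer, upper::Integer, n::Integer)
    exponents = range(log10(lower), log10(upper), length=n)
    return unique(round.(Int, 10 .^ exponents))
end

grid = log_grid(1, 30, 10)
println(grid)
```

Small epoch counts are sampled densely and large ones sparsely, which suits a loss curve that changes fastest early in training.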

figure(figsize=(8,6))

plt.plot(curve.parameter_values,
    curve.measurements)

yscale("log")
xlabel(curve.parameter_name)
ylabel("l2")
[Figure BostonFlux1: out-of-sample l2 loss (log scale) against the number of epochs]

Tuning

As mentioned above, nnregressor can act like any other MLJ model. Let's try to tune the batch_size parameter.

bs = MLJ.range(nnregressor, :batch_size, lower=1, upper=5)

tm = MLJ.TunedModel(model=nnregressor, range=[bs], measure=MLJ.l2)
DeterministicTunedModel(
    model = NeuralNetworkRegressor(
            builder = MyNetworkBuilder @788,
            optimiser = Flux.Optimise.ADAM(0.001, (0.9, 0.999), IdDict{Any,Any}()),
            loss = Flux.mse,
            epochs = 15,
            batch_size = 2,
            lambda = 0.0,
            alpha = 0.0,
            optimiser_changes_trigger_retraining = false),
    tuning = Grid(
            goal = nothing,
            resolution = 10,
            shuffle = true,
            rng = Random._GLOBAL_RNG()),
    resampling = Holdout(
            fraction_train = 0.7,
            shuffle = false,
            rng = Random._GLOBAL_RNG()),
    measure = l2(),
    weights = nothing,
    operation = MLJModelInterface.predict,
    range = MLJBase.NumericRange{Int64,MLJBase.Bounded,Symbol}[NumericRange{Int64,…} @426],
    train_best = true,
    repeats = 1,
    n = nothing,
    acceleration = CPU1{Nothing}(nothing),
    acceleration_resampling = CPU1{Nothing}(nothing),
    check_measure = true) @281

For more on tuning, refer to the model-tuning tutorial.

m = MLJ.machine(tm, features, targets)

MLJ.fit!(m)
Machine{DeterministicTunedModel{Grid,…}} @972 trained 1 time.
  args: 
    1:	Source @386 ⏎ `Table{AbstractArray{Continuous,1}}`
    2:	Source @367 ⏎ `AbstractArray{Continuous,1}`

This evaluated the model at each value of our range. The best value is:

MLJ.fitted_params(m).best_model.batch_size
2