TuringBench Workflow: Example #1
Algorithm: Support Vector Classification of handwritten digit images with scikit-learn
Benchmarks: Compare the classifier’s training time, prediction time and performance on the local machine versus in a Docker container on the same machine, then compare different versions of the classifier
Getting started with TuringBench
This benchmarking notebook follows the TuringBench workflow outlined at https://alan-turing-institute.github.io/data-science-benchmarking/
- Install Docker on each computing platform for which you wish to carry out benchmarking
- Familiarise yourself with the basics of how Dockerfile instructions work
- Create an account on Docker Hub
- If you wish to use automated builds, ensure your software is maintained with a GitHub repository.
Code for this example is maintained at https://github.com/edwardchalstrey1/scikit-learn-classifier
The SVC code to use as a benchmarking example
For the purpose of this example, consider the model generated by scikit-learn with svm.SVC in the code below to be the algorithm that we wish to benchmark.
The code below trains the SVC model on half the data from scikit-learn’s handwritten digits dataset and predicts on the other half. The training time, prediction time and performance are recorded as the benchmarks that we wish to collect.
Using the %%writefile magic command, I save the contents of the cell to a file, so we can use the exact same code in the Docker version later on.
%%writefile classifier.py
from sklearn import datasets, svm, metrics
import time
import statistics as st

def create_model():
    return svm.SVC(gamma=0.01)

def benchmark_model(model, repeats=10):
    digits = datasets.load_digits()
    n_samples = len(digits.images)
    data = digits.images.reshape((n_samples, -1))  # flatten the 8x8 digit images
    expected = digits.target[n_samples // 2:]
    tt, pt, p = [], [], []  # training times, prediction times, performance scores
    for _ in range(repeats):
        # Train the classifier model on the first half of the data
        start = time.time()
        model.fit(data[:n_samples // 2], digits.target[:n_samples // 2])
        end = time.time()
        tt.append(end - start)
        # Use the model to predict the second half
        start = time.time()
        predicted = model.predict(data[n_samples // 2:])
        end = time.time()
        pt.append(end - start)
        # Note: newer scikit-learn releases may report this under 'accuracy' rather than 'micro avg'
        p.append(metrics.classification_report(expected, predicted, output_dict=True)['micro avg']['f1-score'])
    # Take the median of each benchmark over the chosen number of repeats
    benchmarks = {
        "Training time (s)": st.median(tt),
        "Prediction time (s)": st.median(pt),
        "Performance (micro avg f1 score)": st.median(p)
    }
    return benchmarks
Overwriting classifier.py
How does the classifier perform on my laptop? Let’s see:
import classifier
model = classifier.create_model()
local_results = classifier.benchmark_model(model, repeats=10)
print(local_results)
{'Training time (s)': 0.1223764419555664, 'Prediction time (s)': 0.0564265251159668, 'Performance (micro avg f1 score)': 0.6974416017797553}
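Medians smooth over run-to-run noise; if you also want to see the spread across the repeats, a small helper along these lines could report quartiles as well (a hypothetical extension, not part of the benchmark code; statistics.quantiles needs Python 3.8+):
import statistics as st

def summarise(samples):
    # Median with lower/upper quartiles, to expose run-to-run variation
    q1, median, q3 = st.quantiles(samples, n=4)  # three cut points = quartiles
    return {"q1": q1, "median": median, "q3": q3}

print(summarise([0.12, 0.13, 0.11, 0.12, 0.14, 0.12, 0.13, 0.12, 0.11, 0.12]))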
Benchmark script
Let’s now write a script that prints the benchmarks. This script will be used as the CMD in our Docker container, as we’ll see.
%%writefile benchmarks.py
import classifier
model = classifier.create_model()
print(classifier.benchmark_model(model, repeats=10))
Overwriting benchmarks.py
Build a Docker image for our code and push to Docker Hub
1) Write a Dockerfile that installs/imports your software and runs the benchmark script you have created
2) Build a Docker image and tag it with a version
3) Push the image to Docker Hub. This allows you to then pull it to any machine you wish to benchmark on
The Dockerfile
Where possible, use a base image that already has some of the requirements for your software installed. In this example, we use the Python 3 base image and install the relevant packages with pip. Installing this simple classifier code only requires copying the classifier module we saved earlier into the container. We run the benchmarking script as the CMD, so that the benchmarks are printed when the container is run.
%%writefile Dockerfile
FROM python:3
RUN apt-get update
RUN pip3 install numpy
RUN pip3 install scipy
RUN pip3 install scikit-learn
COPY classifier.py /classifier.py
COPY benchmarks.py /benchmarks.py
CMD python3 benchmarks.py
Overwriting Dockerfile
Build
I’ve tagged this image as 1.0:
%%bash
docker build -t edwardchalstrey/classifier:1.0 .
Step 1/8 : FROM python:3
---> ac069ebfe1e1
Step 2/8 : RUN apt-get update
---> Using cache
---> 5a84d23aa7b5
Step 3/8 : RUN pip3 install numpy
---> Using cache
---> 4383ac463a3b
Step 4/8 : RUN pip3 install scipy
---> Using cache
---> 6fa2c9da864b
Step 5/8 : RUN pip3 install scikit-learn
---> Using cache
---> 0b888dbaed11
Step 6/8 : COPY classifier.py /classifier.py
---> Using cache
---> 189c71e77002
Step 7/8 : COPY benchmarks.py /benchmarks.py
---> a29ee6090d78
Step 8/8 : CMD python3 benchmarks.py
---> Running in 9b30e659cd06
Removing intermediate container 9b30e659cd06
---> 3fa98e268fac
Successfully built 3fa98e268fac
Successfully tagged edwardchalstrey/classifier:1.0
Push
%%bash
docker push edwardchalstrey/classifier:1.0
The push refers to repository [docker.io/edwardchalstrey/classifier]
dccb7ed6962a: Preparing
578571cb3a28: Preparing
f12d5b3c5c82: Preparing
337d3babfd9c: Preparing
493622b04a5f: Preparing
65ef2276d16f: Preparing
4b381ae03f9a: Preparing
08a5b66845ac: Preparing
88a85bcf8170: Preparing
65860ac81ef4: Preparing
a22a5ac18042: Preparing
6257fa9f9597: Preparing
578414b395b9: Preparing
abc3250a6c7f: Preparing
13d5529fd232: Preparing
65ef2276d16f: Waiting
4b381ae03f9a: Waiting
08a5b66845ac: Waiting
88a85bcf8170: Waiting
65860ac81ef4: Waiting
6257fa9f9597: Waiting
578414b395b9: Waiting
abc3250a6c7f: Waiting
13d5529fd232: Waiting
a22a5ac18042: Waiting
493622b04a5f: Layer already exists
337d3babfd9c: Layer already exists
578571cb3a28: Layer already exists
f12d5b3c5c82: Layer already exists
65ef2276d16f: Layer already exists
08a5b66845ac: Layer already exists
4b381ae03f9a: Layer already exists
88a85bcf8170: Layer already exists
65860ac81ef4: Layer already exists
6257fa9f9597: Layer already exists
a22a5ac18042: Layer already exists
578414b395b9: Layer already exists
13d5529fd232: Layer already exists
abc3250a6c7f: Layer already exists
dccb7ed6962a: Pushed
1.0: digest: sha256:775a7fe44fc3cafe4882f6a8b7e8a4e9d77816523bdf95ccfb856a6a4fbe3143 size: 3480
Run the Docker container and collect benchmark stats
The image is now available to be pulled and run as a container on any computing platform with Docker installed. Here I show how we can run the container locally and capture the printed results with IPython/bash:
%%bash --out docker_results
docker run edwardchalstrey/classifier:1.0
import ast
docker_results = ast.literal_eval(docker_results)  # parse the dict printed by the container
print(docker_results)
{'Training time (s)': 0.13670337200164795, 'Prediction time (s)': 0.06618499755859375, 'Performance (micro avg f1 score)': 0.6974416017797553}
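Outside a notebook, the same run-and-parse step can be wrapped in plain Python; here is a minimal sketch using only the standard library (the run_benchmark helper is my own name, not part of the repository):
import ast
import subprocess

def run_benchmark(image):
    # Run the benchmark container and parse the dict it prints to stdout
    completed = subprocess.run(["docker", "run", "--rm", image],
                               capture_output=True, text=True, check=True)
    return ast.literal_eval(completed.stdout.strip())

# e.g. run_benchmark("edwardchalstrey/classifier:1.0")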
How do they compare?
In this example, the benchmark stats I have collected are the performance stats measured by scikit-learn, as well as the time taken to fit the classification model and the time it takes to predict the categories of the test data. Let’s see how these differ between my local version and the container version:
from IPython.display import HTML, display
import tabulate

headers = ["Version"]
c_results = ["Local 1.0"]
d_results = ["Container 1.0"]
for k, v in local_results.items():
    headers.append(k)
    c_results.append(v)
for k, v in docker_results.items():  # assumes the same key order as local_results
    d_results.append(v)
display(HTML(tabulate.tabulate([headers, c_results, d_results], tablefmt='html')))
Version       | Training time (s)   | Prediction time (s) | Performance (micro avg f1 score)
Local 1.0     | 0.1223764419555664  | 0.0564265251159668  | 0.6974416017797553
Container 1.0 | 0.13670337200164795 | 0.06618499755859375 | 0.6974416017797553
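To put the container overhead in relative terms, here is a quick sketch using the result dicts already in scope (not part of the original notebook):
for key in ["Training time (s)", "Prediction time (s)"]:
    ratio = docker_results[key] / local_results[key]
    print(f"{key}: container takes {ratio:.2f}x the local time")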
Benchmarking a new version of the classifier
Consider that you’ve reached a stage of your project where you now wish to benchmark a new version of your software. For the purpose of this example, the modification to our scikit-learn classifier is reducing the size of the gamma parameter:
%%writefile classifier.py
from sklearn import datasets, svm, metrics
import time
import statistics as st

def create_model():
    return svm.SVC(gamma=0.001)  # UPDATE: reduced from 0.01

def benchmark_model(model, repeats=10):
    digits = datasets.load_digits()
    n_samples = len(digits.images)
    data = digits.images.reshape((n_samples, -1))  # flatten the 8x8 digit images
    expected = digits.target[n_samples // 2:]
    tt, pt, p = [], [], []  # training times, prediction times, performance scores
    for _ in range(repeats):
        # Train the classifier model on the first half of the data
        start = time.time()
        model.fit(data[:n_samples // 2], digits.target[:n_samples // 2])
        end = time.time()
        tt.append(end - start)
        # Use the model to predict the second half
        start = time.time()
        predicted = model.predict(data[n_samples // 2:])
        end = time.time()
        pt.append(end - start)
        # Note: newer scikit-learn releases may report this under 'accuracy' rather than 'micro avg'
        p.append(metrics.classification_report(expected, predicted, output_dict=True)['micro avg']['f1-score'])
    # Take the median of each benchmark over the chosen number of repeats
    benchmarks = {
        "Training time (s)": st.median(tt),
        "Prediction time (s)": st.median(pt),
        "Performance (micro avg f1 score)": st.median(p)
    }
    return benchmarks
Overwriting classifier.py
New build
We can then build a new image from the same Dockerfile and benchmark script to test the new classifier model, which we’ll tag as version 1.1:
%%bash
docker build -t edwardchalstrey/classifier:1.1 .
Step 1/8 : FROM python:3
---> ac069ebfe1e1
Step 2/8 : RUN apt-get update
---> Using cache
---> 5a84d23aa7b5
Step 3/8 : RUN pip3 install numpy
---> Using cache
---> 4383ac463a3b
Step 4/8 : RUN pip3 install scipy
---> Using cache
---> 6fa2c9da864b
Step 5/8 : RUN pip3 install scikit-learn
---> Using cache
---> 0b888dbaed11
Step 6/8 : COPY classifier.py /classifier.py
---> 7fe07801e7e1
Step 7/8 : COPY benchmarks.py /benchmarks.py
---> 76ef128e6e31
Step 8/8 : CMD python3 benchmarks.py
---> Running in 0e629db6c3bf
Removing intermediate container 0e629db6c3bf
---> e0900d204d16
Successfully built e0900d204d16
Successfully tagged edwardchalstrey/classifier:1.1
Push
%%bash
docker push edwardchalstrey/classifier:1.1
The push refers to repository [docker.io/edwardchalstrey/classifier]
18661e28fe84: Preparing
17e84f583592: Preparing
f12d5b3c5c82: Preparing
337d3babfd9c: Preparing
493622b04a5f: Preparing
65ef2276d16f: Preparing
4b381ae03f9a: Preparing
08a5b66845ac: Preparing
88a85bcf8170: Preparing
4b381ae03f9a: Waiting
65860ac81ef4: Preparing
a22a5ac18042: Preparing
6257fa9f9597: Preparing
578414b395b9: Preparing
abc3250a6c7f: Preparing
13d5529fd232: Preparing
08a5b66845ac: Waiting
88a85bcf8170: Waiting
65860ac81ef4: Waiting
a22a5ac18042: Waiting
6257fa9f9597: Waiting
578414b395b9: Waiting
abc3250a6c7f: Waiting
13d5529fd232: Waiting
65ef2276d16f: Waiting
493622b04a5f: Layer already exists
f12d5b3c5c82: Layer already exists
337d3babfd9c: Layer already exists
18661e28fe84: Layer already exists
17e84f583592: Layer already exists
65ef2276d16f: Layer already exists
08a5b66845ac: Layer already exists
4b381ae03f9a: Layer already exists
65860ac81ef4: Layer already exists
88a85bcf8170: Layer already exists
a22a5ac18042: Layer already exists
6257fa9f9597: Layer already exists
578414b395b9: Layer already exists
13d5529fd232: Layer already exists
abc3250a6c7f: Layer already exists
1.1: digest: sha256:e060d94b67025d116b595492bc2fbfc7cd3164f7b4ccdc97c22993e64d58de9d size: 3480
Finally, let’s add the benchmarks for the new classifier model to our table:
%%bash --out docker_results_1
docker run edwardchalstrey/classifier:1.1
docker_results_1 = ast.literal_eval(docker_results_1)
d1_results = ["Container 1.1"]
for k, v in docker_results_1.items():
    d1_results.append(v)
display(HTML(tabulate.tabulate([headers, c_results, d_results, d1_results], tablefmt='html')))
Version       | Training time (s)   | Prediction time (s) | Performance (micro avg f1 score)
Local 1.0     | 0.1223764419555664  | 0.0564265251159668  | 0.6974416017797553
Container 1.0 | 0.13670337200164795 | 0.06618499755859375 | 0.6974416017797553
Container 1.1 | 0.04636788368225098 | 0.03962278366088867 | 0.9688542825361512
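As a final sanity check on the table, a short sketch (again using the dicts already in scope) that prints version 1.1’s change relative to container 1.0:
for key in headers[1:]:
    ratio = docker_results_1[key] / docker_results[key]
    print(f"{key}: 1.1 / 1.0 = {ratio:.2f}")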