Benchmarking ReadabiliPy with containers
This directory contains a Dockerfile to build a benchmarking image for ReadabiliPy as per the guidelines specified by the Benchmarking with containers project, at the Alan Turing Institute.
Software: ReadabiliPy - A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla’s Readability.js package or in pure-python mode.
Benchmarks: Benchmark the speed of core package functions at extracting information from an input HTML with pytest. See test_benchmarking.py.
Running benchmarks
Using the pytest-benchmark package, we benchmark some of the package functions, including extraction of titles and dates from article HTML and the full article content in JSON format.
Benchmarks can be run from the top directory of the package with the following command: pytest --benchmark-only
.
Building a Docker image for Benchmarking ReadabiliPy
The Dockerfile specifies an image that installs the requirements for ReadabiliPy, clones the package from GitHub, then runs the benchmarks with pytest.
Docker Hub Automated build
An image was built with this Dockerfile and pushed to Docker Hub as edwardchalstrey/readabilipy_benchmark
. An automated build was set up so that the latest
tag is built whenever the master branch of the ReadabiliPy GitHub repo has a new commit.
Run the containerised benchmarks
The benchmark image can be pulled from the remote registry (Docker Hub), and run on any computing platform with Docker. Benchmarks can be run whenever new features are added.
Results
I have benchmarked three of the html parsing features of ReadabiliPy on an example html file; see the tests in ReadabiliPy repo within tests/test_benchmarking.py
.
Benchmarks run on these dates, are for the following ReadabiliPy commits and measure mean time ms:
Benchmarks on a Macbook:
Date | Date parse | Title parse | Full parse |
---|---|---|---|
2019-05-02 | 69.5056 | 55.5296 | 2140.0745 |
2019-05-14 | 44.4991 | 54.8936 | 1942.1609 |
2019-05-31 | 80.5528 | 94.9283 | 2290.3153 |
Benchmarks on a Macbook in Docker container:
Date | Date parse | Title parse | Full parse |
---|---|---|---|
2019-05-02 | 46.4389 | 40.2649 | 3065.2467 |
2019-05-14 | 32.8276 | 39.7405 | 2642.1735 |
2019-05-31 | 34.8774 | 41.2476 | 2838.9681 |