6 How to use the output file
Once you download or generate an SPC output file for your study area, how do you use it? Each study area consists of one .pb (protocol buffer) file. This file efficiently encodes data following the SPC schema. Read more about what data is contained in the output.
You can read the “protobuf” (shorthand for a protocol buffer file) in any supported language, and then extract and transform just the parts of the data you want for your model.
We have examples for Python below, but feel free to request other languages.
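To make "efficiently encodes" concrete: a .pb file is one protobuf message serialized in the protobuf wire format. Every field starts with a varint key that packs the field number and wire type as (field_number << 3) | wire_type, followed by the value. The standard-library sketch below walks the top-level fields of such a message. It is illustrative only. It knows nothing about the SPC schema's field names, so for real work use classes generated from the schema or the SPC toolkit described below. The helper names read_varint and iter_fields are hypothetical.

```python
from typing import Iterator, Tuple, Union


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, Union[int, bytes]]]:
    """Yield (field_number, wire_type, raw_value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint: ints, bools, enums
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit: double, fixed64
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: strings, bytes, nested messages
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit: float, fixed32
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


# Hand-made message: field 1 = varint 150, field 2 = string "testing"
msg = bytes.fromhex("089601120774657374696e67")
print(list(iter_fields(msg)))  # → [(1, 0, 150), (2, 2, b'testing')]
```

Nested messages (such as the individual records inside a population) arrive as length-delimited bytes, which you could pass back into iter_fields to descend one level.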
6.1 JavaScript
We have a web app, built with Svelte, for interactively exploring SPC data. Its source code is a great reference for how to use the protobuf output.
6.2 Python
To work with SPC outputs in Python, we recommend using the SPC toolkit.
6.2.1 Install
The package can be installed from git with pip:
pip install 'git+https://github.com/alan-turing-institute/uatk-spc.git#subdirectory=python'
or with Poetry:
poetry add 'git+https://github.com/alan-turing-institute/uatk-spc.git#subdirectory=python'
To include the extra dependencies dev and examples, for running tests and example notebooks:
pip install 'uatk-spc[dev,examples] @ git+https://github.com/alan-turing-institute/uatk-spc.git#subdirectory=python'
or with Poetry:
poetry add 'git+https://github.com/alan-turing-institute/uatk-spc.git#subdirectory=python' --extras dev --extras examples
6.2.2 Reading and Building
The toolkit provides two classes: a Reader, which reads all population fields (people, households, venues_per_activity, time_use_diaries and info_per_msoa), and a Builder, which extends the Reader with methods for combining the fields on top of a people dataframe to build a single final dataframe.
An example of using the Reader is shown below:
# Import package
from uatk_spc import Reader
# Pick a region with SPC output saved
(region, path) = "rutland", "data/output/England/2020/"
# Read from parquet and JSON
population = Reader(path, region, backend="polars")
# Or directly from a filepath to a gzip archive
population = Reader(
    filepath="https://ramp0storage.blob.core.windows.net/test-spc-output/test_region.tar.gz",
    backend="polars",
)
# Write people to csv
population.people.to_pandas().to_csv("people.csv", index=False)
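The Reader can also read a gzip archive such as test_region.tar.gz. If you have downloaded an archive locally and want to see which files it contains before reading it, the standard library's tarfile module can list them. A small sketch; list_archive is a hypothetical helper, not part of uatk_spc.

```python
import tarfile
from typing import List


def list_archive(path: str) -> List[str]:
    """Return the sorted names of regular files inside a local .tar.gz archive.

    Hypothetical helper for inspection only; it does not use uatk_spc.
    """
    with tarfile.open(path, mode="r:gz") as archive:
        return sorted(member.name for member in archive.getmembers() if member.isfile())
```

For example, print("\n".join(list_archive("test_region.tar.gz"))) prints one file name per line, assuming the archive was saved under that name in the working directory.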
And an example of using the Builder:
# Import Builder
from uatk_spc import Builder
(region, path) = "rutland", "../data/output/England/2020/"
# Build population from people and households and unnest "health" and "details"
population = (
    Builder(path, region, backend="polars")
    .add_households()
    .unnest(["health", "details"])
    .build()
)
# Write combined people and household dataframe to csv
population.to_pandas().to_csv("people_and_households.csv", index=False)
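Conceptually, add_households attaches household columns to each person, and unnest lifts the fields of nested columns such as "health" and "details" up into top-level columns. The standard-library sketch below mimics those two steps on plain dicts so the idea is visible without polars. The key names (id, household), the household_ prefix and the toy records are all hypothetical and may not match what Builder actually produces.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def add_households(people: List[Record], households: List[Record]) -> List[Record]:
    """Left-join household fields onto each person via a 'household' id (hypothetical keys)."""
    by_id = {h["id"]: h for h in households}
    joined = []
    for person in people:
        row = dict(person)
        for key, value in by_id.get(person["household"], {}).items():
            if key != "id":
                row[f"household_{key}"] = value  # hypothetical prefix avoids name clashes
        joined.append(row)
    return joined


def unnest(rows: List[Record], columns: List[str]) -> List[Record]:
    """Replace each dict-valued column with its fields as top-level columns."""
    flat = []
    for row in rows:
        out = {k: v for k, v in row.items() if k not in columns}
        for col in columns:
            for key, value in row.get(col, {}).items():
                if key in out:
                    raise ValueError(f"column {key!r} already exists")
                out[key] = value
        flat.append(out)
    return flat


# Toy records, not real SPC data
people = [
    {"id": 0, "household": 10, "health": {"bmi": 24.1}, "details": {"age": 37}},
    {"id": 1, "household": 10, "health": {"bmi": 19.8}, "details": {"age": 8}},
]
households = [{"id": 10, "msoa": "M1"}]

for row in unnest(add_households(people, households), ["health", "details"]):
    print(row)
```

Polars' own DataFrame.unnest likewise turns struct fields into columns named after the fields, which is why the sketch refuses to overwrite an existing column.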
For further examples of using the Reader and Builder to analyse SPC outputs, see the examples folder.
6.2.3 Converting .pb file to JSON format
To explore the data interactively, it is much easier to view it as JSON, which shows the same structure as the protobuf but in a human-readable text format. The commands below download an output file and convert it with a small Python script:
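# Run from the root of a clone of the uatk-spc repository
# Download a file into data/output/England/2020/
wget -P data/output/England/2020/ https://ramp0storage.blob.core.windows.net/spc-output/v2.1/England/2020/rutland.pb.gz
# Uncompress
gunzip data/output/England/2020/rutland.pb.gz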
# Convert the .pb to JSON
# - without poetry
python python/uatk_spc/scripts.py --input-path data/output/England/2020/rutland.pb > rutland.json
# - with poetry
cd python && poetry run spc_to_json --input-path ../data/output/England/2020/rutland.pb > rutland.json
# View the output
less rutland.json
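Once rutland.json exists, you can also inspect it programmatically. The sketch below loads the file with the standard json module and reports, for each top-level key, how many entries it holds. It deliberately makes no assumption about the key names, since these depend on how the script serializes the schema; summarise_json is a hypothetical helper.

```python
import json
from typing import Any, Dict


def summarise_json(path: str) -> Dict[str, Any]:
    """Map each top-level key to its length (lists/dicts) or its value's type name."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)  # a serialized protobuf message is a JSON object
    summary: Dict[str, Any] = {}
    for key, value in data.items():
        if isinstance(value, (list, dict)):
            summary[key] = len(value)
        else:
            summary[key] = type(value).__name__
    return summary
```

For example, print(summarise_json("rutland.json")) gives a quick overview of how many records each top-level field contains before you open the file in less. Note that json.load reads the whole file into memory, which is fine for a small area like Rutland but may be slow for large regions.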
6.2.4 Converting to numpy arrays
The ASPICS project simulates the spread of COVID through a population. Its code uses numpy, and this script converts the protobuf into a set of numpy arrays.
Note that ASPICS does not use the generated Python protobuf classes beyond this conversion step; data frames and numpy arrays may be more familiar and more appropriate for the rest of a pipeline. The protobuf is a format optimized for reading and writing, so you don't need to use it throughout your model code.
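The core of such a conversion is pivoting per-person records into one flat, typed column per field, which is the layout numpy arrays and data frames expect. The sketch below does this with the standard library's array module so that it runs without numpy; in a numpy pipeline you would pass each column to numpy.asarray instead. The field names, typecodes and records_to_columns helper are hypothetical and not taken from the ASPICS script.

```python
from array import array
from typing import Dict, Iterable, Mapping


def records_to_columns(
    records: Iterable[Mapping[str, float]], typecodes: Dict[str, str]
) -> Dict[str, array]:
    """Pivot row records into one typed array per field.

    Element i of every column comes from record i, so columns stay aligned.
    """
    columns = {name: array(code) for name, code in typecodes.items()}
    for record in records:
        for name, column in columns.items():
            column.append(record[name])
    return columns


# Toy records, not real SPC data: 'i' = signed int, 'd' = double
people = [{"age": 37, "bmi": 24.1}, {"age": 8, "bmi": 19.8}]
cols = records_to_columns(people, {"age": "i", "bmi": "d"})
print(cols["age"].tolist())  # → [37, 8]
```

Keeping one homogeneous array per field, rather than a list of objects, is what makes vectorized operations over the whole population cheap.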
6.2.5 Visualizing venues
Use this script to read a protobuf file, then draw a dot for every venue, color-coded by activity.
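The key data step behind such a plot is grouping venue coordinates by activity and giving each activity its own color. A standard-library sketch of that step follows. The (activity, x, y) input tuples, the palette and the group_venues helper are hypothetical, and the plotting call itself is left out so the example runs without extra dependencies.

```python
from collections import defaultdict
from itertools import cycle
from typing import Dict, Iterable, List, Tuple

# Hypothetical palette; any list of color strings works
PALETTE = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple"]


def group_venues(
    venues: Iterable[Tuple[str, float, float]]
) -> Dict[str, Tuple[str, List[float], List[float]]]:
    """Group (activity, x, y) venues into per-activity coordinate lists with one color each."""
    xs: Dict[str, List[float]] = defaultdict(list)
    ys: Dict[str, List[float]] = defaultdict(list)
    for activity, x, y in venues:
        xs[activity].append(x)
        ys[activity].append(y)
    colors = cycle(PALETTE)  # wraps around if there are more activities than colors
    return {activity: (next(colors), xs[activity], ys[activity]) for activity in sorted(xs)}
```

With matplotlib installed, each group can then be drawn with one call, for example plt.scatter(x, y, s=1, c=color, label=activity) inside a loop over group_venues(...).items(). Sorting the activities keeps the color assignment stable between runs.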