6 How to use the output file
Once you download or generate an SPC output file for your study area, how do you use it? Each study area consists of one .pb (protocol buffer) file. This file efficiently encodes data following this schema. Read more about what data is contained in the output.
You can read the “protobuf” (shorthand for a protocol buffer file) in any supported language, and then extract and transform just the parts of the data you want for your model.
We have examples for Python below, but feel free to request other languages.
6.1 JavaScript
We have a web app using Svelte to interactively explore SPC data. Its source code is a great reference for how to use the proto output.
6.2 Python
To work with SPC protobufs in Python, you need two dependencies set up:
- The protobuf library
  - You can install system-wide with
    pip install protobuf
  - Or add it as a dependency to a conda, poetry, or similar environment
- The generated Python library, synthpop_pb2.py
  - You can download a copy of this file into your codebase, then
    import synthpop_pb2
  - You can also generate the file yourself, following the docs:
    protoc --python_out=python/ synthpop.proto
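With both dependencies in place, reading a file could look roughly like the sketch below. The helper names `read_pb_bytes` and `load_message` are made up for illustration. `ParseFromString` is the standard method on generated protobuf message classes. The top-level message name `Population` and its `people` field in the usage comment are assumptions, so check synthpop.proto for the real names.

```python
import gzip
from pathlib import Path


def read_pb_bytes(path):
    """Return the raw bytes of a .pb file, transparently handling .pb.gz."""
    path = Path(path)
    if path.suffix == ".gz":
        with gzip.open(path, "rb") as f:
            return f.read()
    return path.read_bytes()


def load_message(path, message_cls):
    """Parse a binary protobuf file into a new instance of message_cls.

    message_cls is any generated protobuf message class; they all
    provide ParseFromString().
    """
    msg = message_cls()
    msg.ParseFromString(read_pb_bytes(path))
    return msg


# Possible use (ASSUMPTION: the top-level message is called `Population`
# and has a repeated `people` field; check synthpop.proto):
#
#   import synthpop_pb2
#   pop = load_message("rutland.pb", synthpop_pb2.Population)
#   print(len(pop.people))
```

Passing the message class as an argument keeps the helper independent of the generated module, so the same function works if the schema is regenerated or renamed.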
6.2.1 Converting to Pandas data-frames and CSV
The schema expresses relationships between people, households, and venues that can’t all be captured by a simple 2D table. Nevertheless, you can extract per-person information and express it as a data frame or CSV file. See this example Python script for inspiration. You can try it out:
# Download a file
wget https://ramp0storage.blob.core.windows.net/spc-output/v1/rutland.pb.gz
# Uncompress
gunzip rutland.pb.gz
# Convert the .pb to CSV (adjust --input_path if the file is not under data/output/)
python3 python/protobuf_to_csv.py --input_path data/output/rutland.pb
# View the output
less people.csv
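To illustrate the idea of squeezing per-person data into a flat table, here is a minimal sketch using only the standard library. It is not the logic of the example script. It assumes each person has already been turned into a nested dict (for instance with `google.protobuf.json_format.MessageToDict`), and the helper names `flatten` and `write_people_csv` are hypothetical.

```python
import csv


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots.

    Lists are skipped: they are the one-to-many relationships that do
    not fit into a single row per person.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            continue
        else:
            flat[name] = value
    return flat


def write_people_csv(people, out_file):
    """Write one CSV row per person; missing columns are left empty."""
    rows = [flatten(p) for p in people]
    # Preserve first-seen column order across all rows.
    columns = list(dict.fromkeys(col for row in rows for col in row))
    writer = csv.DictWriter(out_file, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return columns
```

Open the output with `open("people.csv", "w", newline="")` before passing it in. Dropping list-valued fields is a deliberate simplification: anything one-to-many, such as repeated visits or household members, would need its own table.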
6.2.2 Converting .pb file to JSON format
For interactive exploration, JSON is much easier to work with. It shows the same structure as the protobuf, but in a human-readable text format. The example below uses a small Python script:
# Download a file
wget https://ramp0storage.blob.core.windows.net/spc-output/v1/rutland.pb.gz
# Uncompress
gunzip rutland.pb.gz
# Convert the .pb to JSON (adjust the path if the file is not under data/output/)
python3 python/protobuf_to_json.py data/output/rutland.pb > rutland.json
# View the output
less rutland.json
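Paging through a large JSON file can be slow going, so it may help to print an outline of its structure first. The sketch below uses only the standard `json` module and makes no assumptions about the schema. The function name `outline` is made up for illustration.

```python
import json


def outline(node, max_depth=2, _depth=0):
    """Return lines describing the keys, types, and sizes of a JSON object."""
    lines = []
    if not isinstance(node, dict) or _depth >= max_depth:
        return lines
    pad = "  " * _depth
    for key, value in node.items():
        if isinstance(value, (dict, list)):
            kind = type(value).__name__
            lines.append(f"{pad}{key}: {kind} with {len(value)} entries")
        else:
            lines.append(f"{pad}{key}: {type(value).__name__}")
        # Recurse into nested objects (lists are reported but not expanded).
        lines.extend(outline(value, max_depth, _depth + 1))
    return lines


# Possible use on the file produced by the commands in this section:
#
#   with open("rutland.json") as f:
#       print("\n".join(outline(json.load(f))))
```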
6.2.3 Converting to numpy arrays
The ASPICS project simulates the spread of COVID through a population. The code uses numpy, and this script converts the protobuf to several numpy arrays.
Note that the ASPICS code doesn’t keep using the generated Python protobuf classes for the rest of the pipeline. Data frames and numpy arrays may be more familiar and appropriate for model code. The protobuf is a format optimized for reading and writing; you don’t need to use it throughout all of your model code.
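The "convert once, then work with arrays" pattern can be sketched as follows. This is not the ASPICS conversion script. The helper name `column` is hypothetical, and the field path in the usage comment is an assumption about the schema rather than a confirmed name.

```python
import numpy as np


def column(records, getter, dtype):
    """Build a 1-D array with one entry per record, using getter(record)."""
    return np.fromiter(
        (getter(r) for r in records), dtype=dtype, count=len(records)
    )


# Possible use after parsing a population (ASSUMPTION: people expose a
# numeric age somewhere in their message; the path below is illustrative):
#
#   ages = column(pop.people, lambda p: p.demographics.age_years, np.uint8)
#   print(ages.mean())
```

After a one-off extraction like this, the rest of a model can index and vectorize over plain arrays without importing the protobuf classes at all.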
6.2.4 Visualizing venues
This script reads a protobuf file, then draws a dot for every venue, color-coded by activity.