12 Data schema
12.1 Understanding the schema
Here are some helpful tips for understanding the schema.
Each .pb file contains exactly one Population
message. In contrast to datasets consisting of multiple .csv
files, just a single file contains everything. Some of the fields in Population
are lists (of people and households) or maps (of venues keyed by activity, or of MSOAs). Unlike a flat .csv
table, there may be more lists embedded later. Each Household
has a list of members
, for example.
The different objects refer to each other, forming a graph structure. The protobuf uses uint64
IDs to index into other lists. For example, if some household has members = [3, 10]
, then those two people can be found at population.people[3]
and population.people[10]
. Each of them will have the same household
ID, pointing back to something in the population.households
list.
12.2 Flows: modelling daily activities
SPC models daily travel behaviour of people as “flows.” Flows are broken down by an activity – shopping/retail, attending primary or secondary school, working, or staying at home. For each activity type, a person has a list of venues where they may do that activity, weighted by a probability of going to that particular venue.
Note that flows_per_activity
is stored in InfoPerMSOA
, not Person
. The flows for retail and school are only known at the MSOA level, not individually. So given a particular Person
object, you first look up their household’s MSOA – msoa = population.households[ person.household ].msoa
and then look up flows for that MSOA – population.info_per_msoa[msoa].flows_per_activity
.
Each person has exactly 1 flow for home – it’s just person.household
with probability 1. A person has 0 or 1 flows to work, based on the value of person.workplace
.
This doesn’t mean that all people in the same MSOA share the same travel behaviour. Each person has their own activity_durations
field, based on time-use survey data. Even if two people share the same set of places where they may go shopping, one person may spend much more time on that activity than another.
See the ASPICS conversion script for all of this in action – it has a function to collapse a person’s flows down into a single weighted list.
Note that per MSOA, very few venues are represented as destinations – 10 for retail and 5 for school. Only the most likely venues from QUANT are used.
12.3 Flow weights
How do you interpret the probabilities/weights for flows? If your model needs people to visit specific places each day, you could randomly sample a venue from the flows, weighting them appropriately. For retail, you may want to repeat this sampling every day of the simulation, so they visit different venues. For primary and secondary school, it may be more appropriate to sample once and store that for the simulation – a student probably doesn’t switch schools daily.
Alternatively, you can follow what ASPICS does. Every day, each person logically visits all possible venues, but their interaction there (possibly receiving or transmitting COVID) is weighted by the probability of each venue.