13  Data sources

The original data are provided at different scales, which define their level of accuracy. For simplicity, the outputs of SPC are geolocated at Output Area (OA) level, although this scale may not be relevant to all indicators. The 2011 OAs are a geographical unit created for census collection and are designed to be relatively homogeneous, with an average size between 120 and 129 households.

OpenStreetMap (OSM) data are downloaded directly from https://www.openstreetmap.org. Everything else is hosted as local copies inside a single Azure repository that interacts automatically with the model. Below, we describe the content of this repository and indicate the raw source used for each indicator. The repository is divided into utility data, county level data and national data. To recreate its content from raw sources, please refer to this part of the code.

13.1 Utility data

lookUp-GB.csv.gz

The look-up table links different geographies of Great Britain together. It is used internally by the model, but can also help the user define their own study area. The following are standard denominations, compatible with ONS fields of the same name. They are based on ONS lookups. See ONS documentation for more details.

  • OA11CD: Output area codes for the 2011 census (120 to 129 households)
  • LSOA11CD & LSOA11NM: Lower-layer Super Output Areas (about 2000 individuals), replaced by Data Zones for Scotland
  • MSOA11CD & MSOA11NM: Middle-layer Super Output Areas (about 8000 individuals), replaced by Intermediate Zones for Scotland
  • LAD20CD, LAD20NM: Local Authority Districts (314 for England, 22 for Wales and 32 for Scotland)
  • ITL321CD, ITL321NM, ITL221CD, ITL221NM, ITL121CD & ITL121NM: International Territorial Level, replacing pre-Brexit NUTS European divisions.
  • RGN20CD & RGN20NM: Regions of England (NA for Wales and Scotland)
  • Country: England, Wales or Scotland

In addition,

  • AzureRef: Name of the geographical unit used for the County level data folder inside Azure (Lieutenancy Areas, a.k.a. Ceremonial Counties, for England; Police Divisions for Scotland; ITL321NM for Wales)
  • GoogleMob & OSM: Alternate spellings of the area names, as used by Google and OSM in their data releases
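As an illustration of how the look-up table can help define a study area, the minimal sketch below collects the OA11CD codes belonging to one Local Authority District. The sample rows, area names and codes are made up for the example, and only three of the real columns are shown; the helper output_areas_for is hypothetical and not part of SPC.

```python
import csv
import io

# Hypothetical extract of lookUp-GB.csv.gz. The codes and names below are
# invented for illustration; the real file has many more columns and rows.
SAMPLE_LOOKUP = """OA11CD,LAD20NM,Country
E00000001,Exampleton,England
E00000002,Exampleton,England
W00000001,Samplebury,Wales
"""


def output_areas_for(lookup_file, column, value):
    """Return the OA11CD codes of all rows where `column` equals `value`."""
    reader = csv.DictReader(lookup_file)
    return [row["OA11CD"] for row in reader if row[column] == value]


study_area = output_areas_for(io.StringIO(SAMPLE_LOOKUP), "LAD20NM", "Exampleton")
print(study_area)
```

With the real file, the same function should work on a handle opened with gzip.open(path, "rt", newline="") from the Python standard library, assuming the file is a comma-separated table with a header row.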

13.2 County level data

Files in this section are grouped by country (England, Wales and Scotland), then date (2012, 2020, 2022, 2032, 2039). The format of a path to an individual file is:

https://ramp0storage.blob.core.windows.net/countydata-v2-1/[country]/[date]/pop_[area_name].csv.gz

where [country], [date] and [area_name] must be replaced accordingly. As of July 2023, England contains 5 series of 47 files, Wales 5 series of 12 files and Scotland 5 series of 13 files.
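The path format above can be assembled programmatically. The following is a minimal sketch, not part of SPC: the function name county_file_url is hypothetical, the capitalisation of the country names in the path is an assumption, and "example-area" is a placeholder (real values for [area_name] would presumably come from the AzureRef column of the look-up table).

```python
BASE_URL = "https://ramp0storage.blob.core.windows.net/countydata-v2-1"

# Countries and dates listed in this section. The exact spelling and
# capitalisation used inside the storage paths is an assumption.
COUNTRIES = ("England", "Wales", "Scotland")
DATES = (2012, 2020, 2022, 2032, 2039)


def county_file_url(country, date, area_name):
    """Build the URL of one pop_[area_name].csv.gz file."""
    if country not in COUNTRIES:
        raise ValueError(f"unknown country: {country!r}")
    if date not in DATES:
        raise ValueError(f"unknown date: {date!r}")
    return f"{BASE_URL}/{country}/{date}/pop_{area_name}.csv.gz"


# "example-area" is a placeholder, not a real area name.
print(county_file_url("Wales", 2020, "example-area"))
```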

pop_[area_name].csv.gz

The data is mainly based on the 2011 UK census, the UK Time Use Survey 2014-15 and the health surveys of GB (England, Wales, Scotland). The SPENSER microsimulation model is used to distribute and project individuals from the census with MSOA scale constraints into synthetic households with OA constraints. These data are enriched with some of the content of the other datasets mentioned (the rest of which can be added a posteriori from the identifiers provided). The data have also been complemented with a modelling of BMI and salaries.

The fields currently contained are detailed in this .txt document. They are:

  • pid: Unique person identifier at GB level within SPC
  • hid: Unique household identifier at GB level within SPC
  • OA11CD: Output Area code of the individual’s home (ONS, 2011 boundaries)
  • sex: Sex assigned at birth (DC1117EW, census 2011)
  • age: Age in years (DC1117EW, census 2011)
  • ethnicity: Based on self-report (aggregated from DC2101EW, census 2011)
  • nssec8: National Statistics Socio-economic classification (see methods)
  • HOUSE_nssec8: National Statistics Socio-economic classification of the reference person of the household (LC4605, census 2011)
  • House_type: Type of accommodation (based on LC4402EW, census 2011)
  • HOUSE_typeCommunal: Type of communal establishment (based on QS420, census 2011)
  • HOUSE_NRooms: Number of rooms in the accommodation (LC4404EW, census 2011)
  • HOUSE_centralHeat: Presence of central heating (based on LC4402EW, census 2011)
  • HOUSE_tenure: Tenure (based on LC4402EW, census 2011)
  • HOUSE_NCars: Number of cars (derived from LC4202EW by SPENSER team, census 2011)
  • id_HS: Unique identifier within the Health Survey (aggregated from the health surveys of England, Wales and Scotland)
  • HEALTH_diabetes: For Scotland and England, doctor-diagnosed diabetes; for Wales, diabetes currently treated (derived from HSE, HSW, SHS)
  • HEALTH_bloodpressure: For Scotland and England, doctor-diagnosed high blood pressure; for Wales, high blood pressure currently treated (derived from HSE, HSW, SHS)
  • HEALTH_cvd: For England, cardiovascular medication taken in the last 7 days; for Scotland, had cardiovascular condition excluding diabetes / blood pressure; for Wales, any heart condition excluding high blood pressure (derived from HSE, HSW, SHS)
  • HEALTH_NMedicines: Number of prescribed medications (derived from HSE, HSW, SHS)
  • HEALTH_selfAssessed: Self-assessed general health (derived from HSE, HSW, SHS)
  • HEALTH_lifeSat: Life satisfaction, from the question "how satisfied with life nowadays?" (derived from HSE, HSW, SHS)
  • HEALTH_bmi: BMI (see methods)
  • id_TUS_hh: serial household identifier field in the UK Time Use Survey 2015
  • id_TUS_p: pnum person identifier field in the UK Time Use Survey 2015
  • pwkstat: Employment status (derived from UK TUS 2015)
  • soc2010: Standard Occupational Classification (derived from UK TUS 2015)
  • sic1d2007: Standard Industry Classification of economic activities 2007, 1st level (derived from UK TUS 2015)
  • sic2d2007: Standard Industry Classification of economic activities 2007, 2nd level (derived from UK TUS 2015)
  • netPayWeekly: Weekly take home pay after all deductions (derived from UK TUS 2015)
  • workedHoursWeekly: Number of hours per week usually worked in main job or business (derived from UK TUS 2015)
  • incomeH: Hourly gross salary for full-time and part-time employees (see methods)
  • incomeY: Yearly gross salary for full-time and part-time employees (see methods)
  • incomeHAsIf: Hourly gross salary, with self-employed and other workers treated as if they were employees of the same industry, and with the mean hours worked for the industry used when the number of hours is missing (see methods)
  • incomeYAsIf: Yearly gross salary, with self-employed and other workers treated as if they were employees of the same industry, and with the mean hours worked for the industry used when the number of hours is missing (see methods)
  • ESport: Relative probability weight to attend a sport fixture (Experimental, WIP)
  • ERugby: Relative probability weight to attend a Rugby fixture (Experimental, WIP)
  • EConcertM: Relative probability weight to attend a concert primarily targeting young males (Experimental, WIP)
  • EConcertF: Relative probability weight to attend a concert primarily targeting young females (Experimental, WIP)
  • EConcertMS: Relative probability weight to attend a concert primarily targeting middle-aged males (Experimental, WIP)
  • EConcertFS: Relative probability weight to attend a concert primarily targeting middle-aged females (Experimental, WIP)
  • EMuseum: Relative probability weight to visit a museum (Experimental, WIP)
  • easting: X coordinate of the OA centroid in the British National Grid coordinate system (epsg:27700, source: ONS)
  • northing: Y coordinate of the OA centroid in the British National Grid coordinate system (epsg:27700, source: ONS)
  • lng: X coordinate of the OA centroid in the Longitude/Latitude coordinate system (epsg:4326, derived from ONS)
  • lat: Y coordinate of the OA centroid in the Longitude/Latitude coordinate system (epsg:4326, derived from ONS)
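Because every row is one individual and carries both pid and hid, households can be rebuilt by grouping rows on hid. The sketch below shows this on a few invented rows; the identifier values and the helper members_by_household are hypothetical, and only four of the fields listed above are included.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract of a pop_[area_name].csv.gz file. The identifiers
# and values are invented; the real files contain all fields listed above.
SAMPLE_POP = """pid,hid,OA11CD,age
1,10,E00000001,34
2,10,E00000001,36
3,11,E00000001,71
"""


def members_by_household(pop_file):
    """Group person identifiers (pid) by household identifier (hid)."""
    households = defaultdict(list)
    for row in csv.DictReader(pop_file):
        households[row["hid"]].append(row["pid"])
    return dict(households)


households = members_by_household(io.StringIO(SAMPLE_POP))
# Number of individuals in each synthetic household.
household_sizes = {hid: len(pids) for hid, pids in households.items()}
print(household_sizes)
```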

13.3 National data

businessRegistry.csv.gz

Contains a breakdown of all business units (i.e. single workplaces) in Great Britain at LSOA scale, estimated by the project contributors from two Nomis datasets: UK Business Counts - local units by industry and employment size band 2020 and Business Register and Employment Survey 2015. Each item contains the size of the unit and its main sic1d07 code, which refers to the Standard Industrial Classification of Economic Activities 2007 (the number corresponds to the position of the section letter in alphabetical order). It is used to compute commuting flows.
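The numeric sic1d07 code can be converted back to its SIC 2007 section letter. The sketch below assumes the numbering is 1-based (1 for section A, up to 21 for section U); this reading of "number corresponding to the letter in alphabetical order" is an assumption, and the helper name is hypothetical.

```python
import string

# SIC 2007 has 21 sections, labelled A to U.
SIC_SECTIONS = string.ascii_uppercase[:21]


def sic1d07_to_letter(code):
    """Map a numeric sic1d07 code to its section letter (assumes 1 = A)."""
    if not 1 <= code <= len(SIC_SECTIONS):
        raise ValueError(f"sic1d07 code out of range: {code}")
    return SIC_SECTIONS[code - 1]


print(sic1d07_to_letter(3))
```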

GIS/

This directory contains three GIS datasets of GB in GeoJSON format taken from ONS boundaries:

QUANT_RAMP_spc.tar.gz

See: Milton R, Batty M, Dennett A, dedicated RAMP Spatial Interaction Model GitHub repository. It is used to compute the flows towards schools and retail.

timeAtHomeIncreaseCTY.csv.gz

This file is a subset of the Google COVID-19 Community Mobility Reports, cropped to GB. It describes the daily reduction in mobility, averaged at county level, due to lockdown and other COVID-19 restrictions between 15 February 2020 and 15 October 2022. Missing values have been replaced by the national average. These values can be used directly to reduce pnothome and increase phometot (and their sub-categories) in order to simulate this period more accurately.
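The replacement of missing values can be pictured with the minimal sketch below. It is not the code used to build the file: the county names, the values and the function fill_missing are hypothetical, and the national average is passed in as a parameter because the way it was computed is not specified here.

```python
def fill_missing(values_by_county, national_average):
    """Replace missing (None) county values with the national average."""
    return {
        county: national_average if value is None else value
        for county, value in values_by_county.items()
    }


# Invented values for a single day; None marks a missing observation.
one_day = {"county-a": 10.0, "county-b": None, "county-c": 20.0}
filled = fill_missing(one_day, national_average=15.0)
print(filled)
```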

diariesRef.csv.gz

Contains diaries taken from the UK TUS that can be distributed to the population on a daily basis. They cover both weekdays and weekend days. A full description of the fields can be found here.