1 Methods implemented in Python

WARNING: the documentation below predates a significant refactor.

1.1 Resampling HADs grid from 1 km to 2.2 km

The raw UKHAD observational data is on a 1 km grid and needs to be resampled onto the 2.2 km grid of the RCP8.5 data.
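Conceptually, regridding from 1 km to 2.2 km maps each cell of the coarser target grid back onto the source grid. The project's actual pipeline works on NetCDF files with geospatial tooling; the sketch below only illustrates the idea with a nearest-neighbour lookup on plain numpy arrays (all names are hypothetical, and CRS handling is omitted entirely):

```python
import numpy as np

def regrid_nearest(src: np.ndarray, src_res_km: float, dst_res_km: float) -> np.ndarray:
    """Resample a 2-D field onto a coarser grid by nearest-neighbour lookup.

    Hypothetical illustration only: assumes both grids share an origin and
    are regular in projected coordinates (no CRS or bounds handling).
    """
    ny, nx = src.shape
    dst_ny = int(round(ny * src_res_km / dst_res_km))
    dst_nx = int(round(nx * src_res_km / dst_res_km))
    # Destination cell centres expressed in source-grid index units.
    rows = np.minimum((np.arange(dst_ny) * dst_res_km / src_res_km + 0.5).astype(int), ny - 1)
    cols = np.minimum((np.arange(dst_nx) * dst_res_km / src_res_km + 0.5).astype(int), nx - 1)
    return src[np.ix_(rows, cols)]

field_1km = np.arange(22 * 22, dtype=float).reshape(22, 22)
field_2p2km = regrid_nearest(field_1km, src_res_km=1.0, dst_res_km=2.2)
print(field_2p2km.shape)  # (10, 10)
```

In practice the clim-recal pipeline also has to handle coordinate reference systems, partial cells at the domain edge, and interpolation choices beyond nearest-neighbour, which tools such as xarray/rasterio provide.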

1.2 Running debiasing methods

The code in the debiasing directory contains scripts that interface with implementations of the debiasing methods implemented by different libraries.

Note: as of March 2023, only the python-cmethods library has been implemented.

1.2.1 The cmethods library

This repository contains two Python scripts: one for preprocessing/grouping data and one for running debiasing of climate data. Both use a fork of the original python-cmethods module written by Benjamin Thomas Schwertfeger, which has been modified to work with the datasets used in the clim-recal project. This library is included as a git submodule, so you must run the following commands to pull it:

$ cd debiasing
$ git submodule update --init --recursive
  • The preprocess_data.py script allows the user to specify directories from which the modelled (CPM/UKCP) data and observation (HADs) data should be loaded, as well as time periods to use for calibration and validation. The script parses the necessary files and combines them into two files for calibration (modelled and observed) and two files for validation (modelled and observed), with the option to specify multiple validation periods. These can then be used as inputs to run_cmethods.py.
  • The run_cmethods.py script allows us to adjust climate biases in climate data using the python-cmethods library. It takes as input observation data (HADs data) and modelled data (historical CPM/UKCP data) for calibration, as well as observation and modelled data for validation (generated by preprocess_data.py). It calibrates the debiasing method using the calibration period data and debiases the modelled data for the validation period. The resulting output is saved as a .nc file to a specified directory. The script will also produce a time-series and a map plot of the debiased data.
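The calibration and validation periods are passed as YYYYMMDD-YYYYMMDD range strings (e.g. 19800101-20091230). Parsing such a specification might look like the following sketch (a hypothetical helper, not the project's actual code):

```python
from datetime import date, datetime

def parse_date_range(spec: str) -> tuple[date, date]:
    """Parse a 'YYYYMMDD-YYYYMMDD' range string into a (start, end) pair.

    Hypothetical helper illustrating the --calib_dates/--valid_dates format.
    """
    start_str, end_str = spec.split("-")
    start = datetime.strptime(start_str, "%Y%m%d").date()
    end = datetime.strptime(end_str, "%Y%m%d").date()
    if start > end:
        raise ValueError(f"start date {start} is after end date {end}")
    return start, end

calib = parse_date_range("19800101-20091230")
print(calib)  # (datetime.date(1980, 1, 1), datetime.date(2009, 12, 30))
```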

Usage:

The scripts can be run from the command line using the following arguments:

$ python3 preprocess_data.py --mod <path to modelled datasets> --obs <path to observation datasets> --shp <shapefile> --out <output file path> -v <variable> -u <unit> -r <CPM model run number> --calib_dates <date range for calibration> --valid_dates <date range for validation>

$ python3 run_cmethods.py --input_data_folder <input files directory> --out <output directory> -m <method> -v <variable> -g <group> -k <kind> -n <number of quantiles> -p <number of processes>

For more details on the scripts and options you can run:

$ python3 preprocess_data.py --help

and

$ python3 run_cmethods.py --help

Main Functionality:

The preprocess_data.py script performs the following steps:

  • Parses the input arguments.
  • Loads, merges and clips (if a shapefile is provided) all calibration datasets and merges them into two distinct datasets: the modelled and observed datasets.
  • Aligns the calendars of the two datasets, ensuring that they have the same time dimension.
  • Saves the calibration datasets to the output directory.
  • Loops over the validation time periods specified in the valid_dates argument and performs the following steps:
    • Loads the modelled data for the current time period.
    • Loads the observed data for the current time period.
    • Aligns and saves the datasets to the output directory.
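The calendar-alignment step above matters because the modelled and observed series may not share the same time steps; both must be restricted to the timestamps present in each. A minimal sketch of such an intersection, using plain dicts keyed by timestamp rather than the NetCDF datasets the real script handles:

```python
def align_time(mod: dict, obs: dict) -> tuple[dict, dict]:
    """Restrict two time-keyed mappings to their shared timestamps.

    Hypothetical illustration of calendar alignment: keep only time
    steps present in both the modelled and observed series.
    """
    shared = sorted(set(mod) & set(obs))
    return ({t: mod[t] for t in shared}, {t: obs[t] for t in shared})

mod = {"1980-01-01": 3.1, "1980-01-02": 3.4, "1980-01-03": 2.9}
obs = {"1980-01-02": 3.0, "1980-01-03": 3.2, "1980-01-04": 3.5}
mod_aligned, obs_aligned = align_time(mod, obs)
print(sorted(mod_aligned))  # ['1980-01-02', '1980-01-03']
```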

The run_cmethods.py script performs the following steps:

  • Reads the input calibration and validation datasets from the input directory.
  • Applies the specified debiasing method, combining the calibration and validation data.
  • Saves the resulting output to the specified directory.
  • Creates diagnostic figures of the output dataset (time series and time-dependent maps) and saves them into the specified directory.
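The quantile_delta_mapping method used in the working example can be sketched in its additive form: find the quantile of each value to debias within the modelled distribution, then transfer it through the observed calibration distribution while preserving the modelled delta. A simplified numpy illustration follows (empirical quantiles, additive form only; the python-cmethods implementation is more general and configurable):

```python
import numpy as np

def quantile_delta_mapping(obs_calib, mod_calib, mod_valid, n_quantiles=100):
    """Additive quantile delta mapping (simplified sketch, not the library's code).

    For each value to debias, find its quantile tau in the validation-period
    modelled distribution, then return F_obs_calib^-1(tau) plus the delta
    between the value and F_mod_calib^-1(tau).
    """
    obs_calib = np.asarray(obs_calib, dtype=float)
    mod_calib = np.asarray(mod_calib, dtype=float)
    mod_valid = np.asarray(mod_valid, dtype=float)
    probs = np.linspace(0.0, 1.0, n_quantiles)
    # Quantile of each validation value within the validation-period distribution.
    tau = np.interp(mod_valid, np.quantile(mod_valid, probs), probs)
    # Transfer through the observed calibration distribution...
    corrected = np.interp(tau, probs, np.quantile(obs_calib, probs))
    # ...while preserving the modelled change signal (the "delta").
    delta = mod_valid - np.interp(tau, probs, np.quantile(mod_calib, probs))
    return corrected + delta

rng = np.random.default_rng(0)
obs_c = rng.normal(10.0, 2.0, 1000)  # observed, calibration period
mod_c = obs_c + 3.0                  # model with a constant +3 bias
mod_v = rng.normal(14.0, 2.0, 1000)  # model, validation period
debiased = quantile_delta_mapping(obs_c, mod_c, mod_v)
# With a constant +3 bias, QDM recovers mod_v - 3.
```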

Working example:

Example of how to run the two scripts locally using data stored in the Azure fileshare (the input data have been cropped to contain only the city of Glasgow). The two scripts debias only the tasmax variable for run 05 of the CPM, with calibration years 1980-2009 and validation years 2010-2019, using the quantile_delta_mapping debiasing method:

$ python3 preprocess_data.py --mod /Volumes/vmfileshare/ClimateData/Cropped/three.cities/CPM/Glasgow/ --obs /Volumes/vmfileshare/ClimateData/Cropped/three.cities/Hads.original360/Glasgow/ -v tasmax --out ./preprocessed_data/ --calib_dates 19800101-20091230 --valid_dates 20100101-20191230 --run_number 05

$ python3 run_cmethods.py --input_data_folder ./preprocessed_data/  --out ./debiased_data/  --method quantile_delta_mapping -v tasmax -p 4

1.3 Testing

Testing for Python components uses pytest, with configuration specified in pyproject.toml. To run the tests, ensure the conda-lock.yml environment is installed and activated, then run pytest from within the clim-recal/python checkout directory. Note: some tests are skipped unless run on a specific Linux server with data mounted to a specific path.

$ cd clim-recal
$ conda-lock install --name clim-recal conda-lock.yml
$ conda activate clim-recal
(clim-recal)$ cd python
(clim-recal)$ pytest
...sss........sss.....                                                         [100%]
============================== short test summary info ===============================
SKIPPED [1] <doctest test_debiasing.RunConfig.mod_path[0]>:2: requires linux server mount paths
SKIPPED [1] <doctest test_debiasing.RunConfig.obs_path[0]>:2: requires linux server mount paths
SKIPPED [1] <doctest test_debiasing.RunConfig.preprocess_out_path[0]>:2: requires linux server mount paths
SKIPPED [1] <doctest test_debiasing.RunConfig.yield_mod_folder[0]>:2: requires linux server mount paths
SKIPPED [1] <doctest test_debiasing.RunConfig.yield_obs_folder[0]>:2: requires linux server mount paths
SKIPPED [1] <doctest test_debiasing.RunConfig.yield_preprocess_out_folder[0]>:2: requires linux server mount paths
16 passed, 6 skipped, 4 deselected in 0.26s
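The SKIPPED lines above come from tests guarded by a mount-path check. The project's real tests use pytest doctests, but the same conditional-skip pattern can be sketched with the standard library alone (the mount path is taken from the working example above; the test names are hypothetical):

```python
import io
import os
import unittest

# Server-only mount point; tests requiring it are skipped elsewhere.
DATA_MOUNT = "/Volumes/vmfileshare/ClimateData"

class TestDebiasingPaths(unittest.TestCase):
    @unittest.skipUnless(os.path.isdir(DATA_MOUNT), "requires linux server mount paths")
    def test_mod_path(self):
        # Only runs where the fileshare is mounted.
        self.assertTrue(os.path.isdir(DATA_MOUNT))

    def test_always_runs(self):
        # Runs everywhere, regardless of the mount.
        self.assertEqual("tasmax".upper(), "TASMAX")

suite = unittest.TestLoader().loadTestsFromTestCase(TestDebiasingPaths)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

With pytest the equivalent guard is a pytest.mark.skipif decorator keyed on the same path check.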