1 clim_recal.pipeline

Wrappers to automate the entire pipeline.

Following Andy’s very helpful Excel file, this module manages a reproduction of all steps of the debiasing pipeline.

2 Download Data

The download_ftp function from ceda_ftp_download.py can be used (with a registered account username and password) to download two datasets from the Centre for Environmental Data Analysis (CEDA):

  • Saved to ClimateData/Raw
  • HadUK-Grid
    • a 1 km gridded dataset of observed climate variables, designed to supersede the UKCP09 gridded observations
    • For further details, see the Met Office
    • Saved to Raw/HadsUKgrid/
  • UKCP (UK Climate Projections) at 2.2 km
    • a 2.2 km projection grid
    • Saved to Raw/UKCP2.2/
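The download step can be approximated with Python's standard-library ftplib. This sketch is not the download_ftp implementation. The host name ftp.ceda.ac.uk is an assumption about CEDA's FTP service, and the helpers local_path_for and download_files are hypothetical. The sketch fetches only the files directly inside one remote folder and mirrors them under a local raw-data folder such as Raw/HadsUKgrid/.

```python
from ftplib import FTP
from pathlib import Path, PurePosixPath

# Assumed host for CEDA's FTP service; confirm against CEDA's own documentation.
CEDA_FTP_HOST = "ftp.ceda.ac.uk"


def local_path_for(remote_file: str, remote_root: str, raw_root: Path) -> Path:
    """Mirror a remote file's path (relative to remote_root) under raw_root."""
    relative = PurePosixPath(remote_file).relative_to(remote_root)
    return raw_root.joinpath(*relative.parts)


def download_files(
    remote_root: str, raw_root: Path, username: str, password: str
) -> list[Path]:
    """Download the files directly inside remote_root (no recursion).

    Assumes remote_root contains only files; sub-folders would make RETR fail.
    """
    remote = PurePosixPath(remote_root)
    saved: list[Path] = []
    with FTP(CEDA_FTP_HOST) as ftp:
        ftp.login(user=username, passwd=password)
        ftp.cwd(str(remote))
        for entry in ftp.nlst():
            name = PurePosixPath(entry).name  # some servers list full paths
            target = local_path_for(str(remote / name), str(remote), raw_root)
            target.parent.mkdir(parents=True, exist_ok=True)
            with target.open("wb") as handle:
                ftp.retrbinary(f"RETR {name}", handle.write)
            saved.append(target)
    return saved
```

Only local_path_for runs without network access. download_files needs valid CEDA credentials and the remote folder path for each dataset, which is deliberately not guessed here.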

3 Reproject UKCP

The bash/reproject_one.sh script copies UKCP2.2 to Raw/Reprojected_infill and reprojects it via gdalwarp:

gdalwarp -t_srs 'EPSG:27700' -tr 2200 2200 -r near -overwrite "$f" "${fn%.nc}.tif"  # Reproject the file
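The same gdalwarp call can be driven from Python, for example to dry-run the commands before executing them. This sketch rebuilds the arguments shown above and runs them with subprocess. The function names are hypothetical, and the output_dir parameter stands in for the Raw/Reprojected_infill destination. GDAL must be installed for the calls to actually run.

```python
import subprocess
from pathlib import Path


def gdalwarp_command(nc_path: Path, output_dir: Path) -> list[str]:
    """Build the gdalwarp call used in reproject_one.sh for one NetCDF file."""
    return [
        "gdalwarp",
        "-t_srs", "EPSG:27700",  # British National Grid
        "-tr", "2200", "2200",   # 2.2 km x 2.2 km target pixels
        "-r", "near",            # nearest-neighbour resampling
        "-overwrite",            # replace any existing output file
        str(nc_path),
        str(output_dir / nc_path.with_suffix(".tif").name),  # .nc -> .tif
    ]


def reproject_folder(
    input_dir: Path, output_dir: Path, dry_run: bool = True
) -> list[list[str]]:
    """Build (and, unless dry_run, execute) one gdalwarp call per .nc file."""
    commands = [gdalwarp_command(p, output_dir) for p in sorted(input_dir.glob("*.nc"))]
    if not dry_run:
        output_dir.mkdir(parents=True, exist_ok=True)
        for command in commands:
            subprocess.run(command, check=True)  # raise if gdalwarp fails
    return commands
```

With dry_run=True (the default), nothing is executed, so you can inspect the generated commands first. Passing the arguments as a list also avoids the shell-quoting pitfalls of the bash version.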

New step: convert UKCP from its 360-day calendar to a 365-day (standard) calendar.

Relevant xarray utilities:

4 Resample CPM

New approach:

  • resampling.py
  • check x_grid and y_grid interpolation
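To illustrate what checking x_grid and y_grid interpolation involves, the following sketch resamples a toy 1 km grid onto a 2.2 km grid by nearest-neighbour selection with xarray's sel(method="nearest"). The function name, the toy grids and the nearest-neighbour choice (which mirrors gdalwarp's -r near) are assumptions. resampling.py may use a different interpolation method.

```python
import numpy as np
import xarray as xr


def resample_nearest(
    data: xr.DataArray, x_grid: np.ndarray, y_grid: np.ndarray
) -> xr.DataArray:
    """Pick the nearest source cell for every (x, y) point of a target grid."""
    picked = data.sel(x=x_grid, y=y_grid, method="nearest")
    # sel returns the matched *source* coordinates, so relabel with the target grid.
    return picked.assign_coords(x=x_grid, y=y_grid)


# Hypothetical 1 km source grid (10 x 10 cells) resampled to a 2.2 km grid.
coords_1km = np.arange(0, 10_000, 1_000)
source = xr.DataArray(
    np.arange(100).reshape(10, 10),  # value = 10 * y_index + x_index
    coords={"y": coords_1km, "x": coords_1km},
    dims=("y", "x"),
)
grid_2200 = np.arange(0, 10_000, 2_200)  # 0, 2200, 4400, 6600, 8800
resampled = resample_nearest(source, x_grid=grid_2200, y_grid=grid_2200)
```

A useful check is to confirm which source cell each target point picked. Here the target point at 2200 m takes the source cell at 2000 m, and the point at 6600 m takes the cell at 7000 m.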

4.1 Todo

To run this step of the pipeline, the following should work for the default combinations of variables (tasmax, tasmin and rainfall) and the default set of runs (05, 06, 07 and 08), assuming the necessary data is mounted.
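The default selection expands to every (variable, run) pair. This small sketch assumes the <root>/<variable>/<run>/latest layout visible in the example CPMResampler output further down this page. The constant and function names are hypothetical. Three variables and four runs give 12 CPM input folders, which is consistent with cpm_folders_count=12 in the printed configuration.

```python
from itertools import product
from pathlib import PurePosixPath

# Default variables and runs as listed in the text.
DEFAULT_VARIABLES = ("tasmax", "tasmin", "rainfall")
DEFAULT_RUNS = ("05", "06", "07", "08")


def cpm_input_folders(
    raw_cpm_root: str,
    variables: tuple[str, ...] = DEFAULT_VARIABLES,
    runs: tuple[str, ...] = DEFAULT_RUNS,
) -> list[PurePosixPath]:
    """One folder per (variable, run) pair: <root>/<variable>/<run>/latest."""
    root = PurePosixPath(raw_cpm_root)
    return [root / variable / run / "latest" for variable, run in product(variables, runs)]


folders = cpm_input_folders("/mnt/vmfileshare/ClimateData/Raw/UKCP2.2")
```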

If installed via pipx, pip, etc. on your local path (or within Docker), clim-recal should be available as a command-line tool:

$ clim-recal --all-variables --default-runs --output-path /where/results/should/be/written
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=3, runs_count=4, regions_count=1, methods_count=1,
                 cpm_folders_count=12, hads_folders_count=3, start_index=0,
                 stop_index=None, cpus=2)>
<CPMResamplerManager(variables_count=3, runs_count=4, input_paths_count=12)>
<HADsResamplerManager(variables_count=3, input_paths_count=3)>

Otherwise, you can install locally and either run via pdm from the python folder:

$ cd clim-recal/python
$ pdm run clim-recal --all-variables --default-runs --output-path /where/results/should/be/written
# Skipping output for brevity

Or run within an IPython or Jupyter session (output truncated for brevity):

>>> from clim_recal.pipeline import main
>>> main(all_variables=True, default_runs=True)  # doctest: +SKIP
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=3, runs_count=4, ...>

Regardless of your route, once you’re confident with the configuration, add the --execute parameter to run. For example, assuming a local install:

$ clim-recal --all-variables --default-runs --output-path /where/results/should/be/written --execute
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=3, runs_count=4, regions_count=1, methods_count=1,
                 cpm_folders_count=12, hads_folders_count=3, start_index=0,
                 stop_index=None, cpus=2)>
<CPMResamplerManager(variables_count=3, runs_count=4, input_paths_count=12)>
<HADsResamplerManager(variables_count=3, input_paths_count=3)>
Running CPM Standard Calendar projection...
<CPMResampler(count=100, max_count=100,
              input_path='/mnt/vmfileshare/ClimateData/Raw/UKCP2.2/tasmax/05/latest',
              output_path='/mnt/vmfileshare/ClimateData/CPM-365/test-run-3-may/resample/
              cpm/tasmax/05/latest')>
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100/100  [ 0:38:27 < 0:00:00 , 0 it/s ]
  87% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87/100  [ 0:17:42 < 0:03:07 , 0 it/s ]

5 Cropping

6 Pre-processing

  • Originally used debiasing.pre_processing.py

New approach:

  • Refactor debiasing.debias-wrapper

7 Debiasing

  • Python
    • Originally used debiasing.pre_processing.py
    • Refactor debiasing.debias-wrapper
  • R

8 Keep track

  • What is being superseded
  • What can be removed

8.1 Functions

| Name | Description |
|------|-------------|
| main | Run all elements of the pipeline. |

8.1.1 main

clim_recal.pipeline.main(
    execute=False,
    hads_input_path=RAW_HADS_PATH,
    cpm_input_path=RAW_CPM_PATH,
    output_path=DEFAULT_OUTPUT_PATH,
    variables=(VariableOptions.default()),
    regions=(RegionOptions.default()),
    runs=(RunOptions.default()),
    methods=(MethodOptions.default()),
    all_variables=False,
    all_regions=False,
    default_runs=False,
    all_runs=False,
    all_methods=False,
    hads_projection=False,
    cpm_projection=False,
    crop_hads=True,
    crop_cpm=True,
    cpus=None,
    multiprocess=False,
    start_index=0,
    stop_index=None,
    total=None,
    print_range_length=5,
    **kwargs,
)

Run all elements of the pipeline.

8.1.1.1 Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| variables | Sequence[VariableOptions \| str] | Variables to include in the model, e.g. tasmax, tasmin. | (VariableOptions.default()) |
| runs | Sequence[RunOptions \| str] | Which model runs to include, e.g. “01”, “08”, “11”. | (RunOptions.default()) |
| regions | Sequence[RegionOptions \| str] \| None | Which regions to crop data to. Future plans include allowing cropping to be skipped so the pipeline runs for the entire UK. | (RegionOptions.default()) |
| methods | Sequence[MethodOptions \| str] | Which debiasing methods to apply. | (MethodOptions.default()) |
| output_path | PathLike | Path to save intermediate and final results to. | DEFAULT_OUTPUT_PATH |
| cpus | int \| None | Number of CPUs to use when multiprocessing. | None |
| multiprocess | bool | Whether to use multiprocessing. | False |
| start_index | int | Index to start all iterations from. | 0 |
| total | int \| None | Total number of records to iterate over. 0 and None indicate all values from start_index. | None |
| **kwargs | | Additional parameters to pass to a ClimRecalConfig. | {} |
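The start_index and total parameters can be read as selecting a contiguous run of records to process. Here is a minimal sketch of the documented semantics, where a total of 0 or None means every record from start_index onwards. The function name and the clamping to the number of records are assumptions. stop_index, which appears in the signature but is not described in this table, is left out.

```python
from typing import Optional


def iteration_range(
    n_records: int, start_index: int = 0, total: Optional[int] = None
) -> range:
    """Return the record indices to process.

    A total of 0 or None selects every record from start_index onwards;
    otherwise at most `total` records are taken (clamped to n_records, an assumption).
    """
    if not total:  # covers both 0 and None
        return range(start_index, n_records)
    return range(start_index, min(start_index + total, n_records))
```

For example, iteration_range(10, start_index=3, total=4) covers indices 3 to 6.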

8.1.1.2 Notes

The default parameters here are meant to reflect the entire workflow process to ease reproducibility.

8.1.1.3 Examples

Note that the _allow_check_fail parameters allow the examples to run without data mounted from a server.

>>> main(variables=("rainfall", "tasmin"),
...      output_path=test_runs_output_path,
...      cpm_kwargs=dict(_allow_check_fail=True),
...      hads_kwargs=dict(_allow_check_fail=True),
... )
'set_cpm_for_coord_alignment' for 'HADs' not speficied.
Defaulting to 'self.cpm_input_path': '...'
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=2, runs_count=1,
                 regions_count=1, methods_count=1,
                 cpm_folders_count=2, hads_folders_count=2,
                 start_index=0, stop_index=None,
                 cpus=...)>
<CPMResamplerManager(variables_count=2, runs_count=1,
                     input_paths_count=2)>
<HADsResamplerManager(variables_count=2, input_paths_count=2)>
No steps run. Add '--execute' to run steps.