1 clim_recal.pipeline

Wrappers to automate the entire pipeline.

Following Andy’s very helpful Excel file, this module reproduces every step of the debiasing pipeline.

2 Download Data

The download_ftp function from ceda_ftp_download.py can be used (with a registered account user name and password) to download two datasets from the Centre for Environmental Data Analysis (CEDA):

Saved to ClimateData/Raw
HadUK-Grid
- a 1km climate projection grid which is designed to supersede UKCP
- For further details see Met Office
- Saved to Raw/HadsUKgrid/
UKCP UK Climate Projections at 2.2 km
- a 2.2km projection grid
- Saved to Raw/UKCP2.2/
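The signature of download_ftp is not shown here, so the following is only a minimal sketch of the kind of FTP transfer this step involves, written against the standard library's ftplib. The helper name download_folder is hypothetical, as is the assumption that the remote folder is flat and contains only files.

```python
from ftplib import FTP
from pathlib import Path


def download_folder(ftp: FTP, remote_dir: str, local_dir: Path) -> list[Path]:
    """Download every entry listed in `remote_dir` into `local_dir`.

    Hypothetical helper: assumes `remote_dir` holds only files (no sub-folders)
    and that `ftp` is already connected and logged in.
    """
    local_dir.mkdir(parents=True, exist_ok=True)
    ftp.cwd(remote_dir)
    saved: list[Path] = []
    for name in ftp.nlst():
        target = local_dir / Path(name).name
        with target.open("wb") as handle:
            # Stream the remote file to disk in binary mode.
            ftp.retrbinary(f"RETR {name}", handle.write)
        saved.append(target)
    return saved


# Usage (not run here; host and remote folder are assumptions and a
# registered CEDA account is required):
# with FTP("ftp.ceda.ac.uk") as ftp:
#     ftp.login(user="<user name>", passwd="<password>")
#     download_folder(ftp, "<remote UKCP folder>", Path("ClimateData/Raw/UKCP2.2"))
```

Passing the connection in as an argument keeps credentials out of the helper and makes it easy to exercise with a stand-in object when no server is available.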

3 Reproject UKCP

The bash/reproject_one.sh script copies and reprojects UKCP2.2 via gdalwarp to a Raw/Reprojected_infill folder:

gdalwarp -t_srs 'EPSG:27700' -tr 2200 2200 -r near -overwrite $f "${fn%.nc}.tif" # Reproject the file
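For readers who prefer to drive the same command from Python, here is a minimal sketch that builds the equivalent argument list and runs it over a folder. The function names are hypothetical, and unlike the shell script (whose output location depends on $fn) this sketch simply writes each .tif next to its source file.

```python
import subprocess
from pathlib import Path


def gdalwarp_command(source: Path, resolution: int = 2200) -> list[str]:
    """Build the gdalwarp call shown above for a single NetCDF file."""
    target = source.with_suffix(".tif")  # mirrors "${fn%.nc}.tif"
    return [
        "gdalwarp",
        "-t_srs", "EPSG:27700",                    # British National Grid
        "-tr", str(resolution), str(resolution),   # target cell size in metres
        "-r", "near",                              # nearest-neighbour resampling
        "-overwrite",
        str(source),
        str(target),
    ]


def reproject_folder(folder: Path) -> None:
    """Run gdalwarp on every .nc file in `folder` (requires GDAL on the PATH)."""
    for source in sorted(folder.glob("*.nc")):
        subprocess.run(gdalwarp_command(source), check=True)
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.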

New step: project UKCP to 360/365 days

Relevant xarray utilities:
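One xarray method that is likely relevant here is convert_calendar, available on Dataset and DataArray objects, which converts between calendars such as 360_day and standard. The pure Python sketch below does not call xarray. It only illustrates the underlying idea of scaling a 360-day year onto a 365-day year, and which target days are left without a source value and would need infilling. The function name and the rounding rule are assumptions made for illustration.

```python
def map_360_to_365(day_of_year_360: int) -> int:
    """Scale a 1-based day of a 360-day year onto a 365-day year."""
    if not 1 <= day_of_year_360 <= 360:
        raise ValueError("expected a day of year between 1 and 360")
    # Python's round() rounds exact halves to the nearest even integer.
    return round(day_of_year_360 * 365 / 360)


# Days of the 365-day year that receive a value from the 360-day source.
filled_days = {map_360_to_365(day) for day in range(1, 361)}

# The remaining days have no source value and would need interpolating.
missing_days = sorted(set(range(1, 366)) - filled_days)
print(missing_days)  # [37, 109, 183, 255, 329]
```

The five gaps are spread roughly evenly through the year, which is why this kind of conversion is usually paired with an infill or interpolation step.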

4 Resample CPM

New approach:

resampling.py
check x_grid and y_grid interpolation

4.1 Todo

To run this step in the pipeline, the following should work for the default combinations of variables (tasmax, tasmin and rainfall) and the default set of runs (05, 06, 07 and 08), assuming the necessary data is mounted.

If installed via pipx, pip, etc. on your local path (or within Docker), clim-recal should be available as a command-line tool:

$ clim-recal --all-variables --default-runs --output-path /where/results/should/be/written
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=3, runs_count=4, regions_count=1, methods_count=1,
                 cpm_folders_count=12, hads_folders_count=3, start_index=0,
                 stop_index=None, cpus=2)>
<CPMResamplerManager(variables_count=3, runs_count=4, input_paths_count=12)>
<HADsResamplerManager(variables_count=3, input_paths_count=3)>

Otherwise, you can install locally and either run via pdm from the python folder

$ cd clim-recal/python
$ pdm run clim-recal --all-variables --default-runs --output-path /where/results/should/be/written
# Skipping output for brevity

Or within an ipython or jupyter instance (truncated below for brevity)

>>> from clim_recal.pipeline import main
>>> main(all_variables=True, default_runs=True)  # doctest: +SKIP
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=3, runs_count=4, ...>

Regardless of your route, once you’re confident with the configuration, add the --execute parameter to run. For example, assuming a local install:

$ clim-recal --all-variables --default-runs --output-path /where/results/should/be/written --execute
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=3, runs_count=4, regions_count=1, methods_count=1,
                 cpm_folders_count=12, hads_folders_count=3, start_index=0,
                 stop_index=None, cpus=2)>
<CPMResamplerManager(variables_count=3, runs_count=4, input_paths_count=12)>
<HADsResamplerManager(variables_count=3, input_paths_count=3)>
Running CPM Standard Calendar projection...
<CPMResampler(count=100, max_count=100,
              input_path='/mnt/vmfileshare/ClimateData/Raw/UKCP2.2/tasmax/05/latest',
              output_path='/mnt/vmfileshare/ClimateData/CPM-365/test-run-3-may/resample/
              cpm/tasmax/05/latest')>
 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100/100  [ 0:38:27 < 0:00:00 , 0 it/s ]
  87% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87/100  [ 0:17:42 < 0:03:07 , 0 it/s ]
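The counts in the configuration summary follow from the selected options: three variables across four runs give twelve CPM input folders. The sketch below reproduces that arithmetic using the folder layout visible in the CPMResampler log (root/variable/run/latest). It is an illustration only: the function name is hypothetical, and the on-disk folder name for a variable (rainfall in particular) may differ from the option name.

```python
from itertools import product
from pathlib import Path

# Root and layout taken from the CPMResampler input_path in the log above.
CPM_ROOT = Path("/mnt/vmfileshare/ClimateData/Raw/UKCP2.2")
VARIABLES = ("tasmax", "tasmin", "rainfall")  # option names, not necessarily folder names
RUNS = ("05", "06", "07", "08")


def cpm_input_folders(root: Path = CPM_ROOT,
                      variables: tuple[str, ...] = VARIABLES,
                      runs: tuple[str, ...] = RUNS) -> list[Path]:
    """List one input folder per (variable, run) pair."""
    return [root / variable / run / "latest"
            for variable, run in product(variables, runs)]


folders = cpm_input_folders()
print(len(folders))  # 12, matching cpm_folders_count=12
```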

5 Cropping

6 Pre-processing

Originally used debiasing.pre_processing.py

New approach:

Refactor debiasing.debias-wrapper

7 Debiasing

python
- Originally used debiasing.pre_processing.py
- Refactor debiasing.debias-wrapper
R

8 Keep track

What is being superseded
What can be removed

8.1 Functions

Name	Description
main	Run all elements of the pipeline.

8.1.1 main

clim_recal.pipeline.main(execute=False, hads_input_path=RAW_HADS_PATH, cpm_input_path=RAW_CPM_PATH, output_path=DEFAULT_OUTPUT_PATH, variables=(VariableOptions.default()), regions=(RegionOptions.default()), runs=(RunOptions.default()), methods=(MethodOptions.default()), all_variables=False, all_regions=False, default_runs=False, all_runs=False, all_methods=False, hads_projection=False, cpm_projection=False, crop_hads=True, crop_cpm=True, cpus=None, multiprocess=False, start_index=0, stop_index=None, total=None, print_range_length=5, **kwargs)

Run all elements of the pipeline.

8.1.1.1 Parameters

Name	Type	Description	Default
`variables`	Sequence[VariableOptions | str]	Variables to include in the model, e.g. `tasmax`, `tasmin`.	`(VariableOptions.default())`
`runs`	Sequence[RunOptions | str]	Which model runs to include, e.g. “01”, “08”, “11”.	`(RunOptions.default())`
`regions`	Sequence[RegionOptions | str] | None	Which regions to crop data to. Future plans include allowing this to be skipped in order to run for the entire UK.	`(RegionOptions.default())`
`methods`	Sequence[MethodOptions | str]	Which debiasing methods to apply.	`(MethodOptions.default())`
`output_path`	PathLike	`Path` to save intermediate and final results to.	`DEFAULT_OUTPUT_PATH`
`cpus`	int | None	Number of CPUs to use when multiprocessing.	`None`
`multiprocess`	bool	Whether to use multiprocessing.	`False`
`start_index`	int	Index to start all iterations from.	`0`
`total`	int | None	Total number of records to iterate over. 0 and `None` indicate all values from `start_index`.	`None`
`**kwargs`		Additional parameters to pass to a `ClimRecalConfig`.	`{}`
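The interaction between start_index and total can be pictured as a simple slice over the records to process. The sketch below is an assumption about those semantics based only on the descriptions in the table, not the library's implementation, and select_records is a hypothetical name.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_records(records: Sequence[T],
                   start_index: int = 0,
                   total: Optional[int] = None) -> Sequence[T]:
    """Return `total` records from `start_index`, or all of them if `total` is 0 or None."""
    # A falsy `total` (0 or None) means "no upper bound".
    stop_index = start_index + total if total else None
    return records[start_index:stop_index]


files = [f"file_{i}.nc" for i in range(10)]
print(select_records(files, start_index=2, total=3))
# ['file_2.nc', 'file_3.nc', 'file_4.nc']
```

A small window such as this is a convenient way to test a configuration on a few files before committing to a full run.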

8.1.1.2 Notes

The default parameters here are meant to reflect the entire workflow process to ease reproducibility.

8.1.1.3 Examples

Note: the _allow_check_fail parameters allow the examples to run without data mounted from a server.

>>> main(variables=("rainfall", "tasmin"),
...      output_path=test_runs_output_path,
...      cpm_kwargs=dict(_allow_check_fail=True),
...      hads_kwargs=dict(_allow_check_fail=True),
... )
'set_cpm_for_coord_alignment' for 'HADs' not speficied.
Defaulting to 'self.cpm_input_path': '...'
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=2, runs_count=1,
                 regions_count=1, methods_count=1,
                 cpm_folders_count=2, hads_folders_count=2,
                 start_index=0, stop_index=None,
                 cpus=...)>
<CPMResamplerManager(variables_count=2, runs_count=1,
                     input_paths_count=2)>
<HADsResamplerManager(variables_count=2, input_paths_count=2)>
No steps run. Add '--execute' to run steps.