1 clim_recal.pipeline
Wrappers to automate the entire pipeline.
Following Andy’s very helpful Excel file, this manages a reproduction of all steps of the debiasing pipeline.
2 Download Data
The download_ftp function from ceda_ftp_download.py can be used (with a registered account user name and password) to download two datasets from the Centre for Environmental Data Analysis (CEDA):
- Saved to ClimateData/Raw
- HadUK-Grid
  - a 1km climate projection grid which is designed to supersede UKCP
  - For further details see Met Office
  - Saved to Raw/HadsUKgrid/
- UKCP: UK Climate Projections at 2.2 km
  - a 2.2km projection grid
  - Saved to Raw/UKCP2.2/
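For orientation, here is a minimal sketch of what an FTP download into the folders above could look like, using only Python's standard ftplib. It is not the download_ftp implementation from ceda_ftp_download.py: the helper name download_folder, the host name and the remote directory placeholder in the usage comment are all assumptions made for illustration.

```python
from ftplib import FTP
from pathlib import Path


def download_folder(ftp, remote_dir, local_dir):
    """Copy every entry listed in `remote_dir` into `local_dir`.

    Hypothetical helper, not part of clim-recal. Assumes `remote_dir`
    holds only plain files and that `ftp` is a logged-in `ftplib.FTP`
    (or any object offering the same `cwd`, `nlst` and `retrbinary`).
    """
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    ftp.cwd(remote_dir)
    saved = []
    for name in ftp.nlst():
        target = local_dir / name
        with open(target, "wb") as handle:
            # retrbinary streams the remote file in chunks to the callback.
            ftp.retrbinary(f"RETR {name}", handle.write)
        saved.append(target)
    return saved


# Usage sketch (host and remote directory are assumptions, not taken
# from the repository):
# with FTP("ftp.ceda.ac.uk") as ftp:
#     ftp.login(user="<ceda-user>", passwd="<ceda-password>")
#     download_folder(ftp, "<remote/dataset/path>", "ClimateData/Raw/UKCP2.2")
```

Passing the connection in as an argument keeps credentials out of the helper and makes it easy to exercise with a stand-in object when no CEDA account is available.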
3 Reproject UKCP
The bash/reproject_one.sh script copies and reprojects UKCP2.2 via gdalwarp to Raw/Reprojected_infill:
gdalwarp -t_srs 'EPSG:27700' -tr 2200 2200 -r near -overwrite $f "${fn%.nc}.tif" # Reproject the file
New step: project UKCP to 360/365 days
Relevant xarray utilities:
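xarray's calendar helpers, such as convert_calendar and interp_calendar, are the likely candidates for this step. The sketch below does not call xarray. It only illustrates, in plain Python, the date mismatch such a conversion has to resolve, assuming the source data uses a 360-day calendar of twelve 30-day months. The function names are hypothetical.

```python
from datetime import date


def days_kept_in_standard_calendar(year):
    """Return the 360-day-calendar dates of `year` that also exist in
    the standard (Gregorian) calendar.

    Assumption: every month of the source calendar has exactly 30 days.
    """
    kept = []
    for month in range(1, 13):
        for day in range(1, 31):
            try:
                kept.append(date(year, month, day))
            except ValueError:
                # e.g. 30 February has no standard-calendar equivalent.
                continue
    return kept


def days_missing_from_360(year):
    """Standard-calendar dates a 360-day calendar cannot represent:
    the 31st of each 31-day month. These need filling or interpolating."""
    return [date(year, month, 31) for month in (1, 3, 5, 7, 8, 10, 12)]


print(len(days_kept_in_standard_calendar(1981)))  # non-leap year: 358
print(len(days_kept_in_standard_calendar(1980)))  # leap year: 359
print(len(days_missing_from_360(1981)))           # 7
```

In a non-leap year 358 source dates survive and 7 target dates have no source value, which together give the 365 days of the standard calendar. How the 7 gaps are filled (left missing, interpolated, and so on) is a choice the real conversion step has to make.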
4 Resample CPM
New approach:
- resampling.py
  - check x_grid and y_grid interpolation
4.1 Todo
To run this step in the pipeline, the following should work for the default combinations of variables (tasmax, tasmin and rainfall) and the default set of runs (05, 06, 07 and 08), assuming the necessary data is mounted.
If installed via pipx, pip etc. on your local path (or within Docker), clim-recal should be available as a command line function:
$ clim-recal --all-variables --default-runs --output-path /where/results/should/be/written
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=3, runs_count=4, regions_count=1, methods_count=1,
cpm_folders_count=12, hads_folders_count=3, start_index=0,
stop_index=None, cpus=2)>
<CPMResamplerManager(variables_count=3, runs_count=4, input_paths_count=12)>
<HADsResamplerManager(variables_count=3, input_paths_count=3)>
Otherwise, you can install locally and run via pdm from the python folder:
$ cd clim-recal/python
$ pdm run clim-recal --all-variables --default-runs --output-path /where/results/should/be/written
# Skipping output for brevity
Or run within an ipython or jupyter instance (truncated below for brevity):
>>> from clim_recal.pipeline import main
>>> main(all_variables=True, default_runs=True) # doctest: +SKIP
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=3, runs_count=4, ...>
Regardless of your route, once you’re confident with the configuration, add the --execute parameter to run. For example, assuming a local install:
$ clim-recal --all-variables --default-runs --output-path /where/results/should/be/written --execute
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=3, runs_count=4, regions_count=1, methods_count=1,
cpm_folders_count=12, hads_folders_count=3, start_index=0,
stop_index=None, cpus=2)>
<CPMResamplerManager(variables_count=3, runs_count=4, input_paths_count=12)>
<HADsResamplerManager(variables_count=3, input_paths_count=3)>
Running CPM Standard Calendar projection...
<CPMResampler(count=100, max_count=100,
input_path='/mnt/vmfileshare/ClimateData/Raw/UKCP2.2/tasmax/05/latest',
output_path='/mnt/vmfileshare/ClimateData/CPM-365/test-run-3-may/resample/
cpm/tasmax/05/latest')>
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100/100 [ 0:38:27 < 0:00:00 , 0 it/s ]
87% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87/100 [ 0:17:42 < 0:03:07 , 0 it/s ]
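The folder counts in the configuration summaries above appear to follow from the selected options: cpm_folders_count=12 is consistent with one CPM folder per variable and run pair (3 × 4), and hads_folders_count=3 with one HADs folder per variable. The sketch below reproduces that arithmetic. The `<variable>/<run>/latest` layout is inferred from the single input_path shown in the output, and on-disk folder names for other variables may differ.

```python
from itertools import product
from pathlib import PurePosixPath

variables = ("tasmax", "tasmin", "rainfall")
runs = ("05", "06", "07", "08")

# Assumption: one CPM input folder per (variable, run) pair and
# one HADs input folder per variable.
cpm_pairs = list(product(variables, runs))
print(len(cpm_pairs))  # 12
print(len(variables))  # 3

# Layout inferred from the one path printed in the run output above.
raw_cpm = PurePosixPath("/mnt/vmfileshare/ClimateData/Raw/UKCP2.2")
example = raw_cpm / "tasmax" / "05" / "latest"
print(example)
```

Checking these counts against the preview output is a quick way to confirm that the chosen variables and runs resolve to the folders you expect before adding --execute.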
5 Cropping
6 Pre-processing
- Originally used debiasing.pre_processing.py
New approach:
- Refactor debiasing.debias-wrapper
7 Debiasing
- python
  - Originally used debiasing.pre_processing.py
  - Refactor debiasing.debias-wrapper
- R
8 Keep track
- What is being superseded
- What can be removed
8.1 Functions
| Name | Description |
|---|---|
| main | Run all elements of the pipeline. |
8.1.1 main
clim_recal.pipeline.main(execute=False, hads_input_path=RAW_HADS_PATH, cpm_input_path=RAW_CPM_PATH, output_path=DEFAULT_OUTPUT_PATH, variables=(VariableOptions.default()), regions=(RegionOptions.default()), runs=(RunOptions.default()), methods=(MethodOptions.default()), all_variables=False, all_regions=False, default_runs=False, all_runs=False, all_methods=False, hads_projection=False, cpm_projection=False, crop_hads=True, crop_cpm=True, cpus=None, multiprocess=False, start_index=0, stop_index=None, total=None, print_range_length=5, **kwargs)
Run all elements of the pipeline.
8.1.1.1 Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| variables | Sequence[VariableOptions \| str] | Variables to include in the model, e.g. tasmax, tasmin. | (VariableOptions.default()) |
| runs | Sequence[RunOptions \| str] | Which model runs to include, e.g. “01”, “08”, “11”. | (RunOptions.default()) |
| regions | Sequence[RegionOptions \| str] \| None | Which regions to crop data to. Future plans include allowing this to be skipped in order to run for the entire UK. | (RegionOptions.default()) |
| methods | Sequence[MethodOptions \| str] | Which debiasing methods to apply. | (MethodOptions.default()) |
| output_path | PathLike | Path to save intermediate and final results to. | DEFAULT_OUTPUT_PATH |
| cpus | int \| None | Number of cpus to use when multiprocessing. | None |
| multiprocess | bool | Whether to use multiprocessing. | False |
| start_index | int | Index to start all iterations from. | 0 |
| total | int \| None | Total number of records to iterate over. 0 and None indicate all values from start_index. | None |
| **kwargs | | Additional parameters to pass to a ClimRecalConfig. | {} |
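The interplay of start_index, stop_index and total can be pictured with a small slicing helper. This is a sketch of the documented meaning only, not the library's code: select_range is a hypothetical name, and letting an explicit stop_index take precedence over total is an assumption.

```python
from itertools import islice


def select_range(records, start_index=0, stop_index=None, total=None):
    """Hypothetical helper mirroring the documented index options.

    `total` of 0 or None means "everything from `start_index`".
    Assumption: an explicit `stop_index` wins over `total`.
    """
    if stop_index is None and total:
        stop_index = start_index + total
    return list(islice(records, start_index, stop_index))


files = [f"file_{i}.nc" for i in range(10)]
print(select_range(files, start_index=2, total=3))
print(len(select_range(files, start_index=2, total=0)))
```

The first call selects file_2.nc to file_4.nc; the second keeps all eight records from index 2 onwards, because a total of 0 is treated as no limit.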
8.1.1.2 Notes
The default parameters here are meant to reflect the entire workflow process to ease reproducibility.
8.1.1.3 Examples
Note the _allow_check_fail parameters support running the examples without data mounted from a server.
>>> main(variables=("rainfall", "tasmin"),
... output_path=test_runs_output_path,
... cpm_kwargs=dict(_allow_check_fail=True),
... hads_kwargs=dict(_allow_check_fail=True),
... )
'set_cpm_for_coord_alignment' for 'HADs' not speficied.
Defaulting to 'self.cpm_input_path': '...'
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=2, runs_count=1,
regions_count=1, methods_count=1,
cpm_folders_count=2, hads_folders_count=2,
start_index=0, stop_index=None,
cpus=...)>
<CPMResamplerManager(variables_count=2, runs_count=1,
input_paths_count=2)>
<HADsResamplerManager(variables_count=2, input_paths_count=2)>
No steps run. Add '--execute' to run steps.
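The preview-then-run behaviour seen in the output above (execute defaults to False, so only the configuration is reported) can be pictured as a simple guard. The function below is a hypothetical sketch of that pattern and is not how clim_recal.pipeline.main is implemented; the step names are made up.

```python
def run_pipeline(steps, execute=False):
    """Hypothetical sketch: `steps` is a list of (name, callable) pairs."""
    print("Configured steps:", ", ".join(name for name, _ in steps))
    if not execute:
        # Preview only: report the plan and stop before doing any work.
        print("No steps run. Add '--execute' to run steps.")
        return []
    return [step() for _, step in steps]


steps = [("resample", lambda: "resampled"), ("crop", lambda: "cropped")]
run_pipeline(steps)                # preview only, returns []
run_pipeline(steps, execute=True)  # runs each step in order
```

Defaulting to a preview is a cheap safeguard for long jobs: the run output earlier on this page shows a single resampling pass taking over half an hour.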