1 clim_recal.pipeline
Wrappers to automate the entire pipeline.
Following Andy’s very helpful Excel file, this manages a reproduction of all steps of the debiasing pipeline.
2 Download Data
The `download_ftp` function from `ceda_ftp_download.py` can be used (with a registered account user name and password) to download two datasets from the Centre for Environmental Data Analysis (CEDA). Both are saved to `ClimateData/Raw`:
- HadUK-Grid
  - a 1km climate projection grid which is designed to supersede UKCP
  - For further details see Met Office
  - Saved to `Raw/HadsUKgrid/`
- UK Climate Projections at 2.2 km
  - a 2.2km projection grid
  - Saved to `Raw/UKCP2.2/`
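As a rough illustration of the download step, the sketch below uses Python's standard `ftplib` to copy every file in one remote directory into a local folder. It is not the `download_ftp` implementation: the helper names `local_target` and `fetch_directory`, the host `ftp.ceda.ac.uk`, and the flat (non-recursive) directory handling are assumptions, and the remote directory is left for the caller to supply.

```python
from ftplib import FTP
from pathlib import Path

CEDA_HOST = "ftp.ceda.ac.uk"  # assumption: the CEDA FTP host name


def local_target(save_dir, remote_name):
    """Return the local path a remote file would be written to."""
    return Path(save_dir) / Path(remote_name).name


def fetch_directory(user, password, remote_dir, save_dir, host=CEDA_HOST):
    """Download every entry listed in `remote_dir` into `save_dir`.

    Hypothetical helper: it assumes `remote_dir` contains only files,
    so nested folders would need extra handling.
    """
    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        ftp.cwd(remote_dir)
        for name in ftp.nlst():
            target = local_target(save_dir, name)
            with target.open("wb") as handle:
                # Stream the file to disk in binary mode.
                ftp.retrbinary(f"RETR {name}", handle.write)
            saved.append(target)
    return saved
```

Calling `fetch_directory` needs valid CEDA credentials and network access, so it is defined here but not run.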
3 Reproject UKCP
The `bash/reproject_one.sh` script copies and reprojects `UKCP2.2` via `gdalwarp` to `Raw/Reprojected_infill`:
gdalwarp -t_srs 'EPSG:27700' -tr 2200 2200 -r near -overwrite $f "${fn%.nc}.tif" # Reproject the file
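The same call can be assembled from Python, which avoids shell quoting problems when file names contain spaces. This is a minimal sketch, not part of `bash/reproject_one.sh`: the function names and defaults are assumptions taken from the command above, and it presumes the `gdalwarp` binary is on the `PATH`.

```python
import shlex
import subprocess
from pathlib import Path


def gdalwarp_command(source, target_srs="EPSG:27700", resolution=2200, resampling="near"):
    """Build the gdalwarp argument list for one NetCDF file (hypothetical helper)."""
    source = Path(source)
    # Simplification: write the .tif next to the input file. The shell
    # script derives the output name from a separate `fn` variable.
    target = source.with_suffix(".tif")
    return [
        "gdalwarp",
        "-t_srs", target_srs,
        "-tr", str(resolution), str(resolution),
        "-r", resampling,
        "-overwrite",
        str(source),
        str(target),
    ]


def reproject(source):
    """Run gdalwarp on one file; raises CalledProcessError on failure."""
    subprocess.run(gdalwarp_command(source), check=True)


# Printing the command is a cheap way to check it before running anything.
print(shlex.join(gdalwarp_command("example.nc")))
```

Passing the arguments as a list means no shell is involved, so paths are handed to `gdalwarp` unchanged.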
New step: project UKCP to 360/365 days.
Relevant `xarray` utilities:
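One candidate to check is `xarray`'s `Dataset.convert_calendar` method, which converts a dataset between calendars such as `360_day` and `standard`. Independently of `xarray`, the idea of stretching a 360-day year onto 365 days can be sketched in plain Python. The function below is an illustration only: its name and its repeat-a-day rule are assumptions, not the approach used in `clim_recal`.

```python
def stretch_360_to_365(values):
    """Map one 360-day year of daily values onto 365 days (illustrative only)."""
    source_days, target_days = 360, 365
    if len(values) != source_days:
        raise ValueError(f"expected {source_days} values, got {len(values)}")
    stretched = []
    for day in range(target_days):
        # Scale both years to the same length and pick the source day that
        # contains the midpoint of this target day.
        source_index = int((day + 0.5) * source_days / target_days)
        stretched.append(values[source_index])
    return stretched


# Use the day index itself as the value so the mapping is easy to inspect.
year = stretch_360_to_365(list(range(360)))
```

With this rule every source day is kept and five of them appear twice; interpolating between neighbouring days instead of repeating one would be another option.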
4 Resample CPM
New approach:
- `resampling.py`
  - check `x_grid` and `y_grid` interpolation
4.1 Todo
To run this step in the pipeline, the following should work for the default combinations of variables (`tasmax`, `tasmin`, and `rainfall`) and the default set of runs (`05`, `06`, `07` and `08`), assuming the necessary data is mounted.
If installed via `pipx`/`pip` etc. on your local path (or within `Docker`), `clim-recal` should be available as a command line function:
$ clim-recal --all-variables --default-runs --output-path /where/results/should/be/written
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=3, runs_count=4, regions_count=1, methods_count=1,
cpm_folders_count=12, hads_folders_count=3, start_index=0,
stop_index=None, cpus=2)>
<CPMResamplerManager(variables_count=3, runs_count=4, input_paths_count=12)>
<HADsResamplerManager(variables_count=3, input_paths_count=3)>
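The counts in this summary follow from the defaults: three variables times four runs gives twelve CPM input folders. The sketch below reproduces that arithmetic, assuming a `<variable>/<run>/latest` folder layout under the CPM root; the helper name `cpm_folders` is hypothetical and not part of `clim_recal`.

```python
from itertools import product
from pathlib import PurePosixPath

VARIABLES = ("tasmax", "tasmin", "rainfall")
RUNS = ("05", "06", "07", "08")


def cpm_folders(root, variables=VARIABLES, runs=RUNS):
    """List one folder per (variable, run) pair (assumed layout)."""
    return [
        PurePosixPath(root) / variable / run / "latest"
        for variable, run in product(variables, runs)
    ]


folders = cpm_folders("/mnt/vmfileshare/ClimateData/Raw/UKCP2.2")
print(len(folders))  # one entry per variable and run combination
```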
Otherwise, you can install locally and either run via `pdm` from the `python` folder:
$ cd clim-recal/python
$ pdm run clim-recal --all-variables --default-runs --output-path /where/results/should/be/written
# Skipping output for brevity
Or within an `ipython` or `jupyter` instance (truncated below for brevity):
>>> from clim_recal.pipeline import main
>>> main(all_variables=True, default_runs=True) # doctest: +SKIP
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=3, runs_count=4, ...>
Regardless of your route, once you’re confident with the configuration, add the `--execute` parameter to run. For example, assuming a local install:
$ clim-recal --all-variables --default-runs --output-path /where/results/should/be/written --execute
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=3, runs_count=4, regions_count=1, methods_count=1,
cpm_folders_count=12, hads_folders_count=3, start_index=0,
stop_index=None, cpus=2)>
<CPMResamplerManager(variables_count=3, runs_count=4, input_paths_count=12)>
<HADsResamplerManager(variables_count=3, input_paths_count=3)>
Running CPM Standard Calendar projection...
<CPMResampler(count=100, max_count=100,
input_path='/mnt/vmfileshare/ClimateData/Raw/UKCP2.2/tasmax/05/latest',
output_path='/mnt/vmfileshare/ClimateData/CPM-365/test-run-3-may/resample/
cpm/tasmax/05/latest')>
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100/100 [ 0:38:27 < 0:00:00 , 0 it/s ]
87% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87/100 [ 0:17:42 < 0:03:07 , 0 it/s ]
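When driving the command from a script, the preview-then-execute pattern can be made explicit. The sketch below builds the argument list from the flags shown above; the helper names `clim_recal_args` and `preview_then_run` are hypothetical, and running the result still requires `clim-recal` to be installed.

```python
import subprocess


def clim_recal_args(output_path, execute=False):
    """Return the clim-recal command line, adding --execute only on request."""
    args = [
        "clim-recal",
        "--all-variables",
        "--default-runs",
        "--output-path", str(output_path),
    ]
    if execute:
        args.append("--execute")
    return args


def preview_then_run(output_path, confirm):
    """Print the configuration first, then run only if `confirm()` is true."""
    subprocess.run(clim_recal_args(output_path), check=True)
    if confirm():
        subprocess.run(clim_recal_args(output_path, execute=True), check=True)
```

Keeping the dry run as the default mirrors the command itself, where nothing is executed until `--execute` is added.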
5 Cropping
6 Pre-processing
- Originally used `debiasing.pre_processing.py`
New approach:
- Refactor `debiasing.debias-wrapper`
7 Debiasing
- `python`
  - Originally used `debiasing.pre_processing.py`
  - Refactor `debiasing.debias-wrapper`
- `R`
8 Keep track
- What is being superseded
- What can be removed
8.1 Functions
Name | Description |
---|---|
main | Run all elements of the pipeline. |
8.1.1 main
clim_recal.pipeline.main(execute=False, hads_input_path=RAW_HADS_PATH, cpm_input_path=RAW_CPM_PATH, output_path=DEFAULT_OUTPUT_PATH, variables=(VariableOptions.default()), regions=(RegionOptions.default()), runs=(RunOptions.default()), methods=(MethodOptions.default()), all_variables=False, all_regions=False, default_runs=False, all_runs=False, all_methods=False, hads_projection=False, cpm_projection=False, crop_hads=True, crop_cpm=True, cpus=None, multiprocess=False, start_index=0, stop_index=None, total=None, print_range_length=5, **kwargs)
Run all elements of the pipeline.
8.1.1.1 Parameters
Name | Type | Description | Default |
---|---|---|---|
variables | `Sequence[VariableOptions \| str]` | Variables to include in the model, e.g. `tasmax`, `tasmin`. | `(VariableOptions.default())` |
runs | `Sequence[RunOptions \| str]` | Which model runs to include, e.g. “01”, “08”, “11”. | `(RunOptions.default())` |
regions | `Sequence[RegionOptions \| str] \| None` | Which regions to crop data to. Future plans include allowing this to be skipped so the pipeline runs for the entire UK. | `(RegionOptions.default())` |
methods | `Sequence[MethodOptions \| str]` | Which debiasing methods to apply. | `(MethodOptions.default())` |
output_path | `PathLike` | Path to save intermediate and final results to. | `DEFAULT_OUTPUT_PATH` |
cpus | `int \| None` | Number of CPUs to use when multiprocessing. | `None` |
multiprocess | `bool` | Whether to use multiprocessing. | `False` |
start_index | `int` | Index to start all iterations from. | `0` |
total | `int \| None` | Total number of records to iterate over. `0` and `None` indicate all values from `start_index`. | `None` |
`**kwargs` | | Additional parameters to pass to a `ClimRecalConfig`. | `{}` |
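The interaction between `start_index` and `total` can be pictured with `itertools.islice`. This is a sketch of the behaviour described in the table, not the library's code; `select_records` is a hypothetical helper.

```python
from itertools import islice


def select_records(records, start_index=0, total=None):
    """Return `total` records from `start_index`; 0 or None means all the rest."""
    stop = start_index + total if total else None
    return list(islice(records, start_index, stop))


first_three = select_records(range(10), start_index=2, total=3)
the_rest = select_records(range(5), start_index=3, total=None)
```

Treating `0` and `None` the same way keeps a single code path for the "everything from here" case.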
8.1.1.2 Notes
The default parameters here are meant to reflect the entire workflow process to ease reproducibility.
8.1.1.3 Examples
Note the `_allow_check_fail` parameters support running the examples without data mounted from a server.
>>> main(variables=("rainfall", "tasmin"),
...      output_path=test_runs_output_path,
...      cpm_kwargs=dict(_allow_check_fail=True),
...      hads_kwargs=dict(_allow_check_fail=True),
... )
'set_cpm_for_coord_alignment' for 'HADs' not speficied.
Defaulting to 'self.cpm_input_path': '...'
clim-recal pipeline configurations:
<ClimRecalConfig(variables_count=2, runs_count=1,
                 regions_count=1, methods_count=1,
                 cpm_folders_count=2, hads_folders_count=2,
                 start_index=0, stop_index=None,
                 cpus=...)>
<CPMResamplerManager(variables_count=2, runs_count=1,
                     input_paths_count=2)>
<HADsResamplerManager(variables_count=2, input_paths_count=2)>
No steps run. Add '--execute' to run steps.