deepsensor.data.processor
- class DataProcessor(folder=None, time_name='time', x1_name='x1', x2_name='x2', x1_map=None, x2_map=None, deepcopy=True, verbose=False)
Bases: object
Normalise xarray and pandas data for use in deepsensor models.
- Parameters:
folder (str, optional) – Folder to load normalisation params from. Defaults to None.
time_name (str, optional) – Name of the time coord. Defaults to “time”.
x1_name (str, optional) – Name of first spatial coord (e.g. “lat”). Defaults to “x1”.
x2_name (str, optional) – Name of second spatial coord (e.g. “lon”). Defaults to “x2”.
x1_map (tuple, optional) – 2-tuple of raw x1 coords to linearly map to 0 and 1, respectively. Defaults to None, which is treated as (0, 1) (i.e. no normalisation).
x2_map (tuple, optional) – 2-tuple of raw x2 coords to linearly map to 0 and 1, respectively. Defaults to None, which is treated as (0, 1) (i.e. no normalisation).
deepcopy (bool, optional) – Whether to make a deepcopy of raw data to ensure it is not changed by reference when normalising. Defaults to True.
verbose (bool, optional) – Whether to print verbose output. Defaults to False.
- __call__(data, method='mean_std', assert_computed=False)
Normalise data.
- Parameters:
data (xarray.DataArray | xarray.Dataset | pandas.DataFrame | List[xarray.DataArray | xarray.Dataset | pandas.DataFrame]) – Data to be normalised. Can be an xarray DataArray, xarray Dataset, pandas DataFrame, or a list containing objects of these types.
method (str, optional) – Normalisation method. Options include:
  - “mean_std”: Normalise to mean=0 and std=1 (default)
  - “min_max”: Normalise to min=-1 and max=1
  - “positive_semidefinite”: Normalise to min=0 and std=1
- Returns:
xarray.DataArray | xarray.Dataset | pandas.DataFrame | List[xarray.DataArray | xarray.Dataset | pandas.DataFrame] – Normalised data. Type or structure depends on the input.
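The three method options can be illustrated with a minimal standalone sketch. This is not DataProcessor's implementation: the helper name normalise_values, the offset/scale parameter names, and the use of the population standard deviation are assumptions made purely for illustration.

```python
import statistics


def normalise_values(values, method="mean_std"):
    """Normalise a list of floats and return (normalised, params).

    Hypothetical helper for illustration; not part of deepsensor.
    """
    if method == "mean_std":
        # Shift to mean 0 and scale to std 1.
        offset = statistics.fmean(values)
        scale = statistics.pstdev(values)
    elif method == "min_max":
        # Map the minimum to -1 and the maximum to 1.
        lo, hi = min(values), max(values)
        offset = (lo + hi) / 2
        scale = (hi - lo) / 2
    elif method == "positive_semidefinite":
        # Shift the minimum to 0 and scale to std 1.
        offset = min(values)
        scale = statistics.pstdev(values)
    else:
        raise ValueError(f"Unknown method: {method}")
    normalised = [(v - offset) / scale for v in values]
    return normalised, {"offset": offset, "scale": scale}


raw = [2.0, 4.0, 6.0]
min_max_values, min_max_params = normalise_values(raw, method="min_max")
print(min_max_values)  # [-1.0, 0.0, 1.0]
```

Returning the offset and scale alongside the values mirrors the idea that the same params are needed later to unnormalise model outputs.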
- check_params_computed(var_ID, method)
Check if normalisation params computed for a given variable.
- Parameters:
var_ID – Identifier of the variable to check.
method – Normalisation method to check the params for.
- Returns:
bool – Whether normalisation params are computed for a given variable.
- config_fname = 'data_processor_config.json'
- get_config(var_ID, data, method=None)
Get pre-computed normalisation params or compute them for variable var_ID.
Note
TODO do we need to pass var_ID? Can we just use the name of data?
- Parameters:
var_ID – [Type] Description needed.
data – [Type] Description needed.
method (optional) – [Type] Description needed. Defaults to None.
- Returns:
[Type] – Description of the returned value(s) needed.
- classmethod load_dask(data)
Load dask data into memory.
- Parameters:
data (xarray.DataArray | xarray.Dataset) – Description of the parameter.
- Returns:
[Type and description of the returned value(s) needed.]
- map(data, method=None, add_offset=True, unnorm=False, assert_computed=False)
Normalise or unnormalise the data values and coords in an xarray or pandas object.
- Parameters:
data (xarray.DataArray, xarray.Dataset, pandas.DataFrame, or pandas.Series) – [Description Needed]
method (str, optional) – [Description Needed]. Defaults to None.
add_offset (bool, optional) – [Description Needed]. Defaults to True.
unnorm (bool, optional) – [Description Needed]. Defaults to False.
- Returns:
[Type] – [Description Needed]
- map_array(data, var_ID, method=None, unnorm=False, add_offset=True)
Normalise or unnormalise the data values in an xarray, pandas, or numpy object.
- Parameters:
data (xarray.DataArray, xarray.Dataset, pandas.DataFrame, pandas.Series, or numpy.ndarray) – [Description Needed]
var_ID (str) – [Description Needed]
method (str, optional) – [Description Needed]. Defaults to None.
unnorm (bool, optional) – [Description Needed]. Defaults to False.
add_offset (bool, optional) – [Description Needed]. Defaults to True.
- Returns:
[Type] – [Description Needed]
- map_coord_array(coord_array, unnorm=False)
Normalise or unnormalise a coordinate array.
- Parameters:
coord_array (numpy.ndarray) – Array of shape (2, N) containing coords.
unnorm (bool, optional) – Whether to unnormalise. Defaults to False.
- Returns:
[Type] – Description of the returned value(s) needed.
- map_coords(data, unnorm=False)
Normalise spatial coords in a pandas or xarray object.
- Parameters:
data (xarray.DataArray, xarray.Dataset, pandas.DataFrame, or pandas.Series) – [Description Needed]
unnorm (bool, optional) – [Description Needed]. Defaults to False.
- Returns:
[Type] – [Description Needed]
- map_x1_and_x2(x1, x2, unnorm=False)
Normalise or unnormalise spatial coords in an array.
- Parameters:
x1 (numpy.ndarray) – Array of shape (N_x1,) containing spatial coords of x1.
x2 (numpy.ndarray) – Array of shape (N_x2,) containing spatial coords of x2.
unnorm (bool, optional) – Whether to unnormalise. Defaults to False.
- Returns:
Tuple[numpy.ndarray, numpy.ndarray] – Normalised or unnormalised spatial coords of x1 and x2.
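The linear coordinate mapping described by x1_map and x2_map can be sketched as follows. The helper map_coord and the example latitude range are hypothetical and only illustrate the arithmetic. The sketch assumes the 2-tuple gives the raw values that map to 0 and 1.

```python
def map_coord(x, coord_map, unnorm=False):
    """Linearly map raw coords so that coord_map -> (0, 1), or invert it.

    Hypothetical helper for illustration; not part of deepsensor.
    Works on floats and, through broadcasting, on numpy arrays.
    """
    raw_0, raw_1 = coord_map
    if unnorm:
        # Normalised -> raw.
        return x * (raw_1 - raw_0) + raw_0
    # Raw -> normalised.
    return (x - raw_0) / (raw_1 - raw_0)


x1_map = (30.0, 70.0)  # illustrative raw latitude range
x1_norm = map_coord(50.0, x1_map)
print(x1_norm)  # 0.5
print(map_coord(x1_norm, x1_map, unnorm=True))  # 50.0
```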
- set_coord_params(time_name, x1_name, x1_map, x2_name, x2_map)
Set coordinate normalisation params.
- Parameters:
time_name – [Type] Description needed.
x1_name – [Type] Description needed.
x1_map – [Type] Description needed.
x2_name – [Type] Description needed.
x2_map – [Type] Description needed.
- Returns:
None.
- unnormalise(data, add_offset=True)
Unnormalise data.
- Parameters:
data (xarray.DataArray | xarray.Dataset | pandas.DataFrame | List[xarray.DataArray | xarray.Dataset | pandas.DataFrame]) – Data to unnormalise.
add_offset (bool, optional) – Whether to add the offset to the data when unnormalising. Set to False to unnormalise uncertainty values (e.g. std dev). Defaults to True.
- Returns:
xarray.DataArray | xarray.Dataset | pandas.DataFrame | List[xarray.DataArray | xarray.Dataset | pandas.DataFrame] – Unnormalised data.
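The effect of add_offset can be seen in a small standalone sketch. It assumes normalisation has the form (raw - offset) / scale. The helper unnormalise_values and the numbers are hypothetical, not deepsensor's code.

```python
def unnormalise_values(values, offset, scale, add_offset=True):
    """Invert (raw - offset) / scale for a list of floats.

    Hypothetical helper for illustration; not part of deepsensor.
    """
    if add_offset:
        return [v * scale + offset for v in values]
    # Uncertainties such as a std dev only need rescaling, not shifting.
    return [v * scale for v in values]


# Illustrative params for a temperature variable in kelvin.
offset, scale = 280.0, 10.0
print(unnormalise_values([0.5], offset, scale))  # [285.0]
print(unnormalise_values([0.25], offset, scale, add_offset=False))  # [2.5]
```

A predicted mean needs both the scale and the offset restored, whereas a predicted std dev is a spread around the mean and so is only rescaled.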
- da1_da2_same_grid(da1, da2)
Check if da1 and da2 are on the same grid.
Note
da1 and da2 are assumed normalised by DataProcessor.
- Parameters:
da1 (xarray.DataArray) – …
da2 (xarray.DataArray) – …
- Returns:
bool – Whether da1 and da2 are on the same grid.
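One plausible reading of "same grid" is that both objects share identical x1 and x2 coordinate values. The sketch below assumes that reading and operates on plain coordinate arrays rather than DataArrays. The helper same_grid is hypothetical.

```python
import numpy as np


def same_grid(x1_a, x2_a, x1_b, x2_b):
    """Return True if two grids have identical x1 and x2 coordinate arrays.

    Hypothetical helper for illustration; not part of deepsensor.
    """
    return bool(np.array_equal(x1_a, x1_b) and np.array_equal(x2_a, x2_b))


coarse = np.linspace(0.0, 1.0, 5)
fine = np.linspace(0.0, 1.0, 9)
print(same_grid(coarse, coarse, coarse, coarse))  # True
print(same_grid(coarse, coarse, fine, fine))  # False
```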
- interp_da1_to_da2(da1, da2)
Interpolate da1 to da2.
Note
da1 and da2 are assumed normalised by DataProcessor.
- Parameters:
da1 (xarray.DataArray) – …
da2 (xarray.DataArray) – …
- Returns:
xarray.DataArray – Interpolated xarray.
- mask_coord_array_normalised(coord_arr, mask_da)
Remove points from a (2, N) numpy array that are outside a gridded xarray boolean mask.
If coord_arr has shape (2, N), the mask sampled at those points is a shape (N,) boolean array (True if point is inside mask, False if outside).
- Parameters:
coord_arr (numpy.ndarray) – …
mask_da (xarray.Dataset | xarray.DataArray) – …
- Returns:
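The masking step can be sketched with plain numpy arrays. The sketch assumes each point takes the mask value of its nearest grid cell, which may differ from what the function actually does. The helper mask_points and its grid arguments are hypothetical stand-ins for the xarray mask.

```python
import numpy as np


def mask_points(coord_arr, grid_x1, grid_x2, grid_mask):
    """Keep the columns of coord_arr whose nearest grid cell is True.

    Hypothetical helper for illustration; not part of deepsensor.
    coord_arr has shape (2, N); grid_mask has shape (len(grid_x1), len(grid_x2)).
    """
    # Index of the nearest grid coordinate along each axis, for every point.
    i1 = np.abs(grid_x1[:, None] - coord_arr[0][None, :]).argmin(axis=0)
    i2 = np.abs(grid_x2[:, None] - coord_arr[1][None, :]).argmin(axis=0)
    point_mask = grid_mask[i1, i2]  # boolean array of shape (N,)
    return coord_arr[:, point_mask]


grid = np.array([0.0, 0.5, 1.0])
grid_mask = np.zeros((3, 3), dtype=bool)
grid_mask[:2, :2] = True  # only the low-x1, low-x2 corner is inside the mask
points = np.array([[0.1, 0.9, 0.45], [0.1, 0.9, 0.55]])
print(mask_points(points, grid, grid, grid_mask).shape)  # (2, 2)
```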
- process_X_mask_for_X(X_mask, X)
Process X_mask by interpolating to X and converting to boolean.
Both X_mask and X are xarray DataArrays with the same spatial coords.
- Parameters:
X_mask (xarray.DataArray) – …
X (xarray.DataArray) – …
- Returns:
- xarray_to_coord_array_normalised(da)
Convert xarray to normalised coordinate array.
- Parameters:
da (xarray.Dataset | xarray.DataArray) – …
- Returns:
numpy.ndarray – A normalised coordinate array of shape (2, N).
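A (2, N) coordinate array can be built from 1D grid coordinates roughly as follows. The helper grid_to_coord_array is hypothetical, and the point ordering chosen here (x1 varying slowest) is an assumption that may not match the ordering the function produces.

```python
import numpy as np


def grid_to_coord_array(x1, x2):
    """Flatten a regular grid into a (2, N) array of (x1, x2) points.

    Hypothetical helper for illustration; not part of deepsensor.
    """
    x1_grid, x2_grid = np.meshgrid(x1, x2, indexing="ij")
    return np.stack([x1_grid.ravel(), x2_grid.ravel()], axis=0)


x1 = np.array([0.0, 0.5])
x2 = np.array([0.0, 0.5, 1.0])
coords = grid_to_coord_array(x1, x2)
print(coords.shape)  # (2, 6)
```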