deepsensor.data.processor


class DataProcessor(folder=None, time_name='time', x1_name='x1', x2_name='x2', x1_map=None, x2_map=None, deepcopy=True, verbose=False)

Bases: object

Normalise xarray and pandas data for use in deepsensor models.

Parameters:
  • folder (str, optional) – Folder to load normalisation params from. Defaults to None.

  • time_name (str, optional) – Name of time coord. Defaults to “time”.

  • x1_name (str, optional) – Name of first spatial coord (e.g. “lat”). Defaults to “x1”.

  • x2_name (str, optional) – Name of second spatial coord (e.g. “lon”). Defaults to “x2”.

  • x1_map (tuple, optional) – 2-tuple of raw x1 coords to linearly map to (0, 1), respectively. Defaults to (0, 1) (i.e. no normalisation).

  • x2_map (tuple, optional) – 2-tuple of raw x2 coords to linearly map to (0, 1), respectively. Defaults to (0, 1) (i.e. no normalisation).

  • deepcopy (bool, optional) – Whether to make a deepcopy of raw data to ensure it is not changed by reference when normalising. Defaults to True.

  • verbose (bool, optional) – Whether to print verbose output. Defaults to False.

__call__(data, method='mean_std', assert_computed=False)

Normalise data.

Parameters:
  • data (xarray.DataArray | xarray.Dataset | pandas.DataFrame | List[xarray.DataArray | xarray.Dataset | pandas.DataFrame]) – Data to be normalised. Can be an xarray DataArray, xarray Dataset, pandas DataFrame, or a list containing objects of these types.

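  • method (str, optional) – Normalisation method. Defaults to “mean_std”. Options include:

      - “mean_std”: Normalise to mean=0 and std=1 (default).
      - “min_max”: Normalise to min=-1 and max=1.
      - “positive_semidefinite”: Normalise to min=0 and std=1.

  • assert_computed (bool, optional) – Whether to assert that normalisation params have already been computed. Defaults to False.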

Returns:

xarray.DataArray | xarray.Dataset | pandas.DataFrame | List[xarray.DataArray | xarray.Dataset | pandas.DataFrame] – Normalised data. Type or structure depends on the input.
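
The arithmetic behind the three methods can be illustrated with a small sketch in plain Python. This is not the deepsensor implementation: the helper name normalise_values is hypothetical, plain lists stand in for xarray and pandas objects, and the use of the population standard deviation is an assumption.

```python
import statistics


def normalise_values(values, method="mean_std"):
    """Normalise a list of floats with one of the three documented methods.

    Hypothetical helper for illustration only; not part of deepsensor.
    """
    if method == "mean_std":
        # Shift to mean 0 and scale to standard deviation 1.
        offset = statistics.fmean(values)
        scale = statistics.pstdev(values)  # assumption: population std
    elif method == "min_max":
        # Map the minimum to -1 and the maximum to 1.
        offset = (max(values) + min(values)) / 2
        scale = (max(values) - min(values)) / 2
    elif method == "positive_semidefinite":
        # Shift the minimum to 0 and scale to standard deviation 1.
        offset = min(values)
        scale = statistics.pstdev(values)
    else:
        raise ValueError(f"Unknown method: {method}")
    return [(v - offset) / scale for v in values]


raw = [2.0, 4.0, 6.0, 8.0]
print(normalise_values(raw, "min_max"))
```

Each method reduces to an offset and a scale, which is why the same pair of numbers is enough to undo the transformation later.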

add_to_config(var_ID, **kwargs)

Add kwargs to the config dict for variable var_ID.

check_params_computed(var_ID, method)

Check whether normalisation params have been computed for a given variable.

Parameters:
  • var_ID – Identifier of the variable to check.

  • method (str) – Normalisation method to check params for (e.g. “mean_std”).

Returns:

bool – Whether normalisation params are computed for a given variable.

config_fname = 'data_processor_config.json'

get_config(var_ID, data, method=None)

Get pre-computed normalisation params or compute them for variable var_ID.

Note

TODO do we need to pass var_ID? Can we just use the name of data?

Parameters:
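  • var_ID – Identifier of the variable whose normalisation params are requested.

  • data – Data for the variable, used to compute the params if they are not already available.

  • method (str, optional) – Normalisation method. Defaults to None.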

Returns:

Normalisation params for variable var_ID (pre-computed if available, otherwise computed from data).
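
The "use pre-computed params, otherwise compute them" behaviour resembles a simple cache lookup. The sketch below assumes a hypothetical config layout (a dict keyed by variable ID that stores the method and its params) and a hypothetical function name; the real structure of the DataProcessor config is not documented on this page.

```python
import statistics


def get_or_compute_params(config, var_id, values, method="mean_std"):
    """Return cached params for var_id, computing and storing them if absent.

    Hypothetical stand-in for illustration; the config layout is assumed.
    """
    if var_id in config:
        # Params were computed earlier (or loaded from disk): reuse them.
        return config[var_id]
    if method != "mean_std":
        raise NotImplementedError("This sketch only covers 'mean_std'.")
    params = {
        "method": method,
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # assumption: population std
    }
    config[var_id] = params
    return params


config = {}
first = get_or_compute_params(config, "temperature", [1.0, 2.0, 3.0])
# A second call ignores the new values and returns the stored params.
second = get_or_compute_params(config, "temperature", [100.0, 200.0])
```

Reusing stored params is what keeps training and inference data on the same scale, even when the second dataset has different statistics.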

classmethod load_dask(data)

Load dask data into memory.

Parameters:

data (xarray.DataArray | xarray.Dataset) – Data to load into memory, possibly backed by dask.

Returns:

xarray.DataArray | xarray.Dataset – The data, loaded into memory.

map(data, method=None, add_offset=True, unnorm=False, assert_computed=False)

Normalise or unnormalise the data values and coords in an xarray or pandas object.

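Parameters:
  • data – xarray or pandas object to normalise or unnormalise.

  • method (str, optional) – Normalisation method. Defaults to None.

  • add_offset (bool, optional) – Whether to add the offset when mapping the data values. Defaults to True.

  • unnorm (bool, optional) – Whether to unnormalise. Defaults to False.

  • assert_computed (bool, optional) – Whether to assert that normalisation params have already been computed. Defaults to False.

Returns:

The normalised or unnormalised object.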

map_array(data, var_ID, method=None, unnorm=False, add_offset=True)

Normalise or unnormalise the data values in an xarray, pandas, or numpy object.

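Parameters:
  • data – xarray, pandas, or numpy object whose values are to be normalised or unnormalised.

  • var_ID – Identifier of the variable, used to look up its normalisation params.

  • method (str, optional) – Normalisation method. Defaults to None.

  • unnorm (bool, optional) – Whether to unnormalise. Defaults to False.

  • add_offset (bool, optional) – Whether to add the offset when mapping the data values. Defaults to True.

Returns:

The normalised or unnormalised data.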

map_coord_array(coord_array, unnorm=False)

Normalise or unnormalise a coordinate array.

Parameters:
  • coord_array (numpy.ndarray) – Array of shape (2, N) containing coords.

  • unnorm (bool, optional) – Whether to unnormalise. Defaults to False.

Returns:

numpy.ndarray – Normalised or unnormalised coordinate array of shape (2, N).

map_coords(data, unnorm=False)

Normalise or unnormalise spatial coords in a pandas or xarray object.

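Parameters:
  • data – pandas or xarray object whose spatial coords are to be mapped.

  • unnorm (bool, optional) – Whether to unnormalise. Defaults to False.

Returns:

The object with normalised or unnormalised spatial coords.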

map_x1_and_x2(x1, x2, unnorm=False)

Normalise or unnormalise spatial coords in an array.

Parameters:
  • x1 (numpy.ndarray) – Array of shape (N_x1,) containing spatial coords of x1.

  • x2 (numpy.ndarray) – Array of shape (N_x2,) containing spatial coords of x2.

  • unnorm (bool, optional) – Whether to unnormalise. Defaults to False.

Returns:

Tuple[numpy.ndarray, numpy.ndarray] – Normalised or unnormalised spatial coords of x1 and x2.
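
The linear map implied by x1_map and x2_map can be written out explicitly. In this sketch a raw coordinate c with map (a, b) is normalised to (c - a) / (b - a), so a goes to 0 and b goes to 1, and unnormalising inverts that. The function names are hypothetical and plain Python lists stand in for numpy arrays; it illustrates the arithmetic only and is not the library's code.

```python
def map_coord(coords, coord_map, unnorm=False):
    """Linearly map coords so that coord_map = (a, b) goes to (0, 1).

    Hypothetical helper for illustration; not the deepsensor implementation.
    """
    a, b = coord_map
    if unnorm:
        # Inverse map: normalised 0 -> a, normalised 1 -> b.
        return [c * (b - a) + a for c in coords]
    return [(c - a) / (b - a) for c in coords]


def map_x1_and_x2_sketch(x1, x2, x1_map, x2_map, unnorm=False):
    """Apply the linear map to each spatial axis independently."""
    return map_coord(x1, x1_map, unnorm), map_coord(x2, x2_map, unnorm)


# Example: latitudes in [40, 60] and longitudes in [-10, 10].
x1_norm, x2_norm = map_x1_and_x2_sketch(
    [40.0, 50.0, 60.0],
    [-10.0, 0.0, 10.0],
    x1_map=(40.0, 60.0),
    x2_map=(-10.0, 10.0),
)
# Unnormalising with the same maps recovers the raw coordinates.
x1_raw, x2_raw = map_x1_and_x2_sketch(
    x1_norm, x2_norm, x1_map=(40.0, 60.0), x2_map=(-10.0, 10.0), unnorm=True
)
```

With a map of (0, 1) the transformation is the identity, which matches the "no normalisation" default described for the class.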

save(folder)

Save the DataProcessor config to JSON in folder.
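
As a rough illustration of what saving a config to JSON in a folder involves, the sketch below writes a dict to data_processor_config.json (the config_fname value listed on this page) using only the standard library. The function names and the config contents are made up for the example and are not taken from deepsensor.

```python
import json
import os
import tempfile

CONFIG_FNAME = "data_processor_config.json"  # same name as config_fname


def save_config(config, folder):
    """Write config as JSON into folder, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, CONFIG_FNAME)
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return path


def load_config(folder):
    """Read the JSON config back from folder."""
    with open(os.path.join(folder, CONFIG_FNAME)) as f:
        return json.load(f)


# Round trip through a temporary folder; the config contents are hypothetical.
with tempfile.TemporaryDirectory() as folder:
    example = {"temperature": {"method": "mean_std", "mean": 2.0, "std": 0.5}}
    save_config(example, folder)
    restored = load_config(folder)
```

Persisting the params this way is what allows a later session to pass the same folder to the constructor and reuse the normalisation.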

set_coord_params(time_name, x1_name, x1_map, x2_name, x2_map)

Set coordinate normalisation params.

Parameters:
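  • time_name (str) – Name of time coord.

  • x1_name (str) – Name of first spatial coord (e.g. “lat”).

  • x1_map (tuple) – 2-tuple of raw x1 coords to linearly map to (0, 1), respectively.

  • x2_name (str) – Name of second spatial coord (e.g. “lon”).

  • x2_map (tuple) – 2-tuple of raw x2 coords to linearly map to (0, 1), respectively.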

Returns:

None.

unnormalise(data, add_offset=True)

Unnormalise data.

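Parameters:
  • data (xarray.DataArray | xarray.Dataset | pandas.DataFrame | List[xarray.DataArray | xarray.Dataset | pandas.DataFrame]) – Data to unnormalise.

  • add_offset (bool, optional) – Whether to add the offset when unnormalising. Defaults to True.

Returns: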

xarray.DataArray | xarray.Dataset | pandas.DataFrame | List[xarray.DataArray | xarray.Dataset | pandas.DataFrame] – Unnormalised data.
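
For the “mean_std” method, unnormalising amounts to inverting an affine map. The sketch below is an illustration under stated assumptions, not the library's code: the helper name is hypothetical, and it assumes that add_offset=False applies only the scale, which would suit quantities such as standard deviations that should not be shifted by the mean.

```python
def unnormalise_mean_std(values, mean, std, add_offset=True):
    """Invert z = (x - mean) / std for a list of normalised values.

    Hypothetical helper. With add_offset=False only the scale is applied
    (assumed behaviour), e.g. for a predicted standard deviation.
    """
    offset = mean if add_offset else 0.0
    return [v * std + offset for v in values]


# Normalised predictions and their normalised standard deviations.
pred_mean = unnormalise_mean_std([0.0, 1.0], mean=10.0, std=2.0)
pred_std = unnormalise_mean_std([0.5, 0.25], mean=10.0, std=2.0, add_offset=False)
```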

da1_da2_same_grid(da1, da2)

Check if da1 and da2 are on the same grid.

Note

da1 and da2 are assumed normalised by DataProcessor.

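Parameters:
  • da1 – First xarray object, assumed normalised by DataProcessor.

  • da2 – Second xarray object, assumed normalised by DataProcessor.

Returns:

bool – Whether da1 and da2 are on the same grid.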
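
One plausible way to test whether two gridded objects share a grid is to compare their coordinate vectors along each spatial axis. The sketch below does this for plain lists of x1 and x2 coordinates; the function name and the tolerance are assumptions, and the library may use a different criterion.

```python
import math


def same_grid(x1_a, x2_a, x1_b, x2_b, tol=1e-9):
    """Return True if both grids have matching x1 and x2 coordinates."""

    def axis_equal(u, v):
        # Axes must have the same length and element-wise close values.
        return len(u) == len(v) and all(
            math.isclose(p, q, abs_tol=tol) for p, q in zip(u, v)
        )

    return axis_equal(x1_a, x1_b) and axis_equal(x2_a, x2_b)


coarse = ([0.0, 0.5, 1.0], [0.0, 1.0])
fine = ([0.0, 0.25, 0.5, 0.75, 1.0], [0.0, 1.0])
print(same_grid(*coarse, *coarse))
print(same_grid(*coarse, *fine))
```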

interp_da1_to_da2(da1, da2)

Interpolate da1 to da2.

Note

da1 and da2 are assumed normalised by DataProcessor.

Parameters:
  • da1 – xarray object to interpolate.

  • da2 – xarray object that defines the target grid.

Returns:

xarray.DataArray – da1 interpolated to da2.

mask_coord_array_normalised(coord_arr, mask_da)

Remove points from (2, N) numpy array that are outside gridded xarray boolean mask.

If coord_arr is shape (2, N), then mask_da is a shape (N,) boolean array (True if point is inside mask, False if outside).

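Parameters:
  • coord_arr (numpy.ndarray) – Array of shape (2, N) containing normalised coords.

  • mask_da – Gridded xarray boolean mask.

Returns:

numpy.ndarray – coord_arr with the points outside the mask removed.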
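
A sketch of the masking step, using nested lists in place of numpy and xarray objects: each point is assigned to its nearest grid cell and kept only if that cell's mask value is True. The nearest-cell lookup is an assumption made for the example, and the function name and argument layout are hypothetical.

```python
def mask_points(coord_arr, grid_x1, grid_x2, mask):
    """Keep points of a (2, N) coordinate array that fall inside a mask.

    coord_arr: [x1_values, x2_values], each of length N.
    grid_x1, grid_x2: 1D grid coordinates of the mask.
    mask: nested list of booleans with shape (len(grid_x1), len(grid_x2)).
    """

    def nearest(grid, value):
        # Index of the grid coordinate closest to value.
        return min(range(len(grid)), key=lambda i: abs(grid[i] - value))

    kept_x1, kept_x2 = [], []
    for p1, p2 in zip(*coord_arr):
        if mask[nearest(grid_x1, p1)][nearest(grid_x2, p2)]:
            kept_x1.append(p1)
            kept_x2.append(p2)
    return [kept_x1, kept_x2]


grid_x1 = [0.0, 1.0]
grid_x2 = [0.0, 1.0]
mask = [[True, False], [False, True]]  # True means "inside the mask"
points = [[0.1, 0.1, 0.9], [0.1, 0.9, 0.8]]
inside = mask_points(points, grid_x1, grid_x2, mask)
```

The result keeps the (2, M) layout of the input, with M at most N.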

process_X_mask_for_X(X_mask, X)

Process X_mask by interpolating to X and converting to boolean.

Both X_mask and X are xarray DataArrays with the same spatial coords.

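Parameters:
  • X_mask (xarray.DataArray) – Mask to process.

  • X (xarray.DataArray) – Data that X_mask is interpolated to.

Returns:

xarray.DataArray – X_mask interpolated to X and converted to boolean.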

xarray_to_coord_array_normalised(da)

Convert xarray to normalised coordinate array.

Parameters:

da (xarray.Dataset | xarray.DataArray) – xarray object to convert.

Returns:

numpy.ndarray – A normalised coordinate array of shape (2, N).
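
For gridded data, producing a (2, N) coordinate array amounts to enumerating every (x1, x2) pair on the grid. A sketch with the standard library, where lists stand in for xarray coordinates; the function name is hypothetical and the x1-major ordering is an assumption.

```python
from itertools import product


def grid_to_coord_array(x1, x2):
    """Flatten a grid with coords x1 and x2 into a (2, N) nested list."""
    pairs = list(product(x1, x2))  # x1 varies slowest (assumed ordering)
    return [[p[0] for p in pairs], [p[1] for p in pairs]]


coords = grid_to_coord_array([0.0, 1.0], [0.0, 0.5, 1.0])
```

Here N is the product of the two grid sizes, so a 2 by 3 grid yields six points.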