deepsensor.active_learning.algorithms

class GreedyAlgorithm(model, X_s, X_t, X_s_mask=None, X_t_mask=None, N_new_context=1, X_normalised=False, model_infill_method='mean', query_infill=None, proposed_infill=None, context_set_idx=0, target_set_idx=0, progress_bar=False, task_loader=None, verbose=False)

Bases: object

Greedy active learning sensor placement algorithm.

Given a set of Task objects containing existing context data, the algorithm iteratively (i.e. ‘greedily’) proposes $N$ locations (set by N_new_context) for new context points from a search grid, using active learning with a DeepSensorModel.

Within each greedy iteration, the algorithm evaluates an acquisition function over the search grid. The acquisition function value at a given query location relates to the merit of a new observation at that point, and is averaged over all Task objects. The algorithm then selects the context location with the ‘best’ (max or min) acquisition function value. A new context observation is added to each Task at that location. This process is repeated until $N$ new context locations have been proposed.
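The greedy loop described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not DeepSensor's implementation: `toy_acquisition` and `greedy_placement` are hypothetical names, the acquisition function is a stand-in (distance to the nearest existing context point), and the ‘best’ value is assumed to be the maximum.

```python
import numpy as np


def toy_acquisition(context, queries):
    """Hypothetical stand-in: distance from each query point to its nearest context point."""
    # Pairwise distances, shape (n_queries, n_context).
    dists = np.linalg.norm(queries[:, None, :] - context[None, :, :], axis=-1)
    return dists.min(axis=1)


def greedy_placement(task_contexts, search_grid, n_new):
    """Propose n_new locations, one per greedy iteration (assumes maximisation)."""
    contexts = [c.copy() for c in task_contexts]
    proposed = []
    for _ in range(n_new):
        # Evaluate the acquisition function over the search grid for every task,
        # then average over tasks.
        values = np.mean([toy_acquisition(c, search_grid) for c in contexts], axis=0)
        best = search_grid[np.argmax(values)]
        proposed.append(best)
        # Add the new context location to every task before the next iteration.
        contexts = [np.vstack([c, best]) for c in contexts]
    return np.array(proposed)


# Two toy tasks, each with a single existing context point on the unit square.
tasks = [np.array([[0.0, 0.0]]), np.array([[0.0, 1.0]])]
xs = np.linspace(0.0, 1.0, 5)
grid = np.array([(x1, x2) for x1 in xs for x2 in xs])
placements = greedy_placement(tasks, grid, n_new=2)
print(placements)
```

Because each proposed location is appended to every task before the next iteration, a location that has already been chosen scores zero under this toy acquisition function and is not proposed again.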

The algorithm either computes the acquisition function values in parallel over all query locations, or sequentially. This is dictated by the type of acquisition function passed to the algorithm:

1. AcquisitionFunction: Returns a scalar acquisition function for a given query location. For example, the model’s mean standard deviation over target locations (MeanStddev). For a given Task this requires running the model once for every query location with a new context point at that location, so these acquisition functions can be slow.

2. AcquisitionFunctionParallel: Returns all acquisition function values in parallel. For example, the model’s standard deviation at query locations given the existing context data, which only requires running the model once for a given Task. These acquisition functions are faster than their sequential counterparts but are likely less informative.

Acquisition functions that inherit from AcquisitionFunctionOracle require ground truth target values at target locations. In this case, the algorithm must be provided with a TaskLoader object to sample these values.
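The cost difference between the two acquisition function types can be illustrated with a toy example. The sketch below does not use DeepSensor classes: `ToyModel`, `sequential_mean_stddev` and `parallel_stddev` are hypothetical names, and the "uncertainty" is simply an increasing function of distance to the nearest context point. It only counts how many model runs each strategy needs.

```python
import numpy as np


class ToyModel:
    """Hypothetical stand-in for a model: uncertainty grows with distance to context."""

    def __init__(self):
        self.n_calls = 0

    def stddev(self, context, locations):
        self.n_calls += 1  # count one "model run"
        dists = np.linalg.norm(locations[:, None, :] - context[None, :, :], axis=-1)
        return 1.0 - np.exp(-dists.min(axis=1))


def sequential_mean_stddev(model, context, queries, targets):
    """One model run per query: mean target stddev after adding a point at the query."""
    return np.array(
        [model.stddev(np.vstack([context, q]), targets).mean() for q in queries]
    )


def parallel_stddev(model, context, queries):
    """A single model run: stddev at the query locations given the existing context."""
    return model.stddev(context, queries)


context = np.array([[0.5, 0.5]])
queries = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
targets = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

seq_model, par_model = ToyModel(), ToyModel()
seq_values = sequential_mean_stddev(seq_model, context, queries, targets)
par_values = parallel_stddev(par_model, context, queries)
print(seq_model.n_calls, par_model.n_calls)  # prints: 3 1
```

The sequential version accounts for the effect of a new observation on the target locations, at the price of one model run per query location. The parallel version only looks at the current uncertainty at each query location.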

Note

The algorithm is described in more detail in ‘Environmental Sensor Placement with Convolutional Gaussian Neural Processes’ (2023), https://doi.org/10.1017/eds.2023.22.

Parameters:
  • model (DeepSensorModel) – Model to use for proposing new context points.

  • X_s (xarray.Dataset | xarray.DataArray) – Xarray object containing the spatial coordinates that define the search grid.

  • X_t (xarray.Dataset | xarray.DataArray | pd.DataFrame) – Target spatial coordinates. Can either be an xarray object containing the spatial coordinates of the target grid, or a pandas DataFrame containing a set of off-grid target locations.

  • X_s_mask (xarray.Dataset | xarray.DataArray, optional) – Optional 2D mask over the gridded search coordinates. If provided, the acquisition function is only computed at locations where the mask is True; all other search locations are ignored. Defaults to None.

  • X_t_mask (xarray.Dataset | xarray.DataArray, optional) – Optional 2D mask for ignoring some of the gridded target coordinates. Useful e.g. if you only care about improving the model’s predictions over a certain area. Defaults to None.

  • N_new_context (int, optional) – Number of new context points to propose (i.e. number of greedy iterations), defaults to 1.

  • X_normalised (bool, optional) – Whether the coordinates of the X_* arguments above have been normalised by a DataProcessor. Defaults to False.

  • model_infill_method (str, optional) – Method for generating pseudo observations from the model at search points, which are appended to Tasks when computing acquisition functions or at the end of a greedy iteration (unless overridden by query_infill or proposed_infill below). Currently, only “mean” infilling is supported. Defaults to “mean”.

  • query_infill (xarray.DataArray, optional) – Gridded xarray object containing observations to use when querying candidate context points. Must have all the same time points as the Task objects the algorithm is called with. If not on the same grid as X_s, it will be linearly interpolated to the same grid. Useful for providing the model with true observations rather than its own predictions. Defaults to None.

  • proposed_infill (xarray.DataArray, optional) – Similar to query_infill, but used when infilling pseudo observations at the end of a greedy iteration (rather than using model predictions). Useful e.g. to simulate the case where the model can obtain ground truth after requesting a sensor placement. Defaults to None.

  • context_set_idx (int, optional) – Context set index to run the sensor placement algorithm on. E.g. if a model ingests two context sets [“aux_data”, “sensor_data”], this should be set to 1 (corresponding to the sensor context set). Defaults to 0.

  • target_set_idx (int, optional) – Target set index corresponding to predictions of the context set that the algorithm is run on. Defaults to 0.

  • progress_bar (bool, optional) – Whether to display a progress bar when running the algorithm. Defaults to False.

  • task_loader (TaskLoader, optional) – If using an AcquisitionFunctionOracle, a TaskLoader object is required to sample ground truth target values at target locations. Defaults to None.

  • verbose (bool, optional) – Whether to print some status messages. Defaults to False.

Raises:

ValueError – If the model passed does not inherit from DeepSensorModel.
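The effect of a search mask such as X_s_mask (acquisition values computed only where the mask is True) can be sketched with NumPy arrays. This is an illustration only: the grid, the mask and the `acquisition` function are hypothetical, and filling masked-out cells with NaN is an assumption of this sketch rather than documented behaviour.

```python
import numpy as np

# Hypothetical 3x4 search grid and a boolean mask over it.
x1 = np.linspace(0.0, 1.0, 3)
x2 = np.linspace(0.0, 1.0, 4)
mask = np.zeros((3, 4), dtype=bool)
mask[:, :2] = True  # only the first two x2 columns are valid search locations


def acquisition(points):
    """Hypothetical stand-in acquisition function: larger further from the origin."""
    return np.linalg.norm(points, axis=-1)


# Coordinates of every grid cell, shape (3, 4, 2).
grid = np.stack(np.meshgrid(x1, x2, indexing="ij"), axis=-1)

# Evaluate only where the mask is True; leave other cells as NaN (an assumption).
values = np.full(mask.shape, np.nan)
values[mask] = acquisition(grid[mask])

# Pick the best valid location (assuming maximisation), ignoring masked-out cells.
best_idx = np.unravel_index(np.nanargmax(values), values.shape)
best_location = grid[best_idx]
print(best_idx, best_location)
```

Masking the search grid this way reduces the number of query locations, which matters most for sequential acquisition functions that need one model run per query.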

__call__(acquisition_fn, tasks, diff=False)

Iteratively propose new context points using the greedy sensor placement algorithm.

Parameters:
  • acquisition_fn (AcquisitionFunction) – The acquisition function to optimise.

  • tasks (List[Task] | Task) – Tasks containing existing context data. If a list of Tasks, the acquisition function will be averaged over Tasks.

  • diff (bool, optional) – For sequential acquisition functions only: Whether to compute the change in acquisition function value after adding the new context point, i.e. acquisition_fn(task_with_new) - acquisition_fn(task). Can be useful for making the acquisition function values more interpretable, or for comparing with the change in metric that the acquisition function targets (see https://doi.org/10.1017/eds.2023.22). Defaults to False.

Returns:

Tuple[pandas.DataFrame, xarray.DataArray] – A tuple containing two objects:

  • Proposed sensor placements (pandas.DataFrame). Columns are the x1 and x2 coordinates of the sensor placements, and the index is the index of the greedy iteration at which the sensor placement was proposed (which can be interpreted as a priority order, with iteration 0 being the highest priority).

  • Gridded acquisition function values at each search point (xarray.DataArray). Dimensions are iteration, time (inferred from the input tasks), followed by the x1 and x2 coordinates of the spatial grid.
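Putting the constructor and `__call__` together, a call might look like the sketch below. It uses only the argument names from the signatures documented on this page. The wrapper name `propose_sensor_placements` is hypothetical, and the model, tasks, coordinates and acquisition function are assumed to have been built elsewhere by the caller.

```python
def propose_sensor_placements(model, tasks, X_s, X_t, acquisition_fn, n_new=3):
    """Run the greedy algorithm and return (placements, acquisition values).

    Hypothetical convenience wrapper: every argument is supplied by the caller,
    and nothing here builds a model or loads data.
    """
    # Imported lazily so the sketch can be defined without DeepSensor installed.
    from deepsensor.active_learning.algorithms import GreedyAlgorithm

    algorithm = GreedyAlgorithm(
        model,
        X_s=X_s,              # search grid coordinates
        X_t=X_t,              # target coordinates
        N_new_context=n_new,  # number of greedy iterations
        progress_bar=True,
    )
    # Returns a tuple: proposed placements and gridded acquisition values.
    placements, acquisition_values = algorithm(acquisition_fn, tasks)
    return placements, acquisition_values
```

Going by the return description above, the row of `placements` with index 0 would hold the x1 and x2 coordinates of the highest-priority location, and `acquisition_values` could be indexed along its iteration dimension to inspect the acquisition surface at each step.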

Raises: