autoemulate.core.model_selection

evaluate(y_pred, y_true, metric=MetricConfig(name=r2, maximize=True))

Evaluate Emulator prediction performance using a torchmetrics.Metric.

Parameters:
  • y_pred (TensorLike) – Predicted target values, as returned by an Emulator.

  • y_true (TensorLike) – Ground truth target values.

  • metric (Metric) – Metric to use for evaluation. Defaults to r2.

Return type:

float
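
For orientation, the default r2 metric is the coefficient of determination. The sketch below computes it in plain Python to show what kind of float comes back; it is not the library's implementation (which, per the description above, goes through a torchmetrics.Metric), and the helper name r2_score is hypothetical.

```python
def r2_score(y_pred, y_true):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean of y_true.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]

# Predictions identical to the targets give the best possible score.
perfect = r2_score([1.0, 2.0, 3.0, 4.0], y_true)

# Always predicting the mean of y_true gives a score of zero.
baseline = r2_score([2.5, 2.5, 2.5, 2.5], y_true)
```

Note that the argument order in this sketch follows the signature above, with predictions first and ground truth second.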

cross_validate(cv, dataset, model, model_params, transformed_emulator_params=None, x_transforms=None, y_transforms=None, device='cpu', random_seed=None, metrics=None)

Cross validate model performance using the given cv strategy.

Parameters:
  • cv (BaseCrossValidator) – Cross-validator whose split method returns train/validation Dataset indices according to the chosen strategy (e.g., KFold, LeaveOneOut).

  • dataset (Dataset) – The data to use for model training and validation.

  • model (Emulator) – An instance of an Emulator subclass.

  • model_params (ModelParams) – Parameters used to construct the model at initialization. Passing an empty dictionary {} uses the default parameters.

  • transformed_emulator_params (None | TransformedEmulatorParams) – Parameters for the transformed emulator. Defaults to None.

  • device (DeviceLike) – The device to use for model training and evaluation.

  • random_seed (int | None) – Optional random seed for reproducibility.

  • metrics (list[TorchMetrics] | None) – List of metrics to compute. If None, uses r2 and rmse.

Returns:

Dictionary containing the scores for each metric, with one score per cross-validation fold.

Return type:

dict[str, list[float]]
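
To make the shape of the return value concrete, here is a minimal, library-independent sketch of k-fold cross-validation that fills a dict[str, list[float]] with one score per fold. It assumes contiguous, unshuffled folds and uses a mean predictor as a stand-in for a real emulator; kfold_indices, rmse, mae and cross_validate_sketch are hypothetical names, not part of autoemulate.

```python
import math


def kfold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous folds over n samples."""
    # Spread any remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size


def rmse(y_pred, y_true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true))


def mae(y_pred, y_true):
    return sum(abs(p - t) for p, t in zip(y_pred, y_true)) / len(y_true)


def cross_validate_sketch(y, k, metrics):
    """Score a mean-predictor baseline on every fold, for every metric."""
    scores = {name: [] for name in metrics}
    for train, val in kfold_indices(len(y), k):
        # "Fit": the stand-in model just memorises the training mean.
        train_mean = sum(y[i] for i in train) / len(train)
        # "Predict": the same value for every validation sample.
        y_val = [y[i] for i in val]
        y_hat = [train_mean] * len(val)
        for name, fn in metrics.items():
            scores[name].append(fn(y_hat, y_val))
    return scores


y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
cv_scores = cross_validate_sketch(y, k=3, metrics={"rmse": rmse, "mae": mae})
```

Each key of cv_scores holds a list whose length equals the number of folds, which matches the return type documented above.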

bootstrap(model, x, y, n_bootstraps=100, n_samples=100, device='cpu', metrics=None)

Get bootstrap estimates of metrics.

Parameters:
  • model (Emulator) – An instance of an Emulator subclass.

  • x (TensorLike) – Input features for the model.

  • y (TensorLike) – Target values corresponding to the input features.

  • n_bootstraps (int | None) – Number of bootstrap samples to generate. When None, the evaluation uses all of the given data and returns a single value with no measure of uncertainty. Defaults to 100.

  • n_samples (int) – Number of samples to draw to estimate the predictive mean when the emulator does not provide a mean directly. Defaults to 100.

  • device (str | torch.device) – The device to use for computations. Defaults to 'cpu'.

  • metrics (list[MetricConfig] | None) – List of metrics to compute. If None, uses r2 and rmse.

Returns:

Dictionary mapping metric names to (mean, std) tuples.

Return type:

dict[str, tuple[float, float]]
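
The general idea behind a (mean, std) bootstrap estimate can be sketched without the library: resample prediction/target pairs with replacement, recompute the metric on each resample, and summarise the resulting scores. This is a minimal illustration under stated assumptions, not the function's actual implementation; bootstrap_sketch, mae and the seed argument are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import random
import statistics


def mae(y_pred, y_true):
    return sum(abs(p - t) for p, t in zip(y_pred, y_true)) / len(y_true)


def bootstrap_sketch(y_pred, y_true, metric, n_bootstraps=100, seed=0):
    """Return (mean, std) of a metric over bootstrap resamples."""
    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    n = len(y_true)
    scores = []
    for _ in range(n_bootstraps):
        # Draw n indices with replacement, keeping each pair aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_pred[i] for i in idx], [y_true[i] for i in idx]))
    # Sample standard deviation (assumed); needs at least two resamples.
    return statistics.mean(scores), statistics.stdev(scores)


y_true = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y_pred = [0.1, 0.9, 2.3, 2.8, 4.0, 5.4, 5.9, 7.2]

mean_mae, std_mae = bootstrap_sketch(y_pred, y_true, mae, n_bootstraps=100, seed=0)
result = {"mae": (mean_mae, std_mae)}  # same shape as the documented return type
```

The spread of the resampled scores is what gives the standard deviation its meaning as an uncertainty estimate, which is why a single evaluation on all the data cannot provide one.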