Quickstart#

AutoEmulate’s goal is to make it easy to create an emulator for your simulation.

This tutorial walks you through the basic functionality of the Python API, using a simple toy simulation as an example.

We’ll demonstrate the following steps:

  1. Getting input and output tensor data from our example simulation

  2. Creating, comparing and evaluating Emulators with AutoEmulate

  3. Using an Emulator model to predict outputs for new inputs

  4. Saving Emulator models (and associated metadata) to disk

# General imports for the notebook
import warnings
warnings.filterwarnings("ignore")

Toy simulation#

Before we build an emulator with AutoEmulate, we need to get a set of input/output pairs from our simulation to use as training data.

Below is a toy simulation for a projectile’s motion with drag (see the AutoEmulate documentation for details). The simulation includes:

  • Inputs: drag coefficient (log scale), velocity

  • Outputs: distance the projectile travelled

from autoemulate.simulations.projectile import Projectile

projectile = Projectile(log_level="error")
n_samples = 500
x = projectile.sample_inputs(n_samples).float()
y = projectile.forward_batch(x).float()

x.shape, y.shape
(torch.Size([500, 2]), torch.Size([500, 1]))

Data#

As you can see, our simulator inputs (x) and outputs (y) are PyTorch tensors. PyTorch tensors are a common data structure used in machine learning, and AutoEmulate is built to work with them.
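If your simulation data lives elsewhere, for example in NumPy arrays loaded from previous runs, you can convert it to tensors before handing it to AutoEmulate. A minimal sketch, where x_np and y_np are hypothetical arrays of inputs and outputs:

import numpy as np
import torch

# Hypothetical arrays of simulation inputs (n_samples, n_inputs)
# and outputs (n_samples, n_outputs)
x_np = np.random.rand(500, 2)
y_np = np.random.rand(500, 1)

# Convert to float32 tensors, matching the .float() calls used above
x = torch.from_numpy(x_np).float()
y = torch.from_numpy(y_np).float()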

Build and compare Emulators#

With our simulator inputs and outputs, we can run a full machine learning pipeline, including data processing, model fitting, model selection and hyperparameter optimisation in just a few lines of code.

First, let’s import AutoEmulate and check the names of the available Emulator models. The columns indicate whether the emulator has a PyTorch backend, supports multioutput data and provides predictive uncertainty quantification.

from autoemulate import AutoEmulate

AutoEmulate.list_emulators()
   Emulator                   PyTorch  Multioutput  Uncertainty_Quantification
0  GaussianProcess            True     True         True
1  GaussianProcessCorrelated  True     True         True
2  LightGBM                   False    False        False
3  SupportVectorMachine       False    False        False
4  RadialBasisFunctions       True     True         False
5  RandomForest               False    True         False
6  MLP                        True     True         False
7  EnsembleMLP                True     True         True
8  EnsembleMLPDropout         True     True         True

We’re now ready to run AutoEmulate to build and compare emulators.

This will fit emulator models (including hyperparameter tuning) to the simulation inputs and outputs, evaluating performance on withheld test data.

# Run AutoEmulate with default settings
ae = AutoEmulate(x, y, log_level="error")
ERROR   2025-08-20 07:50:00,292 - autoemulate - Failed after 5 with exception `max_sample` cannot be set if `bootstrap=False`. Either switch to `bootstrap=True` or set `max_sample=None`.

For more information about the configuration options available, see the AutoEmulate API docs. Here’s a brief overview of some important options:

Model selection

By default, AutoEmulate will use all of the listed emulator models, but you can also specify a subset if you already know which kinds of models are suitable for your data.

Specify models used by AutoEmulate with the models argument, for example:

models = ["GaussianProcessExact", "RadialBasisFunctions"]
ae = AutoEmulate(x, y, models=models)

You can also restrict the selection to just PyTorch models or probabilistic models by using the only_pytorch or only_probabilistic arguments, respectively. For example, to use only PyTorch models:

ae = AutoEmulate(x, y, only_pytorch=True)

Logging

When running AutoEmulate, you may also wish to enable logging to track the progress and performance of the emulator comparison. You can do this by setting the log_level argument when creating the AutoEmulate instance:

ae = AutoEmulate(x, y, models=models, log_level="info")

Try setting various log levels to see the difference. The options are “progress_bar”, “debug”, “info”, “warning”, “error”, or “critical”.

Now that we have run AutoEmulate, let’s look at the summary comparing emulator performance (r-squared and RMSE) on both the train and test data.

ae.summarise()
   model_name                 x_transforms              y_transforms              params                                              rmse_test    r2_test   r2_test_std  r2_train  r2_train_std
0  GaussianProcess            [StandardizeTransform()]  [StandardizeTransform()]  {'mean_module_fn': <function poly_mean at 0x7f...    32.774654  0.999974  0.000014     0.999988  0.000003
1  GaussianProcessCorrelated  [StandardizeTransform()]  [StandardizeTransform()]  {'mean_module_fn': <function zero_mean at 0x7f...    53.533897  0.999931  0.000043     0.999971  0.000006
7  EnsembleMLP                [StandardizeTransform()]  [StandardizeTransform()]  {'n_emulators': 6, 'epochs': 200, 'layer_dims'...   185.216507  0.999129  0.000455     0.998950  0.000167
4  RadialBasisFunctions       [StandardizeTransform()]  [StandardizeTransform()]  {'kernel': 'quintic', 'degree': 2, 'smoothing'...   206.747025  0.999000  0.000533     0.999180  0.000215
6  MLP                        [StandardizeTransform()]  [StandardizeTransform()]  {'epochs': 200, 'layer_dims': [32, 16], 'lr': ...   207.616653  0.998925  0.000627     0.999195  0.000155
8  EnsembleMLPDropout         [StandardizeTransform()]  [StandardizeTransform()]  {'n_emulators': 8, 'epochs': 100, 'layer_dims'...  1116.935913  0.971232  0.008385     0.979630  0.004243
3  SupportVectorMachine       [StandardizeTransform()]  [StandardizeTransform()]  {'kernel': 'rbf', 'degree': 2, 'gamma': 'auto'...  1349.028320  0.958179  0.011170     0.962626  0.011669
2  LightGBM                   [StandardizeTransform()]  [StandardizeTransform()]  {'num_leaves': 92, 'max_depth': 6, 'learning_r...  1568.471680  0.942532  0.029377     0.994202  0.002176
5  RandomForest               [StandardizeTransform()]  [StandardizeTransform()]  {'n_estimators': 97, 'min_samples_split': 14, ...  1787.360474  0.936476  0.037914     0.959211  0.020246
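The output of summarise() is a pandas DataFrame (as the index and columns above suggest), so you can sort and filter it like any other DataFrame. For example, a small sketch that ranks models by test RMSE and keeps just the headline metrics:

summary = ae.summarise()

# Lower RMSE is better; keep only the columns we care about
summary.sort_values("rmse_test")[["model_name", "rmse_test", "r2_test"]]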

Choosing an Emulator#

From this list, we can choose an emulator based on the index from the summary dataframe, or quickly get the best performing one using the best_result function, which picks based on the r2_test metric by default.

best = ae.best_result()
print("Model with id: ", best.id, " performed best: ", best.model_name)
Model with id:  0  performed best:  GaussianProcess
best.model.untransformed_model_name
'GaussianProcess'

Let’s take a look at the configuration of the best model. These are the values of the model’s hyperparameters.

print(best.params)
{'mean_module_fn': <function poly_mean at 0x7f2221b77600>, 'covar_module_fn': <function rq_kernel at 0x7f2221b77100>, 'epochs': 200, 'lr': 0.1, 'likelihood_cls': <class 'gpytorch.likelihoods.multitask_gaussian_likelihood.MultitaskGaussianLikelihood'>, 'scheduler_cls': <class 'torch.optim.lr_scheduler.ExponentialLR'>, 'scheduler_kwargs': {'gamma': 0.9}}

We can quickly visualise the performance of this Emulator with a plot of its predictions against the simulator outputs for the heldout test data. We also save the plot to a file.

ae.plot(best, fname="best_model_plot.png")
[Figure: emulator predictions plotted against simulator outputs on the held-out test data]

We can also subset the data included in the plots by providing input and output ranges.

ae.plot(best, input_ranges={0: (0, 4), 1: (200, 500)}, output_ranges={0: (0, 10)})
[Figure: the same predictions plot, restricted to the given input and output ranges]

Predictions#

We can use the emulator to make predictions using the predict method.

best.model.predict(x[:10])
MultivariateNormal(loc: torch.Size([10, 1]))
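For emulators with uncertainty quantification, such as the Gaussian process here, predict returns a distribution rather than a point estimate. You can extract point predictions and predictive uncertainty through the standard torch.distributions attributes; a short sketch, using the best result from above:

pred = best.model.predict(x[:10])

# Point predictions and predictive uncertainty from the returned distribution
mean = pred.mean          # shape (10, 1)
variance = pred.variance  # elementwise predictive variance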

Saving and loading emulators#

Emulators and their metadata (hyperparameter config and performance metrics) can be saved to disk and loaded again later.

# Make a directory to save Emulator models
import os
path = "my_emulators"
os.makedirs(path, exist_ok=True)

Let’s save the best result, the best performing emulator plus metadata, to disk.

# The use_timestamp parameter ensures a new result is saved each time the save method is called
best_result_filepath = ae.save(best, path, use_timestamp=True)
print("Model and metadata saved to: ", best_result_filepath)
Model and metadata saved to:  my_emulators/GaussianProcess_0_20250820_080219

You should now have two files saved to disk: one containing the emulator model, and one containing the metadata, with the same name and a .csv extension.
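To verify, you can list the directory contents (exact filenames will differ because of the timestamp):

# The model file and its .csv metadata should both be present
os.listdir(path)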

You can later pass this filepath to the load_model method to use the model again.

model = AutoEmulate.load_model(best_result_filepath)
model.predict(x[:10])
MultivariateNormal(loc: torch.Size([10, 1]))
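As a final sanity check, you can confirm that the loaded model reproduces the original emulator’s predictions; a minimal sketch, comparing the distribution means as above:

import torch

# The loaded model should give the same point predictions as the one we saved
torch.allclose(model.predict(x[:10]).mean, best.model.predict(x[:10]).mean)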