Quickstart#

AutoEmulate’s goal is to make it easy to create an emulator for your simulation.

This tutorial walks you through the basic functionality of the Python API, using a simple toy simulation as an example.

We’ll demonstrate the following steps:

  1. Getting input and output tensor data from our example simulation

  2. Creating, comparing and evaluating Emulators with AutoEmulate

  3. Using an Emulator model to predict outputs for new inputs

  4. Saving Emulator models (and associated metadata) to disk

# General imports for the notebook
import warnings
warnings.filterwarnings("ignore")

Toy simulation#

Before we build an emulator with AutoEmulate, we need to get a set of input/output pairs from our simulation to use as training data.

Below is a toy simulation for a projectile’s motion with drag (see here for details). The simulation includes:

  • Inputs: drag coefficient (log scale), velocity

  • Outputs: distance the projectile travelled

from autoemulate.simulations.projectile import Projectile

projectile = Projectile(log_level="error")
n_samples = 500
x = projectile.sample_inputs(n_samples).float()
y = projectile.forward_batch(x).float()

x.shape, y.shape
(torch.Size([500, 2]), torch.Size([500, 1]))

Data#

As you can see, our simulator inputs (x) and outputs (y) are PyTorch tensors. PyTorch tensors are a common data structure used in machine learning, and AutoEmulate is built to work with them.
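
If your own simulation returns NumPy arrays rather than tensors, they can be converted before use. A minimal sketch (x_np and y_np are hypothetical stand-ins for your own data):

# Convert NumPy arrays to float32 PyTorch tensors for use with AutoEmulate
import numpy as np
import torch

x_np = np.random.rand(500, 2)  # hypothetical simulation inputs
y_np = np.random.rand(500, 1)  # hypothetical simulation outputs
x_tensor = torch.from_numpy(x_np).float()
y_tensor = torch.from_numpy(y_np).float()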

Build and compare Emulators#

With our simulator inputs and outputs, we can run a full machine learning pipeline, including data processing, model fitting, model selection and hyperparameter optimisation in just a few lines of code.

First, let’s import AutoEmulate and check the names of the available Emulator models.

from autoemulate.core.compare import AutoEmulate

AutoEmulate.list_emulators()
Emulator
0 GaussianProcess
1 GaussianProcessCorrelated
2 LightGBM
3 SupportVectorMachine
4 RadialBasisFunctions
5 RandomForest
6 MLP
7 EnsembleMLP
8 EnsembleMLPDropout

We’re now ready to run AutoEmulate to build and compare emulators.

This will fit emulator models (including hyperparameter tuning) to the simulation input and output data, evaluating performance on withheld test data.

# Run AutoEmulate with default settings
ae = AutoEmulate(x, y, log_level="error")

For more information about the configuration options available, see the AutoEmulate API docs. Here’s a brief overview of some important options:

Model selection

By default, AutoEmulate will use all of the listed emulator models, but you can also specify a subset to use if you already know which kinds of models are suitable for your data.

Specify models used by AutoEmulate with the models argument, for example:

from autoemulate.emulators import GaussianProcessExact, RadialBasisFunctions
models = ["GaussianProcessExact", "RadialBasisFunctions"]
ae = AutoEmulate(x, y, models=models)

Logging

When running AutoEmulate, you may also wish to enable logging to track the progress and performance of the emulator comparison. You can do this by setting the log_level argument when creating the AutoEmulate instance:

ae = AutoEmulate(x, y, models=models, log_level="info")

Try setting various log levels to see the difference. The options are “progress_bar”, “debug”, “info”, “warning”, “error”, or “critical”.

Now that we have run AutoEmulate, let’s look at a summary comparing emulator performance (r-squared and RMSE) on both the train and test data.

ae.summarise()
| | model_name | x_transforms | y_transforms | params | rmse_test | r2_test | r2_test_std | r2_train | r2_train_std |
|---|---|---|---|---|---|---|---|---|---|
| 0 | GaussianProcess | [StandardizeTransform()] | [StandardizeTransform()] | {'mean_module_fn': <function linear_mean at 0x... | 6.249398 | 0.999976 | 0.000012 | 0.999984 | 0.000003 |
| 1 | GaussianProcessCorrelated | [StandardizeTransform()] | [StandardizeTransform()] | {'mean_module_fn': <function constant_mean at ... | 6.381134 | 0.999974 | 0.000013 | 0.999985 | 0.000003 |
| 4 | RadialBasisFunctions | [StandardizeTransform()] | [StandardizeTransform()] | {'kernel': 'quintic', 'degree': 2, 'smoothing'... | 13.706507 | 0.999438 | 0.000324 | 0.999410 | 0.000109 |
| 2 | LightGBM | [StandardizeTransform()] | [StandardizeTransform()] | {'num_leaves': 35, 'max_depth': 9, 'learning_r... | 26.225670 | 0.993062 | 0.002803 | 0.994643 | 0.001067 |
| 6 | MLP | [StandardizeTransform()] | [StandardizeTransform()] | {'epochs': 200, 'layer_dims': [8, 4], 'lr': 0.... | 26.274155 | 0.992609 | 0.004037 | 0.988317 | 0.001896 |
| 3 | SupportVectorMachine | [StandardizeTransform()] | [StandardizeTransform()] | {'kernel': 'poly', 'degree': 5, 'gamma': 'auto... | 31.012720 | 0.983987 | 0.008477 | 0.986727 | 0.002745 |
| 7 | EnsembleMLP | [StandardizeTransform()] | [StandardizeTransform()] | {'n_emulators': 4, 'epochs': 100, 'layer_dims'... | 40.427025 | 0.962475 | 0.011061 | 0.968482 | 0.007252 |
| 5 | RandomForest | [StandardizeTransform()] | [StandardizeTransform()] | {'n_estimators': 117, 'min_samples_split': 4, ... | 41.784657 | 0.950963 | 0.037212 | 0.992192 | 0.001342 |
| 8 | EnsembleMLPDropout | [StandardizeTransform()] | [StandardizeTransform()] | {'n_emulators': 8, 'epochs': 100, 'layer_dims'... | 63.985729 | 0.756464 | 0.043905 | 0.773826 | 0.024241 |

Choosing an Emulator#

From this list, we can choose an emulator by its index in the summary dataframe, or quickly get the best-performing one using the best_result function, which picks based on the r2_test metric by default.

best = ae.best_result()
print("Model with id: ", best.id, " performed best: ", best.model_name)
Model with id:  0  performed best:  GaussianProcess
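
Since summarise() returns a pandas DataFrame (as shown above), you can also rank or filter the results yourself with ordinary pandas operations. A minimal sketch:

# Rank emulators by test RMSE using standard pandas operations
summary = ae.summarise()
print(summary.sort_values("rmse_test")[["model_name", "rmse_test", "r2_test"]].head(3))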

Let’s take a look at the configuration of the best model. These are the values of the model’s hyperparameters.

print(best.params)
{'mean_module_fn': <function linear_mean at 0x7fb4ccb05800>, 'covar_module_fn': <function rbf at 0x7fb4ccb051c0>, 'epochs': 50, 'lr': 0.5, 'likelihood_cls': <class 'gpytorch.likelihoods.multitask_gaussian_likelihood.MultitaskGaussianLikelihood'>, 'scheduler_cls': <class 'torch.optim.lr_scheduler.ExponentialLR'>, 'scheduler_kwargs': {'gamma': 0.9}}

We can quickly visualise the performance of this Emulator with a plot of its predictions against the simulator outputs for the held-out test data.

ae.plot(best)
[Figure: emulator predictions plotted against simulator outputs on the held-out test data]

Predictions#

We can use the emulator to make predictions using the predict method.

best.model.predict(x[:10])
MultivariateNormal(loc: torch.Size([10, 1]), covariance_matrix: torch.Size([10, 1, 1]))
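
Note that predict returns a predictive distribution rather than point estimates. A minimal sketch of extracting means and variances, assuming the returned object follows the standard torch.distributions interface shown above:

# Extract point predictions and predictive uncertainty from the distribution
pred = best.model.predict(x[:10])
means = pred.mean          # point predictions, shape [10, 1]
variances = pred.variance  # per-point predictive variances
print(means.shape, variances.shape)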

Saving and loading emulators#

Emulators and their metadata (hyperparameter config and performance metrics) can be saved to disk and loaded again later.

# Make a directory to save Emulator models
import os
path = "my_emulators"
os.makedirs(path, exist_ok=True)

Let’s save the best result, the best performing emulator plus metadata, to disk.

# The use_timestamp parameter ensures a new result is saved each time the save method is called
best_result_filepath = ae.save(best, path, use_timestamp=True)
print("Model and metadata saved to: ", best_result_filepath)
Model and metadata saved to:  my_emulators/GaussianProcess_0_20250731_164113

You should now have two files saved to disk: one containing the emulator model, and one with the same name and a .csv extension containing the metadata.
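
You can confirm this by listing the directory contents:

# List the saved model file and its metadata .csv
import os
print(os.listdir(path))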

You can later pass this filepath to the load method to use the model (and inspect its metadata) again.

loaded_result = ae.load(best_result_filepath)
print(loaded_result.model_name)
print(loaded_result.params)
GaussianProcess
{'mean_module_fn': 'linear_mean', 'covar_module_fn': 'rbf', 'epochs': 50, 'lr': 0.5, 'likelihood_cls': 'MultitaskGaussianLikelihood', 'scheduler_cls': 'ExponentialLR', 'scheduler_kwargs': {'gamma': 0.9}}

loaded_result.model.predict(x[:10])
MultivariateNormal(loc: torch.Size([10, 1]), covariance_matrix: torch.Size([10, 1, 1]))
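
As a final sanity check, the loaded emulator should reproduce the original emulator’s predictions. A minimal sketch comparing the means of the two predictive distributions:

# Check that the reloaded emulator matches the original's point predictions
import torch

original_mean = best.model.predict(x[:10]).mean
loaded_mean = loaded_result.model.predict(x[:10]).mean
print(torch.allclose(original_mean, loaded_mean))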