Quickstart
AutoEmulate’s goal is to make it easy to create an emulator for your simulation.
This tutorial’s purpose is to walk you through the basic functionality of the Python API, using a simple toy simulation as an example.
We’ll demonstrate the following steps:
Getting input and output tensor data from our example simulation
Creating, comparing and evaluating Emulators with AutoEmulate
Using an Emulator model to predict outputs for new inputs
Saving Emulator models (and associated metadata) to disk
# General imports for the notebook
import warnings
warnings.filterwarnings("ignore")
Toy simulation
Before we build an emulator with AutoEmulate, we need to get a set of input/output pairs from our simulation to use as training data.
Below is a toy simulation for a projectile’s motion with drag (see here for details). The simulation includes:
Inputs: drag coefficient (log scale), velocity
Outputs: distance the projectile travelled
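To give a feel for what the toy simulation computes, here is a rough pure-Python sketch of projectile motion with quadratic drag. It is not AutoEmulate's Projectile implementation: the base-10 log scale for the drag coefficient, the 45 degree launch angle, the explicit Euler time stepping and the function name projectile_distance are all our own assumptions.

```python
import math


def projectile_distance(log10_drag: float, speed: float, angle_deg: float = 45.0,
                        g: float = 9.81, dt: float = 1e-3) -> float:
    """Horizontal distance travelled by a projectile with quadratic air drag.

    Hypothetical sketch (not AutoEmulate's Projectile): integrates
    dv/dt = -g * e_y - c * |v| * v with explicit Euler steps until the
    projectile falls back to its launch height (y = 0).
    """
    c = 10.0 ** log10_drag  # drag coefficient per unit mass, given on an assumed log10 scale
    theta = math.radians(angle_deg)
    vx, vy = speed * math.cos(theta), speed * math.sin(theta)
    x, y = 0.0, 0.0
    while True:
        v = math.hypot(vx, vy)
        # Advance the position with the current velocity, then update the velocity
        x_new, y_new = x + vx * dt, y + vy * dt
        vx, vy = vx - c * v * vx * dt, vy - (g + c * v * vy) * dt
        if y_new < 0.0:
            # Interpolate linearly between the last two steps to find the landing point
            frac = y / (y - y_new)
            return x + frac * (x_new - x)
        x, y = x_new, y_new


# Negligible drag should approach the drag-free range v**2 / g
print(f"negligible drag: {projectile_distance(-10.0, 10.0):.2f} m "
      f"(v**2 / g = {10.0 ** 2 / 9.81:.2f} m)")
# Stronger drag shortens the flight
print(f"log10(drag) = 0: {projectile_distance(0.0, 10.0):.2f} m")
```

Increasing the drag shortens the distance in a smooth but nonlinear way, which is exactly the kind of input/output relationship an emulator has to learn. In the rest of this tutorial we use AutoEmulate's own Projectile simulator.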
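from autoemulate.simulations.projectile import Projectile
# Create the toy simulator, showing only error-level log messages
projectile = Projectile(log_level="error")
n_samples = 500
# Sample n_samples input points, then run the simulation on the whole batch;
# .float() converts both tensors to single precision (float32)
x = projectile.sample_inputs(n_samples).float()
y = projectile.forward_batch(x).float()
x.shape, y.shape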
(torch.Size([500, 2]), torch.Size([500, 1]))
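The sample_inputs call draws training inputs spread across the simulator's parameter ranges. One common space-filling design for this is Latin hypercube sampling, sketched below in plain Python. We don't assume that sample_inputs uses this exact scheme, and the helper latin_hypercube and the bounds in the example are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def latin_hypercube(n_samples: int, bounds: List[Tuple[float, float]],
                    seed: Optional[int] = None) -> List[List[float]]:
    """Latin hypercube sample of n_samples points inside the box given by bounds.

    Each input range is split into n_samples equal-width strata; every stratum
    receives exactly one point, and strata are paired across inputs at random.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One uniformly placed point per stratum, then shuffle the stratum order
        column = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    # Transpose: one row per sample, one column per input
    return [list(row) for row in zip(*columns)]


# Hypothetical bounds for (log10 drag coefficient, velocity); not Projectile's actual ranges
for point in latin_hypercube(5, [(-5.0, 1.0), (0.0, 1000.0)], seed=0):
    print([round(value, 3) for value in point])
```

Compared with independent uniform draws, stratifying each input guarantees that every part of each input's range is covered, which matters when the simulation budget (here 500 runs) is limited.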
Data
As you can see, our simulator inputs (x) and outputs (y) are PyTorch tensors.
PyTorch tensors are a common data structure used in machine learning, and AutoEmulate is built to work with them.
Build and compare Emulators
With our simulator inputs and outputs, we can run a full machine learning pipeline, including data processing, model fitting, model selection and hyperparameter optimisation in just a few lines of code.
First, let’s import AutoEmulate and check the names of the available Emulator models.
from autoemulate.core.compare import AutoEmulate
AutoEmulate.list_emulators()
 | Emulator |
---|---|
0 | GaussianProcess |
1 | GaussianProcessCorrelated |
2 | LightGBM |
3 | SupportVectorMachine |
4 | RadialBasisFunctions |
5 | RandomForest |
6 | MLP |
7 | EnsembleMLP |
8 | EnsembleMLPDropout |
We’re now ready to run AutoEmulate to build and compare emulators.
This fits each emulator model (including hyperparameter tuning) to the simulation inputs and outputs, and evaluates its performance on held-out test data.
# Run AutoEmulate with default settings
ae = AutoEmulate(x, y, log_level="error")
For more information about the configuration options available, see the AutoEmulate API docs. Here’s a brief overview of some important options:
Model selection
By default, AutoEmulate will use all of the listed emulator models, but you can also specify a subset if you already know which kinds of models suit your data.
Specify the models used by AutoEmulate with the models argument, for example:
from autoemulate.emulators import GaussianProcessExact, RadialBasisFunctions
models = ["GaussianProcessExact", "RadialBasisFunctions"]
ae = AutoEmulate(x, y, models=models)
Logging
When running AutoEmulate, you may also wish to enable logging to track the progress and performance of the emulator comparison. You can do this by setting the log_level argument when creating the AutoEmulate instance:
ae = AutoEmulate(x, y, models=models, log_level="info")
Try setting various log levels to see the difference. The options are “progress_bar”, “debug”, “info”, “warning”, “error”, or “critical”.
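Apart from "progress_bar", these names match Python's standard logging severities. Assuming AutoEmulate's log_level behaves like a standard-library logger level (an assumption this page doesn't confirm), choosing a level shows messages at that severity and above. The hypothetical helper visible_levels below demonstrates that filtering with the logging module alone.

```python
import logging

LEVELS = ["debug", "info", "warning", "error", "critical"]


def visible_levels(log_level: str) -> list:
    """Return the severities a standard-library logger set to log_level would emit."""
    logger = logging.getLogger("log_level_demo")  # hypothetical logger name
    logger.setLevel(getattr(logging, log_level.upper()))  # e.g. "info" -> logging.INFO
    return [name for name in LEVELS if logger.isEnabledFor(getattr(logging, name.upper()))]


for level in LEVELS:
    print(f"log_level={level!r:>10} shows: {', '.join(visible_levels(level))}")
```

So "error" (as used in the runs above) hides routine progress messages, while "debug" shows everything.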
Now that we have run AutoEmulate, let’s look at the summary for a comparison of emulator performance: r-squared on both the train and test data, and RMSE on the test data.
ae.summarise()
 | model_name | x_transforms | y_transforms | params | rmse_test | r2_test | r2_test_std | r2_train | r2_train_std |
---|---|---|---|---|---|---|---|---|---|
0 | GaussianProcess | [StandardizeTransform()] | [StandardizeTransform()] | {'mean_module_fn': <function linear_mean at 0x... | 6.249398 | 0.999976 | 0.000012 | 0.999984 | 0.000003 |
1 | GaussianProcessCorrelated | [StandardizeTransform()] | [StandardizeTransform()] | {'mean_module_fn': <function constant_mean at ... | 6.381134 | 0.999974 | 0.000013 | 0.999985 | 0.000003 |
4 | RadialBasisFunctions | [StandardizeTransform()] | [StandardizeTransform()] | {'kernel': 'quintic', 'degree': 2, 'smoothing'... | 13.706507 | 0.999438 | 0.000324 | 0.999410 | 0.000109 |
2 | LightGBM | [StandardizeTransform()] | [StandardizeTransform()] | {'num_leaves': 35, 'max_depth': 9, 'learning_r... | 26.225670 | 0.993062 | 0.002803 | 0.994643 | 0.001067 |
6 | MLP | [StandardizeTransform()] | [StandardizeTransform()] | {'epochs': 200, 'layer_dims': [8, 4], 'lr': 0.... | 26.274155 | 0.992609 | 0.004037 | 0.988317 | 0.001896 |
3 | SupportVectorMachine | [StandardizeTransform()] | [StandardizeTransform()] | {'kernel': 'poly', 'degree': 5, 'gamma': 'auto... | 31.012720 | 0.983987 | 0.008477 | 0.986727 | 0.002745 |
7 | EnsembleMLP | [StandardizeTransform()] | [StandardizeTransform()] | {'n_emulators': 4, 'epochs': 100, 'layer_dims'... | 40.427025 | 0.962475 | 0.011061 | 0.968482 | 0.007252 |
5 | RandomForest | [StandardizeTransform()] | [StandardizeTransform()] | {'n_estimators': 117, 'min_samples_split': 4, ... | 41.784657 | 0.950963 | 0.037212 | 0.992192 | 0.001342 |
8 | EnsembleMLPDropout | [StandardizeTransform()] | [StandardizeTransform()] | {'n_emulators': 8, 'epochs': 100, 'layer_dims'... | 63.985729 | 0.756464 | 0.043905 | 0.773826 | 0.024241 |
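The r2 and rmse columns presumably follow the standard definitions of the coefficient of determination and root mean squared error. As a reference sketch of those definitions (not AutoEmulate's own metric code), here they are in plain Python, applied to hypothetical test values.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: a typical prediction error, in the units of y."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (1.0 is a perfect fit)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out targets and emulator predictions
y_test = [100.0, 250.0, 400.0, 800.0]
y_hat = [110.0, 240.0, 400.0, 790.0]
print(f"rmse={rmse(y_test, y_hat):.3f}, r2={r2(y_test, y_hat):.5f}")
```

RMSE is in the units of the output (here, distance), while r-squared is scale-free. A model can therefore have an RMSE of several units yet an r-squared very close to 1 when the outputs vary over a much larger range, as the Gaussian process rows in the table show.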
Choosing an Emulator
From this list, we can choose an emulator based on its index in the summary dataframe, or quickly get the best-performing one using the best_result method, which by default selects based on the r2_test metric.
best = ae.best_result()
print("Model with id: ", best.id, " performed best: ", best.model_name)
Model with id: 0 performed best: GaussianProcess
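Since best_result picks by r2_test by default, the selection amounts to taking the row with the largest r2_test. The plain-Python sketch below reproduces that with values copied from the summary table, and also ranks by lowest rmse_test for comparison; the list-of-dicts layout is our own, not the object AutoEmulate returns.

```python
# r2_test and rmse_test values copied from the ae.summarise() output
summary = [
    {"id": 0, "model_name": "GaussianProcess", "rmse_test": 6.249398, "r2_test": 0.999976},
    {"id": 1, "model_name": "GaussianProcessCorrelated", "rmse_test": 6.381134, "r2_test": 0.999974},
    {"id": 4, "model_name": "RadialBasisFunctions", "rmse_test": 13.706507, "r2_test": 0.999438},
    {"id": 2, "model_name": "LightGBM", "rmse_test": 26.225670, "r2_test": 0.993062},
    {"id": 6, "model_name": "MLP", "rmse_test": 26.274155, "r2_test": 0.992609},
    {"id": 3, "model_name": "SupportVectorMachine", "rmse_test": 31.012720, "r2_test": 0.983987},
    {"id": 7, "model_name": "EnsembleMLP", "rmse_test": 40.427025, "r2_test": 0.962475},
    {"id": 5, "model_name": "RandomForest", "rmse_test": 41.784657, "r2_test": 0.950963},
    {"id": 8, "model_name": "EnsembleMLPDropout", "rmse_test": 63.985729, "r2_test": 0.756464},
]

# Higher r2 is better; lower RMSE is better
best_by_r2 = max(summary, key=lambda row: row["r2_test"])
best_by_rmse = min(summary, key=lambda row: row["rmse_test"])
print("best by r2_test:", best_by_r2["id"], best_by_r2["model_name"])
print("best by rmse_test:", best_by_rmse["id"], best_by_rmse["model_name"])
```

Here both criteria agree on GaussianProcess (id 0). GaussianProcessCorrelated is very close on both metrics; if r2_test_std measures the spread of the test score, as its name suggests, the gap between the top two is smaller than that spread, so the ranking between them shouldn't be read as decisive.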
Let’s take a look at the configuration of the best model. These are the values of the model’s hyperparameters.
print(best.params)
{'mean_module_fn': <function linear_mean at 0x7fb4ccb05800>, 'covar_module_fn': <function rbf at 0x7fb4ccb051c0>, 'epochs': 50, 'lr': 0.5, 'likelihood_cls': <class 'gpytorch.likelihoods.multitask_gaussian_likelihood.MultitaskGaussianLikelihood'>, 'scheduler_cls': <class 'torch.optim.lr_scheduler.ExponentialLR'>, 'scheduler_kwargs': {'gamma': 0.9}}
We can quickly visualise the performance of this Emulator with a plot of its predictions against the simulator outputs for the held-out test data.
ae.plot(best)

Predictions
We can use the emulator to make predictions using the predict method.
best.model.predict(x[:10])
MultivariateNormal(loc: torch.Size([10, 1]), covariance_matrix: torch.Size([10, 1, 1]))
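The prediction comes back as a Gaussian distribution rather than a single value. Assuming the returned object behaves like torch.distributions.MultivariateNormal (as its printed form suggests), its .mean and .variance attributes give each point's predictive mean and variance. The standard-library sketch below, using made-up means and variances, shows how such values turn into 95% intervals.

```python
from statistics import NormalDist

# Made-up predictive means and variances for three inputs. With a
# torch.distributions.MultivariateNormal `pred`, these would come from
# pred.mean and pred.variance.
means = [512.3, 87.9, 1430.0]
variances = [4.0, 0.25, 36.0]

# Two-sided 95% interval: mean +/- z * standard deviation, with z about 1.96
z = NormalDist().inv_cdf(0.975)
intervals = []
for mu, var in zip(means, variances):
    sd = var ** 0.5
    intervals.append((mu - z * sd, mu + z * sd))
    print(f"mean={mu:7.1f}  95% interval=({mu - z * sd:7.1f}, {mu + z * sd:7.1f})")
```

Wider intervals flag inputs where the emulator is less certain, which can guide where to run the real simulation next.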
Saving and loading emulators
Emulators and their metadata (hyperparameter config and performance metrics) can be saved to disk and loaded again later.
# Make a directory to save Emulator models (exist_ok=True avoids an error if it already exists)
import os
path = "my_emulators"
os.makedirs(path, exist_ok=True)
Let’s save the best result (the best-performing emulator plus its metadata) to disk.
# The use_timestamp parameter ensures a new result is saved each time the save method is called
best_result_filepath = ae.save(best, path, use_timestamp=True)
print("Model and metadata saved to: ", best_result_filepath)
Model and metadata saved to: my_emulators/GaussianProcess_0_20250731_164113
You should now have two files saved to disk: one containing the emulator model, and one containing the metadata, which has the same name and a .csv extension.
You can later pass this filepath to the load method to use the model (and inspect its metadata) again.
loaded_result = ae.load(best_result_filepath)
print(loaded_result.model_name)
print(loaded_result.params)
GaussianProcess
{'mean_module_fn': 'linear_mean', 'covar_module_fn': 'rbf', 'epochs': 50, 'lr': 0.5, 'likelihood_cls': 'MultitaskGaussianLikelihood', 'scheduler_cls': 'ExponentialLR', 'scheduler_kwargs': {'gamma': 0.9}}
loaded_result.model.predict(x[:10])
MultivariateNormal(loc: torch.Size([10, 1]), covariance_matrix: torch.Size([10, 1, 1]))
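Notice that the loaded params list function and class names as strings ('linear_mean', 'ExponentialLR') rather than the original objects. That is what you would expect if the metadata file stores only text. The hypothetical sketch below round-trips a params dict through a CSV file with the standard library to illustrate this; the file name, the columns and the to_serialisable helper are our own assumptions, not AutoEmulate's format.

```python
import csv
import inspect
import json
import os
import tempfile


def linear_mean():  # stand-in for a function-valued hyperparameter
    pass


def to_serialisable(value):
    """Replace functions and classes by their names, recursing into dicts (hypothetical helper)."""
    if isinstance(value, dict):
        return {key: to_serialisable(v) for key, v in value.items()}
    if inspect.isfunction(value) or inspect.isclass(value):
        return value.__name__
    return value


params = {"mean_module_fn": linear_mean, "epochs": 50, "lr": 0.5,
          "scheduler_kwargs": {"gamma": 0.9}}

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "GaussianProcess_0_metadata.csv")  # hypothetical file name
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["model_name", "params"])
        writer.writeheader()
        writer.writerow({"model_name": "GaussianProcess",
                         "params": json.dumps(to_serialisable(params))})
    with open(csv_path, newline="") as f:
        row = next(csv.DictReader(f))
    loaded_params = json.loads(row["params"])

print(loaded_params)
```

For predictions, use the loaded model itself (loaded_result.model, as above); the metadata is best treated as a human-readable record of the configuration.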