First-Time Users’ Frequently Asked Questions
General Questions
What is AutoEmulate?
A Python package that makes it easy to create emulators for complex simulations. It takes a set of simulation inputs x and outputs y, and automatically fits, optimises and evaluates various machine learning models to find the best emulator model. The emulator model can then be used as a drop-in replacement for the simulation, but will be much faster and computationally cheaper to evaluate. We have also implemented global sensitivity analysis as a common emulator application and are working towards making AutoEmulate a true end-to-end package for building emulators.
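To make the "fit, evaluate, pick the best" idea concrete, here is a toy sketch in plain NumPy: several candidate models are scored by 5-fold cross-validation and the one with the lowest error becomes the emulator. The polynomial degrees stand in for real machine learning models, and `expensive_simulation` is a hypothetical cheap function; this illustrates the workflow only and is not AutoEmulate's implementation or API.

```python
import numpy as np

def expensive_simulation(x):
    # Hypothetical stand-in for a slow simulator: a cheap analytic function.
    return np.sin(3 * x) + 0.5 * x

def kfold_rmse(x, y, degree, n_folds=5):
    """Mean cross-validated RMSE of a polynomial fit of the given degree."""
    idx = np.arange(len(x))
    errors = []
    for test_idx in np.array_split(idx, n_folds):
        train_idx = np.setdiff1d(idx, test_idx)
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coeffs, x[test_idx])
        errors.append(np.sqrt(np.mean((pred - y[test_idx]) ** 2)))
    return float(np.mean(errors))

# "Run" the simulation on 100 random inputs.
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=100)
y = expensive_simulation(x)

# Score each candidate model and keep the one with the lowest CV error.
candidates = [1, 3, 5, 7]
scores = {d: kfold_rmse(x, y, d) for d in candidates}
best_degree = min(scores, key=scores.get)

# Refit the winner on all data; this is the cheap drop-in replacement.
emulator = np.poly1d(np.polyfit(x, y, best_degree))
print(best_degree, emulator(1.0), expensive_simulation(1.0))
```

AutoEmulate additionally tunes hyperparameters and works with multi-dimensional inputs, but the selection principle shown here (compare held-out error, keep the best) is the same idea.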
How do I know whether AutoEmulate is the right tool for me?
AutoEmulate is likely a good fit if:
- You need to build an emulator for a simulation.
- You want to do global sensitivity analysis.
- Your inputs x and outputs y are numeric and complete (we don’t support missing data yet).
- You have one or more input parameters and one or more output variables.
- You have a small-ish dataset in the order of hundreds to a few thousand samples. All default emulator parameters and search spaces are optimised for smaller datasets.
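A quick way to check your data against these criteria before running anything is sketched below. `find_problems` is a hypothetical helper (not part of AutoEmulate), and the `max_samples=5000` threshold is an illustrative reading of "a few thousand", not a limit enforced by the package.

```python
import numpy as np

def find_problems(x, y, max_samples=5000):
    """Return reasons why (x, y) may not suit the defaults; empty list if none.

    Hypothetical helper; the sample threshold is illustrative only.
    """
    try:
        x = np.asarray(x, dtype=float)  # fails on non-numeric entries
        y = np.asarray(y, dtype=float)
    except (TypeError, ValueError):
        return ["x and y must contain only numeric values"]
    if x.ndim != 2 or y.ndim != 2:
        return ["x and y must be 2D: (n_samples, n_parameters) and (n_samples, n_outputs)"]
    problems = []
    if x.shape[0] != y.shape[0]:
        problems.append("x and y must have the same number of rows (one per simulation run)")
    if not (np.isfinite(x).all() and np.isfinite(y).all()):
        problems.append("missing (NaN) or infinite values are not supported")
    if x.shape[0] > max_samples:
        problems.append(f"{x.shape[0]} samples: defaults are tuned for smaller datasets")
    return problems

rng = np.random.default_rng(0)
x = rng.uniform(size=(200, 3))                         # 200 runs, 3 parameters
y = np.column_stack([x.sum(axis=1), x.prod(axis=1)])   # 2 outputs per run
print(find_problems(x, y))  # []

x_bad = x.copy()
x_bad[0, 0] = np.nan  # simulate a failed run with a missing input
print(find_problems(x_bad, y))
```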
Does AutoEmulate support multi-output data?
Yes, some models support multi-output data. When AutoEmulate is instantiated with multi-output data, it automatically restricts the search space to models that can handle it.
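The restriction can be pictured as a filter over the candidate models, keyed on the number of output columns in y. The registry below, its model names and the `supports_multioutput` flag are all hypothetical; they illustrate the idea and do not reflect AutoEmulate's actual model list or internals.

```python
import numpy as np

# Hypothetical candidate registry: names and capability flags are illustrative only.
CANDIDATES = {
    "model_a": {"supports_multioutput": True},
    "model_b": {"supports_multioutput": False},
    "model_c": {"supports_multioutput": True},
}

def eligible_models(y):
    """Return candidate names that can handle the number of outputs in y."""
    y = np.asarray(y)
    n_outputs = 1 if y.ndim == 1 else y.shape[1]
    if n_outputs == 1:
        return list(CANDIDATES)  # every model handles a single output
    return [name for name, caps in CANDIDATES.items() if caps["supports_multioutput"]]

print(eligible_models(np.zeros((10, 1))))  # ['model_a', 'model_b', 'model_c']
print(eligible_models(np.zeros((10, 2))))  # ['model_a', 'model_c']
```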
Does AutoEmulate support temporal or spatial data?
Not explicitly. AutoEmulate currently expects 2D data of shape [n_simulations, n_outputs]. The second dimension can hold temporal or spatial indices, but AutoEmulate will not explicitly model spatial or temporal correlations. This is a feature we hope to add in the future.
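If your simulator produces, say, a value per time step and per location, you can flatten those axes into the second dimension before emulation and reshape predictions back afterwards. The sizes and random "outputs" below are hypothetical placeholders for real simulator output.

```python
import numpy as np

n_simulations, n_times, n_locations = 50, 24, 10

# Hypothetical simulator output: one value per time step and location, per run.
outputs = np.random.default_rng(0).normal(size=(n_simulations, n_times, n_locations))

# Flatten everything except the simulation axis -> [n_simulations, n_outputs].
y = outputs.reshape(n_simulations, -1)
print(y.shape)  # (50, 240)

# With NumPy's default row-major order, column j of y holds
# time index j // n_locations and location j % n_locations.
# Emulator predictions of shape (n, 240) can be reshaped back the same way.
y_back = y.reshape(n_simulations, n_times, n_locations)
```

Note that after flattening, each column is treated as an independent output: neighbouring time steps or locations are not known to be related, which is exactly the limitation described above.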
Why does AutoEmulate take a long time to run on my dataset?
The package fits a lot of models, in particular when hyperparameters are optimised. With, say, 8 default models and 5-fold cross-validation, this amounts to 40 model fits. Adding hyperparameter optimisation (n_iter=20) raises this to 800 model fits. Some models, such as Gaussian Processes, take a long time to fit on a CPU.
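The fit count above multiplies out as follows, assuming every hyperparameter candidate is cross-validated on all folds (which is how the 800 figure arises). `count_model_fits` is a hypothetical helper for back-of-the-envelope estimates, not an AutoEmulate function.

```python
def count_model_fits(n_models, n_folds, n_iter=None):
    """Model fits for k-fold CV, optionally times n_iter tuning candidates per model."""
    fits = n_models * n_folds
    return fits if n_iter is None else fits * n_iter

print(count_model_fits(8, 5))             # 40
print(count_model_fits(8, 5, n_iter=20))  # 800
```

Multiplying the result by a rough time per fit for your slowest model gives a quick estimate of total run time, and shows why reducing n_iter or the number of models helps most.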
Usage Questions
What data do I need to provide to AutoEmulate to build an emulator?
You’ll need two input objects: x and y. x is an ndarray / torch tensor of shape (n_samples, n_parameters) and y is an ndarray / torch tensor of shape (n_samples, n_outputs). Each sample is a simulation run, so each row of x is a set of input parameters and the same row of y is the corresponding simulation output. You’ll usually have created x using Latin hypercube sampling or similar methods, and y by running the simulation on these x inputs.
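A minimal sketch of producing x and y this way is shown below, assuming a hypothetical two-parameter `simulation` function and made-up parameter bounds. The Latin hypercube is written out in NumPy for transparency; SciPy's `scipy.stats.qmc.LatinHypercube` offers a ready-made alternative.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, rng):
    """Simple Latin hypercube: exactly one point per equal-width stratum in each dimension."""
    n_params = len(bounds)
    # One random point inside each of the n_samples strata of [0, 1), per parameter...
    u = (np.arange(n_samples)[:, None] + rng.uniform(size=(n_samples, n_params))) / n_samples
    # ...then shuffle the strata independently for each parameter.
    for j in range(n_params):
        u[:, j] = rng.permutation(u[:, j])
    lower, upper = np.array(bounds, dtype=float).T
    return lower + u * (upper - lower)  # rescale to the parameter bounds

def simulation(params):
    # Hypothetical simulator with 2 inputs and 2 outputs.
    a, b = params
    return [a * np.exp(-b), a + b ** 2]

rng = np.random.default_rng(42)
x = latin_hypercube(100, bounds=[(0.0, 1.0), (0.0, 5.0)], rng=rng)  # (n_samples, n_parameters)
y = np.array([simulation(row) for row in x])                        # (n_samples, n_outputs)
print(x.shape, y.shape)  # (100, 2) (100, 2)
```

Each row of x is one simulation run and the same row of y holds its outputs. If you prefer tensors, `torch.from_numpy(x)` converts the arrays without copying.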
Can I use AutoEmulate for commercial purposes?
Yes. It’s licensed under the MIT license, which allows for commercial use. See the license for more information.
Data Handling
What are the best practices for data preprocessing before using AutoEmulate?
The user will typically run their simulation on a selected set of input parameters (the experimental design), sampled using a Latin hypercube or another sampling method. AutoEmulate currently needs all inputs to be numeric and does not support missing data. By default, AutoEmulate scales the input and output data to zero mean and unit variance. There’s also the option to apply dimensionality reduction (see the dimensionality reduction tutorial).
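Zero-mean, unit-variance scaling is easy to reproduce if you want to inspect your data the way the default scaling sees it. The sketch below is plain NumPy (scikit-learn's `StandardScaler` does the same job); it is not AutoEmulate's internal code, and the example data are made up.

```python
import numpy as np

def standardise(a):
    """Scale each column to zero mean and unit variance; also return the statistics."""
    mean = a.mean(axis=0)
    std = a.std(axis=0)
    # Leave constant columns unscaled instead of dividing by zero.
    std = np.where(std == 0, 1.0, std)
    return (a - mean) / std, mean, std

# Made-up outputs on very different scales.
rng = np.random.default_rng(0)
y = rng.normal(loc=[10.0, -3.0], scale=[5.0, 0.1], size=(500, 2))

y_scaled, y_mean, y_std = standardise(y)
print(y_scaled.mean(axis=0).round(6), y_scaled.std(axis=0).round(6))

# Predictions made in the scaled space are mapped back to original units.
y_restored = y_scaled * y_std + y_mean
```

Scaling puts outputs with very different magnitudes on an equal footing during fitting; keeping the mean and standard deviation is what allows predictions to be returned in the simulation's original units.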
Community and Learning Resources
Where can I find tutorials or case studies on using AutoEmulate?
See the tutorial for a comprehensive guide on using the package. Case studies are coming soon.
How can I stay updated on new releases or updates to AutoEmulate?
Watch the AutoEmulate repository.
What support options are available if I need help with AutoEmulate?
Please open an issue or start a discussion on GitHub.