First-Time Users’ Frequently Asked Questions
General Questions
What is AutoEmulate?
A Python package that makes it easy to create emulators for complex simulations. It takes a set of simulation inputs X and outputs y, and automatically fits, optimises and evaluates various machine learning models to find the best emulator model. The emulator model can then be used as a drop-in replacement for the simulation, but will be much faster and computationally cheaper to evaluate. We have also implemented global sensitivity analysis as a common emulator application and are working towards making AutoEmulate a true end-to-end package for building emulators.
How do I know whether AutoEmulate is the right tool for me?
You need to build an emulator for a simulation.
You want to do global sensitivity analysis.
Your inputs X and outputs y are numeric and complete (we don’t support missing data yet).
You have one or more input parameters and one or more output variables.
You have a small-ish dataset, on the order of hundreds to a few thousand samples. All default emulator parameters and search spaces are optimised for smaller datasets.
Does AutoEmulate support multi-output data?
Yes, all models support multi-output data. Some do so natively, others are wrapped in a MultiOutputRegressor, which fits one model per target variable.
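As a rough illustration of what this wrapping means, the sketch below puts scikit-learn's MultiOutputRegressor around a gradient boosting model, which on its own handles only a single target. The toy data and the choice of base model are assumptions made for the example; they are not taken from AutoEmulate's internals.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

# Toy data (assumed for illustration): 3 inputs, 2 outputs.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))
y = np.column_stack([X.sum(axis=1), X[:, 0] * X[:, 1]])

# GradientBoostingRegressor predicts one target, so the wrapper
# fits a separate copy of it for each column of y.
wrapped = MultiOutputRegressor(
    GradientBoostingRegressor(n_estimators=20, random_state=0)
)
wrapped.fit(X, y)

n_fitted = len(wrapped.estimators_)          # one fitted model per target
pred_shape = wrapped.predict(X[:5]).shape    # (n_samples, n_outputs)
print(n_fitted, pred_shape)
```

The cost of this approach is that the per-target models are fitted independently, so correlations between outputs are not shared across them.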
Does AutoEmulate support temporal or spatial data?
Not explicitly. The train-test split takes a random subset as the test set, and KFold cross-validation likewise does not account for temporal or spatial structure in the data.
AutoEmulate takes a long time to run on my dataset, why?
The package fits a lot of models, in particular when hyperparameters are optimised. With, say, 8 default models and 5-fold cross-validation, this amounts to 40 model fits. Adding hyperparameter optimisation (n_iter=20) brings this to 800 model fits. Some models, such as Gaussian Processes and Neural Processes, take a long time to run on a CPU. However, don’t despair! There is a speeding up AutoEmulate guide. As a rule of thumb, if your dataset is smaller than 1000 samples, you should be fine; if it’s larger and you want to optimise hyperparameters, you might want to read the guide.
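The arithmetic behind those numbers can be written out as a small helper. The function name total_model_fits is hypothetical and only reproduces the multiplication described above; it is not part of AutoEmulate.

```python
def total_model_fits(n_models, n_folds, n_iter=1):
    """Number of fits: every model is refit on every fold,
    once per hyperparameter candidate (n_iter)."""
    return n_models * n_folds * n_iter

# 8 models, 5-fold cross-validation, no hyperparameter search.
fits_without_search = total_model_fits(8, 5)

# Same setup with 20 hyperparameter candidates per model.
fits_with_search = total_model_fits(8, 5, n_iter=20)

print(fits_without_search, fits_with_search)
```

Because the three factors multiply, reducing any one of them (fewer models, fewer folds, or a smaller n_iter) cuts the total run time roughly in proportion.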
Usage Questions
What data do I need to provide to AutoEmulate to build an emulator?
You’ll need two input objects: X and y. X is an ndarray / Pandas DataFrame of shape (n_samples, n_parameters) and y is an ndarray / Pandas DataFrame of shape (n_samples, n_outputs). Each sample here is a simulation run, so each row of X corresponds to a set of input parameters and each row of y holds the corresponding simulation output. You’ll usually have created X using Latin hypercube sampling or similar methods, and y by running the simulation on these X inputs.
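A minimal sketch of how such X and y arrays might be put together, using only NumPy. The latin_hypercube helper is a simple hand-written version of the sampling idea, and toy_simulation is a hypothetical stand-in for a real simulator; both are assumptions for illustration.

```python
import numpy as np

def latin_hypercube(n_samples, n_parameters, rng):
    """Sample the unit cube so that each parameter has exactly
    one point in each of n_samples equal-width strata."""
    # One jittered point inside every stratum, for every parameter.
    strata = np.arange(n_samples)[:, None]
    u = (strata + rng.uniform(size=(n_samples, n_parameters))) / n_samples
    # Shuffle each column independently to decouple the parameters.
    for j in range(n_parameters):
        u[:, j] = rng.permutation(u[:, j])
    return u

def toy_simulation(x):
    """Hypothetical simulator: 2 input parameters -> 2 outputs."""
    return np.array([np.sin(x[0]) + x[1], x[0] * x[1]])

rng = np.random.default_rng(42)
X = latin_hypercube(100, 2, rng)              # shape (n_samples, n_parameters)
y = np.array([toy_simulation(x) for x in X])  # shape (n_samples, n_outputs)
print(X.shape, y.shape)
```

In practice the unit-cube samples would be rescaled to each parameter's physical range before being passed to the simulator.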
Can I use AutoEmulate for commercial purposes?
Yes. It’s licensed under the MIT license, which allows for commercial use. See the license for more information.
Advanced Usage
Does AutoEmulate support parallel processing or high-performance computing (HPC) environments?
Yes, AutoEmulate.setup() has an n_jobs parameter which allows you to parallelise cross-validation and hyperparameter optimisation. We are also working on GPU support for some models.
Can AutoEmulate be integrated with other data analysis or simulation tools?
AutoEmulate takes simple X and y ndarrays as input and returns emulators that are scikit-learn estimators, which can be saved, loaded, and used like any other scikit-learn model.
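To show what "used like any other scikit-learn model" can look like in practice, the sketch below saves and reloads an estimator with joblib. A plain RandomForestRegressor stands in for a fitted emulator here; this is an assumption for illustration, and AutoEmulate may offer its own save and load helpers that are not shown.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Stand-in for a fitted emulator (assumption: any scikit-learn estimator).
rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 2))
y = X[:, 0] + 2 * X[:, 1]
emulator = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Persist the fitted estimator to disk and load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "emulator.joblib")
    joblib.dump(emulator, path)
    restored = joblib.load(path)

# The restored estimator should reproduce the original predictions.
same_predictions = np.allclose(emulator.predict(X), restored.predict(X))
print(same_predictions)
```

Files saved this way are generally best loaded with the same library versions that were used to write them.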
Data Handling
What are the best practices for data preprocessing before using AutoEmulate?
The user will typically run their simulation on a selected set of input parameters (the experimental design), chosen using a Latin hypercube or other sampling method. AutoEmulate currently needs all inputs to be numeric and we don’t support missing data. By default, AutoEmulate will scale the input data to zero mean and unit variance, and for some models it will also scale the output data. There’s also the option to do dimensionality reduction in setup().
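For readers unfamiliar with the term, scaling to zero mean and unit variance amounts to the per-column transformation sketched below. This is a plain NumPy illustration on made-up data, not AutoEmulate's own code; it matches what scikit-learn's StandardScaler computes.

```python
import numpy as np

# Two input parameters on very different scales (made-up ranges).
rng = np.random.default_rng(1)
X = rng.uniform(low=[0.0, 100.0], high=[1.0, 500.0], size=(200, 2))

# Subtract each column's mean and divide by its standard deviation.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))
```

Since this is applied by default, inputs do not usually need to be standardised by hand beforehand.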
Troubleshooting
What common issues might I encounter when using AutoEmulate, and how can I solve them?
AutoEmulate.setup() has a log_to_file option to log all warnings/errors to a file. It also has a verbose option to print more information to the console. If you encounter an error, please open an issue (see below).
One common issue is that the Jupyter notebook kernel crashes when running compare() in parallel, often due to LightGBM. In this case, we recommend either specifying n_jobs=1 or selecting specific (non-LightGBM) models in setup() with the models parameter.
How can I report a bug or request a feature in AutoEmulate?
You can report a bug or request a new feature through the issue templates in our GitHub repository. Head over there and choose the template that fits your purpose to get started.
Community and Learning Resources
Are there any community projects or collaborations using AutoEmulate I can join or learn from?
Where can I find tutorials or case studies on using AutoEmulate?
See the tutorial for a comprehensive guide on using the package. Case studies are coming soon.
How can I stay updated on new releases or updates to AutoEmulate?
Watch the AutoEmulate repository.
What support options are available if I need help with AutoEmulate?
Please open an issue on GitHub or contact the maintainer (email) directly.