First-Time Users’ Frequently Asked Questions

General Questions

  1. What is AutoEmulate?

    • A Python package that makes it easy to build emulators for complex simulations. It takes a set of simulation inputs X and outputs y, then automatically fits, optimises and evaluates various machine learning models to find the best emulator. The chosen emulator can then be used as a drop-in replacement for the simulation, while being much faster and cheaper to evaluate.

  2. How do I install AutoEmulate?

  3. What are the prerequisites for using AutoEmulate?

    • AutoEmulate is designed to be easy to use. The user first needs to generate a dataset of simulation inputs X and outputs y, and should ideally have a basic understanding of Python and machine learning concepts.
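A minimal sketch of that first step, assuming only NumPy. The function `toy_simulation` is a hypothetical stand-in for your own simulator, and the parameter bounds and sample count are arbitrary choices for illustration. Each row of `X` is one set of input parameters, and the matching row of `y` holds the outputs of that run.

```python
import numpy as np


def toy_simulation(params):
    """Hypothetical stand-in for an expensive simulator.

    Takes one parameter vector and returns two numeric outputs.
    """
    a, b = params
    return np.array([np.sin(a) * b, a**2 + 0.5 * b])


rng = np.random.default_rng(seed=0)

# Hypothetical bounds for the two input parameters (one entry per column).
lower = np.array([0.0, 1.0])
upper = np.array([np.pi, 5.0])

# Sample 50 parameter sets uniformly within the bounds.
n_samples = 50
X = rng.uniform(lower, upper, size=(n_samples, len(lower)))

# One simulation run per row of X; each row of y holds that run's outputs.
y = np.vstack([toy_simulation(x) for x in X])

print(X.shape, y.shape)  # (50, 2) (50, 2)
```

In practice the loop over `toy_simulation` is the expensive part, which is why the number of runs is usually kept small.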

Usage Questions

  1. How do I start using AutoEmulate with my simulation?

  2. What kind of data does AutoEmulate need to build an emulator?

    • AutoEmulate takes simulation inputs X and simulation outputs y to build an emulator. X is an ndarray of shape (n_samples, n_parameters) and y is an ndarray of shape (n_samples, n_outputs). Each sample is one simulation run: each row of X holds a set of input parameters, and the matching row of y holds the simulation output for those parameters. Currently, all inputs and outputs must be numeric, and missing data is not supported.

    • All models work with multi-output data. AutoEmulate is optimised for smaller datasets (on the order of hundreds to thousands of samples). Training emulators on large datasets (hundreds of thousands of samples) can currently take a long time and is not recommended.

  3. How do I interpret the results from AutoEmulate?

    • See the tutorial for an example of how to interpret the results from AutoEmulate. Briefly, X and y are first split into training and test sets. Cross-validation and/or hyperparameter optimisation are then performed on the training data. After comparing the results from different emulators, the user can evaluate the chosen emulator on the test set with AutoEmulate.evaluate_model() and plot its test set predictions with AutoEmulate.plot_model(). See the autoemulate.compare module for details.

    • Note that an emulator can only be as good as the data it was trained on. The experimental design (the set of points at which the simulation was evaluated) is therefore key to obtaining a good emulator.

  4. Can I use AutoEmulate for commercial purposes?

    • Yes. It’s licensed under the MIT license, which allows for commercial use. See the license for more information.
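The split, cross-validate, then test workflow described in the answer to question 3 can be illustrated with plain scikit-learn. This is not AutoEmulate's internal code: it is a sketch that assumes scikit-learn is installed, uses synthetic data, and picks two arbitrary candidate models (`LinearRegression` and `GaussianProcessRegressor`) to stand in for the emulators AutoEmulate compares.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data: 200 runs, 3 input parameters, 2 outputs.
rng = np.random.default_rng(seed=42)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.column_stack([np.sin(3 * X[:, 0]) + X[:, 1], X[:, 1] * X[:, 2]])

# Hold out a test set that is not touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate emulators (arbitrary choices for illustration).
candidates = {
    "linear": LinearRegression(),
    # alpha adds a small jitter to the kernel diagonal for numerical stability.
    "gp": GaussianProcessRegressor(kernel=RBF(), alpha=1e-6, normalize_y=True),
}

# 5-fold cross-validation on the training data only (mean R^2 over folds).
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Refit the chosen model on all training data, then evaluate once on the test set.
best_model = candidates[best_name].fit(X_train, y_train)
test_r2 = r2_score(y_test, best_model.predict(X_test))
print(cv_scores, best_name, round(test_r2, 3))
```

Keeping the test set out of cross-validation means the final test score is an honest estimate of how the chosen emulator will behave on new simulation runs.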

Advanced Usage

  1. Does AutoEmulate support parallel processing or high-performance computing (HPC) environments?

    • Yes. AutoEmulate.setup() has an n_jobs parameter that lets you parallelise cross-validation and hyperparameter optimisation.

  2. Can AutoEmulate be integrated with other data analysis or simulation tools?

    • AutoEmulate takes plain X and y ndarrays as input and returns emulator models that can be saved and loaded with joblib. All emulators are written as scikit-learn estimators, so they can be used like any other scikit-learn model, for example in a pipeline.
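Both answers above can be illustrated with plain scikit-learn. The sketch below assumes scikit-learn and joblib are installed; the `Ridge` pipeline is a stand-in, not one of AutoEmulate's emulators, and the file name `emulator.joblib` is a hypothetical placeholder. It chains a scaler and a regressor in a `Pipeline`, uses scikit-learn's own `n_jobs` argument to run cross-validation folds in parallel, and round-trips the fitted pipeline through `joblib`. Since AutoEmulate's emulators are scikit-learn estimators, the same `joblib.dump`/`joblib.load` calls should apply to them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 300 runs, 4 input parameters, 2 outputs.
rng = np.random.default_rng(seed=1)
X = rng.uniform(0.0, 10.0, size=(300, 4))
y = np.column_stack([X.sum(axis=1), X[:, 0] - X[:, 3]])

# A scaler and a regressor chained like any other scikit-learn pipeline.
pipeline = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# n_jobs=2 asks scikit-learn (via joblib) to run the 5 folds on 2 parallel workers.
scores = cross_val_score(pipeline, X, y, cv=5, n_jobs=2)

# Fit on all data, then save the fitted pipeline and load it back with joblib.
pipeline.fit(X, y)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "emulator.joblib")
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

# The reloaded pipeline reproduces the original predictions.
assert np.allclose(restored.predict(X), pipeline.predict(X))
print(f"mean CV R^2: {scores.mean():.4f}")
```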

Data Handling

  1. What are the best practices for data preprocessing before using AutoEmulate?

    • The user will typically run their simulation on a selected set of input parameters (the experimental design), chosen with Latin hypercube sampling or another sampling method. AutoEmulate currently requires all inputs to be numeric and does not support missing data. By default, AutoEmulate scales the input data to zero mean and unit variance, and setup() offers an option for dimensionality reduction.

  2. How does AutoEmulate handle large datasets?

    • AutoEmulate is optimised for smaller datasets (on the order of hundreds to thousands of samples). Training emulators on large datasets (hundreds of thousands of samples) can currently take a long time and is not recommended. Because emulators are built for simulations that are expensive to evaluate, we expect most users to have relatively small datasets.
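The preprocessing practices from the first answer in this section can be sketched in NumPy alone: a basic Latin hypercube design over hypothetical parameter bounds, a check that the inputs are numeric with no missing values, and standardisation to zero mean and unit variance. The helper `latin_hypercube` and the bounds are hypothetical; AutoEmulate applies its own scaling in setup(), so the last step only makes that transformation concrete.

```python
import numpy as np


def latin_hypercube(n_samples, lower, upper, rng):
    """Basic Latin hypercube sample within per-parameter bounds.

    Each parameter range is split into n_samples equal strata, and every
    stratum is sampled exactly once (in random order) per parameter.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n_params = lower.size
    # One random point inside stratum i for every row i and parameter: values in [0, 1).
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_params))) / n_samples
    # Shuffle the strata independently in each column so parameters are not paired up.
    for j in range(n_params):
        u[:, j] = rng.permutation(u[:, j])
    # Map the unit hypercube onto the parameter bounds.
    return lower + u * (upper - lower)


rng = np.random.default_rng(seed=0)
X = latin_hypercube(100, lower=[0.1, 10.0, -1.0], upper=[1.0, 50.0, 1.0], rng=rng)

# Inputs must be numeric with no missing values.
assert np.issubdtype(X.dtype, np.number)
assert not np.isnan(X).any()

# Standardise each column to zero mean and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
```

Compared with plain uniform sampling, the stratification guarantees that every part of each parameter's range is covered, which matters when only a few hundred simulation runs are affordable.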

Troubleshooting

  1. What common issues might I encounter when using AutoEmulate, and how can I solve them?

    • AutoEmulate.setup() has a log_to_file option that logs all warnings and errors to a file, and a verbose option that prints more information to the console. If you encounter an error, please open an issue (see below).

  2. How can I report a bug or request a feature in AutoEmulate?

    • You can report a bug or request a new feature through the issue templates in our GitHub repository. Choose the template that fits your purpose and fill it in.

Community and Learning Resources

  1. Are there any community projects or collaborations using AutoEmulate I can join or learn from?

    • Reach out to Martin (email) or Kalle (email) for more information.

  2. Where can I find tutorials or case studies on using AutoEmulate?

    • See the tutorial for a comprehensive guide on using the package.

  3. How can I stay updated on new releases or updates to AutoEmulate?

  4. What support options are available if I need help with AutoEmulate?

    • Please open an issue or contact the maintainer (email) directly.