# Quickstart

`AutoEmulate`'s goal is to make it easy to create an emulator for your simulation.

This tutorial walks you through the basic functionality of the Python API, using a simple toy simulation as an example.

We'll demonstrate the following steps:
1. Getting input and output tensor data from our example simulation
2. Creating, comparing and evaluating Emulators with `AutoEmulate`
3. Using an `Emulator` model to predict outputs for new inputs
4. Saving `Emulator` models (and associated metadata) to disk

In [None]:
# General imports for the notebook
import warnings
warnings.filterwarnings("ignore")

## Toy simulation

Before we build an emulator with AutoEmulate, we need to get a set of input/output pairs from our simulation to use as training data.

Below is a toy simulation for a projectile's motion with drag (see [here](https://mogp-emulator.readthedocs.io/en/latest/intro/tutorial.html) for details). The simulation includes:
- Inputs: drag coefficient (log scale), velocity
- Outputs: distance the projectile travelled
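To make the input/output relationship concrete, here is a minimal pure-Python sketch of such a simulation. It is not the `Projectile` class shipped with `AutoEmulate`: the function name `projectile_distance`, the launch height of 2 m, the base-10 log scale, the quadratic drag law with unit mass, and the explicit Euler time-stepping are all assumptions made for illustration.

```python
import math


def projectile_distance(log10_drag, v0, height=2.0, g=9.81, dt=1e-3):
    """Horizontal distance travelled before the projectile hits the ground.

    Assumptions (illustrative only): horizontal launch from `height` metres,
    unit mass, quadratic drag with coefficient 10**log10_drag, and explicit
    Euler integration with a fixed time step `dt`.
    """
    drag = 10.0 ** log10_drag
    x, y = 0.0, height      # position
    vx, vy = v0, 0.0        # velocity (launched horizontally)
    while y > 0.0:
        speed = math.hypot(vx, vy)
        ax = -drag * speed * vx        # drag opposes horizontal motion
        ay = -g - drag * speed * vy    # gravity plus drag on vertical motion
        x += vx * dt
        y += vy * dt
        vx += ax * dt
        vy += ay * dt
    return x
```

With negligible drag the result approaches the textbook value `v0 * sqrt(2 * height / g)`, and increasing the drag coefficient shortens the distance. An emulator learns exactly this kind of mapping from (drag coefficient, velocity) to distance.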


In [None]:
from autoemulate.simulations.projectile import Projectile

projectile = Projectile(log_level="error")
n_samples = 500
x = projectile.sample_inputs(n_samples).float()
y = projectile.forward_batch(x).float()

x.shape, y.shape

### Data

As you can see, our simulator inputs (`x`) and outputs (`y`) are PyTorch tensors.
PyTorch tensors are a common data structure used in machine learning, and `AutoEmulate` is built to work with them.

## Build and compare Emulators

With our simulator inputs and outputs, we can run a full machine learning pipeline, including data processing, model fitting, model selection and hyperparameter optimisation in just a few lines of code.

First, let's import `AutoEmulate` and check the names of the available Emulator models.


In [None]:
from autoemulate.core.compare import AutoEmulate

AutoEmulate.list_emulators()

We're now ready to run `AutoEmulate` to build and compare emulators.

This will fit emulator models (including hyperparameter tuning) to the simulation inputs and outputs, and evaluate their performance on withheld test data.

In [None]:
# Run AutoEmulate with default settings
ae = AutoEmulate(x, y, log_level="error")
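The idea of evaluating on withheld test data can be sketched in a few lines of plain Python. This is only an illustration of a hold-out split, not what `AutoEmulate` does internally: the helper name `train_test_split`, the 20% test fraction, and the fixed seed are assumptions.

```python
import random


def train_test_split(inputs, outputs, test_fraction=0.2, seed=0):
    """Shuffle paired samples and hold out a fraction of them for testing.

    The 20% default and the seed are illustrative assumptions.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    indices = list(range(len(inputs)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = max(1, round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (
        pick(inputs, train_idx),
        pick(outputs, train_idx),
        pick(inputs, test_idx),
        pick(outputs, test_idx),
    )
```

Models are fitted only on the training portion. The test portion is never seen during fitting, so scores computed on it indicate how well an emulator generalises to new inputs.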

For more information about the configuration options available, see the [AutoEmulate API docs](https://alan-turing-institute.github.io/autoemulate/reference/index.html).
Here's a brief overview of some important options:

<details>

<summary>Model selection</summary>

By default, `AutoEmulate` will use all of the listed emulator models, but you can also specify a subset to use if you already know which kinds of models are suitable for your data.

Specify models used by AutoEmulate with the `models` argument, for example:
```python
from autoemulate.emulators import GaussianProcessExact, RadialBasisFunctions
models = ["GaussianProcessExact", "RadialBasisFunctions"]
ae = AutoEmulate(x, y, models=models)
```

</details>

<details>

<summary>Logging</summary>

When running `AutoEmulate`, you may also wish to enable logging to track the progress and performance of the emulator comparison. You can do this by setting the `log_level` argument when creating the `AutoEmulate` instance:
```python
ae = AutoEmulate(x, y, models=models, log_level="info")
```

Try setting various log levels to see the difference. The options are "progress_bar", "debug", "info", "warning", "error", or "critical".
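If the named levels behave like thresholds in Python's standard `logging` module (an assumption; `AutoEmulate`'s own logging setup is not shown here, and `"progress_bar"` is not a standard-library level), the effect of each setting can be previewed with this small sketch. The helper `messages_emitted` is hypothetical.

```python
import io
import logging


def messages_emitted(level_name):
    """Return the level names that pass a logger set to `level_name`.

    Uses only the standard library; this does not touch AutoEmulate.
    """
    stream = io.StringIO()
    logger = logging.getLogger(f"log_level_demo.{level_name}")
    logger.setLevel(level_name.upper())  # e.g. "error" -> "ERROR"
    logger.propagate = False             # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s"))
    logger.addHandler(handler)
    for level in (logging.DEBUG, logging.INFO, logging.WARNING,
                  logging.ERROR, logging.CRITICAL):
        logger.log(level, "message")
    logger.removeHandler(handler)
    return stream.getvalue().split()
```

Under that assumption, `messages_emitted("error")` returns `["ERROR", "CRITICAL"]`, while `messages_emitted("debug")` returns all five levels: a lower threshold means more verbose output.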

</details>

Now that we have run `AutoEmulate`, let's look at the summary for a comparison of emulator performance (r-squared and RMSE) on both the train and test data.

In [None]:
ae.summarise()
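For reference, the two metrics in the summary can be written out directly. The sketch below uses the standard textbook definitions for a single output on plain Python lists; how `AutoEmulate` computes or aggregates them internally (for example across multiple outputs) may differ.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the outputs."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

An r-squared of 1 means perfect predictions, and 0 means the model does no better than always predicting the mean of the true outputs. A lower RMSE is better. A large gap between train and test scores suggests overfitting.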

## Choosing an Emulator

From this list, we can choose an emulator by its index in the summary dataframe, or quickly get the best-performing one using the `best_result` method, which selects by the `r2_test` metric by default.

In [None]:
best = ae.best_result()
print("Model with id: ", best.id, " performed best: ", best.model_name)

Let's take a look at the configuration of the best model. These are the values of the model's hyperparameters.

In [None]:
print(best.params)

We can quickly visualise the performance of this Emulator with a plot of its predictions against the simulator outputs for the held-out test data.

In [None]:
ae.plot(best)

## Predictions

We can use the emulator to make predictions using the `predict` method.


In [None]:
best.model.predict(x[:10])

## Saving and loading emulators

Emulators and their metadata (hyperparameter config and performance metrics) can be saved to disk and loaded again later.

In [None]:
# Make a directory to save Emulator models
import os
path = "my_emulators"
os.makedirs(path, exist_ok=True)  # no error if the directory already exists

Let's save the best result (the best-performing emulator plus its metadata) to disk.

In [None]:
# The use_timestamp parameter ensures a new result is saved each time the save method is called
best_result_filepath = ae.save(best, path, use_timestamp=True)
print("Model and metadata saved to: ", best_result_filepath)

You should now have two files saved to disk: one with the emulator model, and one with the metadata, which has the same name and a `.csv` extension.

You can later pass this filepath to the `load` method to use the model (and inspect its metadata) again.

In [None]:
loaded_result = ae.load(best_result_filepath)

In [None]:
print(loaded_result.model_name)
print(loaded_result.params)

In [None]:
loaded_result.model.predict(x[:10])
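The save/load round trip above follows a common pattern: one file for the serialised model and a sibling `.csv` file for its metadata. The sketch below illustrates that pattern with the standard library only. It is not `AutoEmulate`'s implementation: `save_result`, `load_result`, the `.pkl` extension, the timestamp format, and the demo model and metadata are all hypothetical.

```python
import csv
import pickle
import tempfile
from datetime import datetime
from pathlib import Path


def save_result(model, metadata, directory, name="emulator", use_timestamp=True):
    """Pickle `model` and write `metadata` (a flat dict) to a sibling CSV file."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    stem = name
    if use_timestamp:
        # A timestamp in the file name keeps earlier saves from being overwritten.
        stem += datetime.now().strftime("_%Y%m%d_%H%M%S_%f")
    model_path = directory / f"{stem}.pkl"
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(model_path.with_suffix(".csv"), "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(metadata))
        writer.writeheader()
        writer.writerow(metadata)
    return model_path


def load_result(model_path):
    """Load the pickled model and its metadata row (CSV values come back as strings)."""
    model_path = Path(model_path)
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(model_path.with_suffix(".csv"), newline="") as f:
        metadata = next(csv.DictReader(f))
    return model, metadata


# Round trip with a stand-in "model" (a plain dict) in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_result({"weights": [0.1, 0.2]}, {"model_name": "demo"}, tmp)
    restored_model, restored_metadata = load_result(saved_path)
```

Because the two files share a name, returning a single path is enough to locate both. As with any pickle-based format, only load files from sources you trust.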