{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Minimising computation time" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`AutoEmulate` can be slow if the input data has many observations (rows) or many output variables. By default, `AutoEmulate` cross-validates each model, so each model is fitted 5 times. The computation time will be relatively short for datasets of up to a few thousand datapoints, but some models (e.g. Gaussian Processes) don't scale well, so computation time can quickly become an issue. \n", "\n", "In this tutorial we walk through four strategies to speed up `AutoEmulate`:\n", "\n", "1) parallelise model fits using `n_jobs` \n", "2) restrict the range of models using the `models` argument \n", "3) run fewer cross-validation folds using `cross_validator` \n", "4) for hyperparameter search:\n", " - all of the above\n", " - run fewer iterations using `param_search_iters`" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/mstoffel/turing/projects/autoemulate/autoemulate/compare.py:8: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)\n", " from tqdm.autonotebook import tqdm\n" ] } ], "source": [ "from sklearn.datasets import make_regression\n", "from autoemulate.compare import AutoEmulate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's make a dataset." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((500, 10), (500, 5))" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X, y = make_regression(n_samples=500, n_features=10, n_targets=5)\n", "X.shape, y.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And see how long `AutoEmulate` takes to run (without hyperparameter search)."
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
AutoEmulate is set up with the following settings:
" ], "text/plain": [ "\n", " | Values | \n", "
---|---|
Simulation input shape (X) | (500, 10) | \n", "
Simulation output shape (y) | (500, 5) | \n", "
Proportion of data for testing (test_set_size) | 0.2 | \n", "
Scale input data (scale) | True | \n", "
Scaler (scaler) | StandardScaler | \n", "
Do hyperparameter search (param_search) | False | \n", "
Reduce dimensionality (reduce_dim) | False | \n", "
Cross validator (cross_validator) | KFold | \n", "
Parallel jobs (n_jobs) | 1 | \n", "
\n", " | model | short | rmse | r2 | \n", "
---|---|---|---|---|
0 | RadialBasisFunctions | rbf | 0.000011 | 1.000000 | \n", "
1 | SecondOrderPolynomial | sop | 0.000011 | 1.000000 | \n", "
2 | GaussianProcess | gp | 5.686234 | 0.999072 | \n", "
3 | ConditionalNeuralProcess | cnp | 12.448831 | 0.995273 | \n", "
4 | SupportVectorMachines | svm | 66.633470 | 0.880780 | \n", "
5 | LightGBM | lgbm | 71.021354 | 0.869580 | \n", "
6 | GradientBoosting | gb | 78.972674 | 0.836586 | \n", "
7 | RandomForest | rf | 112.618563 | 0.666424 | \n", "