Sobol Indices
Description
Sobol indices are a variance-based global sensitivity analysis technique that quantifies how much each input feature contributes to the total variance of a model's predictions. The first-order index of a feature measures the share of output variance explained by that feature alone, while its total-order index adds every interaction effect involving that feature. By systematically sampling the input space and decomposing output variance, Sobol indices reveal which features drive variability in predictions. A large gap between a feature's total-order and first-order index signals that it acts mainly through interactions with other features, and higher-order indices can attribute variance to specific feature combinations.
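The sampling-and-decomposition idea can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation. It assumes independent inputs, plain Monte Carlo sampling rather than a low-discrepancy sequence, and a "pick-freeze" design with the Saltelli (2010) first-order estimator and the Jansen total-order estimator. The names `sobol_indices`, `uniform_sampler` and `ishigami` are chosen for this example, and the Ishigami function is used only as a common analytical test case.

```python
import numpy as np


def sobol_indices(model, sampler, n_features, n_samples=10_000, seed=0):
    """Estimate first-order and total-order Sobol indices by pick-freeze sampling.

    model:   callable mapping an (n, d) array to an (n,) array of outputs
    sampler: callable (rng, n, d) -> (n, d) array of independent input samples
    """
    rng = np.random.default_rng(seed)
    A = sampler(rng, n_samples, n_features)  # base sample matrix
    B = sampler(rng, n_samples, n_features)  # independent resample
    f_A = model(A)
    f_B = model(B)
    variance = np.var(np.concatenate([f_A, f_B]))  # total output variance

    first = np.empty(n_features)
    total = np.empty(n_features)
    for i in range(n_features):
        # A with column i taken from B: only feature i is changed relative to A
        AB_i = A.copy()
        AB_i[:, i] = B[:, i]
        f_AB = model(AB_i)
        # Saltelli (2010) estimator: variance explained by feature i alone
        first[i] = np.mean(f_B * (f_AB - f_A)) / variance
        # Jansen estimator: variance remaining when all other features are fixed
        total[i] = 0.5 * np.mean((f_A - f_AB) ** 2) / variance
    return first, total


def uniform_sampler(rng, n, d):
    """Independent uniform inputs on [-pi, pi] (an assumption for this test case)."""
    return rng.uniform(-np.pi, np.pi, size=(n, d))


def ishigami(X, a=7.0, b=0.1):
    """Ishigami test function; feature 2 matters only through interaction with feature 0."""
    return np.sin(X[:, 0]) + a * np.sin(X[:, 1]) ** 2 + b * X[:, 2] ** 4 * np.sin(X[:, 0])


if __name__ == "__main__":
    first, total = sobol_indices(ishigami, uniform_sampler, n_features=3, n_samples=50_000)
    for i, (s, st) in enumerate(zip(first, total)):
        print(f"feature {i}: first-order ~ {s:.2f}, total-order ~ {st:.2f}")
```

For this test function the analytical first-order indices are roughly 0.31, 0.44 and 0.00, and the total-order indices roughly 0.56, 0.44 and 0.24, so the estimates should land near those values up to sampling noise. The third feature illustrates the gap between the two indices: it explains almost no variance on its own but contributes through its interaction with the first feature.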
Example Use Cases
Explainability
Analysing a climate prediction model to determine which atmospheric parameters (temperature, humidity, pressure) contribute most to rainfall forecast uncertainty, helping meteorologists understand which measurements need the highest precision.
Evaluating a financial risk model to identify which economic indicators (interest rates, inflation, GDP growth) drive the most variability in portfolio value predictions, enabling better risk management strategies.
Fairness
Analysing a credit scoring model to quantify how much prediction variance stems from zip code (a potential proxy for race), helping identify features that may cause disparate impact across demographic groups.
Limitations
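- Computationally expensive: sampling-based estimators typically need thousands of model evaluations to reach stable variance estimates (on the order of N(d + 2) runs for N base samples and d features with the common pick-freeze design), making the technique impractical for very slow models.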
- Assumes input features are statistically independent; when features are correlated in real data, the variance decomposition is no longer unique and the resulting indices can be misleading.
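- Curse of dimensionality: the number of model evaluations grows with the number of input features and the number of possible interaction terms grows exponentially (2^d - 1 for d features), so the technique becomes increasingly difficult and expensive to apply beyond roughly 10-20 features.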
- Requires defining appropriate probability distributions for input features, which may not accurately reflect real-world feature distributions.
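To make the cost and dimensionality limitations concrete, the following rough budgeting sketch may help. It assumes a pick-freeze (Saltelli-style) sampling design, which uses two base sample matrices plus one extra matrix per feature, giving N(d + 2) model runs for first-order and total-order indices. The function names and the base sample size of 4096 are illustrative choices, not recommendations.

```python
def pick_freeze_evaluations(n_base_samples: int, n_features: int) -> int:
    """Model runs needed for first- and total-order indices: N * (d + 2)."""
    return n_base_samples * (n_features + 2)


def interaction_terms(n_features: int) -> int:
    """Number of non-empty feature subsets, each with its own Sobol index: 2**d - 1."""
    return 2 ** n_features - 1


if __name__ == "__main__":
    n_base = 4096  # assumed base sample size, for illustration only
    for d in (5, 10, 20, 50):
        runs = pick_freeze_evaluations(n_base, d)
        terms = interaction_terms(d)
        print(f"d={d:>2}: {runs:>7} model runs, {terms} possible index terms")
```

The run count grows only linearly in the number of features, but at several seconds per model call even a few tens of thousands of runs can take days. The count of possible interaction terms grows exponentially, which is why analyses usually stop at first-order and total-order indices rather than enumerating every interaction.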
Resources
Research Papers
Sobol Tensor Trains for Global Sensitivity Analysis
Sobol indices are a widespread quantitative measure for variance-based global sensitivity analysis, but computing and utilizing them remains challenging for high-dimensional systems. We propose the tensor train decomposition (TT) as a unified framework for surrogate modeling and global sensitivity analysis via Sobol indices. We first overview several strategies to build a TT surrogate of the unknown true model using either an adaptive sampling strategy or a predefined set of samples. We then introduce and derive the Sobol tensor train, which compactly represents the Sobol indices for all possible joint variable interactions which are infeasible to compute and store explicitly. Our formulation allows efficient aggregation and subselection operations: we are able to obtain related indices (closed, total, and superset indices) at negligible cost. Furthermore, we exploit an existing global optimization procedure within the TT framework for variable selection and model analysis tasks. We demonstrate our algorithms with two analytical engineering models and a parallel computing simulation data set.