Sobol Indices

Description

Sobol Indices quantify how much of the variance in a model's predictions is attributable to each input feature, using variance-based global sensitivity analysis. The technique calculates first-order indices (the share of output variance explained by a feature on its own) and total-order indices (the share explained by a feature together with all interactions involving it). By systematically sampling the input space and decomposing the output variance, Sobol Indices reveal which features drive variability in the model's output; a total-order index well above the corresponding first-order index indicates that the feature acts largely through interactions with other features.
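As a rough illustration of the sampling-and-decomposition idea, the sketch below estimates first-order and total-order indices with a pick-freeze Monte Carlo scheme (the Saltelli 2010 first-order estimator and the Jansen total-order estimator). The Ishigami test function, the uniform input ranges, the sample size, and all function names are assumptions chosen for illustration; in practice a library such as SALib or UQpy would normally be used instead.

```python
import math
import random


def ishigami(x, a=7.0, b=0.1):
    """Standard sensitivity-analysis benchmark with three inputs (assumed model)."""
    return math.sin(x[0]) + a * math.sin(x[1]) ** 2 + b * x[2] ** 4 * math.sin(x[0])


def sobol_indices(model, bounds, n=20000, seed=0):
    """Estimate first-order and total-order Sobol indices.

    Assumes independent, uniformly distributed inputs on the given bounds.
    Uses n * (len(bounds) + 2) model evaluations in total.
    """
    rng = random.Random(seed)
    d = len(bounds)

    def draw():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    # Two independent sample matrices, A and B, each with n rows.
    A = [draw() for _ in range(n)]
    B = [draw() for _ in range(n)]
    fA = [model(row) for row in A]
    fB = [model(row) for row in B]

    # Total output variance, estimated from all 2n baseline evaluations.
    both = fA + fB
    mean = sum(both) / len(both)
    var = sum((y - mean) ** 2 for y in both) / (len(both) - 1)

    first, total = [], []
    for i in range(d):
        # AB_i: rows of A with column i replaced by the same column of B.
        fAB = [
            model(row_a[:i] + [row_b[i]] + row_a[i + 1:])
            for row_a, row_b in zip(A, B)
        ]
        # First-order estimator (Saltelli et al., 2010).
        v_i = sum(yb * (yab - ya) for ya, yb, yab in zip(fA, fB, fAB)) / n
        # Total-order estimator (Jansen, 1999).
        vt_i = sum((ya - yab) ** 2 for ya, yab in zip(fA, fAB)) / (2 * n)
        first.append(v_i / var)
        total.append(vt_i / var)
    return first, total


first, total = sobol_indices(ishigami, [(-math.pi, math.pi)] * 3)
for i, (s1, st) in enumerate(zip(first, total), start=1):
    print(f"x{i}: first-order = {s1:.3f}, total-order = {st:.3f}")
```

For the Ishigami function with a = 7 and b = 0.1, the analytical first-order indices are approximately 0.314, 0.442 and 0, and the total-order indices approximately 0.558, 0.442 and 0.244, so the estimates should land near these values up to Monte Carlo error. The third input has no effect on its own but a non-zero total-order index, which is the signature of a pure interaction effect.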

Example Use Cases

Explainability

Analysing a climate prediction model to determine which atmospheric parameters (temperature, humidity, pressure) contribute most to rainfall forecast uncertainty, helping meteorologists understand which measurements need the highest precision.

Evaluating a financial risk model to identify which economic indicators (interest rates, inflation, GDP growth) drive the most variability in portfolio value predictions, enabling better risk management strategies.

Fairness

Analysing a credit scoring model to quantify how much prediction variance stems from zip code (a potential proxy for race), helping identify features that may cause disparate impact across demographic groups.

Limitations

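Computationally expensive: a standard sampling design needs on the order of N(d + 2) model evaluations for d features, with a base sample size N that is typically in the thousands to achieve stable variance estimates, making it impractical for very slow models.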
Assumes input features are independently distributed, which can lead to misleading results when features are correlated in real data.
The cost grows with the number of input features, and the number of possible interaction terms grows combinatorially, so the technique becomes increasingly difficult and expensive to apply beyond roughly 10 to 20 features.
Requires defining appropriate probability distributions for input features, which may not accurately reflect real-world feature distributions.
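To make the cost limitation concrete, the sketch below counts model evaluations under the commonly used Saltelli sampling design, which needs N(d + 2) evaluations for first-order and total-order indices and N(2d + 2) when second-order indices are also requested. The function names, the base sample size, and the per-call timing are hypothetical values chosen for illustration.

```python
def sobol_evaluations(n_base, n_features, second_order=False):
    """Number of model evaluations required by a Saltelli-style design."""
    factor = 2 * n_features + 2 if second_order else n_features + 2
    return n_base * factor


def estimated_hours(n_base, n_features, seconds_per_call, second_order=False):
    """Wall-clock estimate, assuming sequential calls of constant duration."""
    calls = sobol_evaluations(n_base, n_features, second_order)
    return calls * seconds_per_call / 3600


# Hypothetical scenario: 20 features, base sample of 1024, 2 seconds per prediction.
calls = sobol_evaluations(1024, 20)
hours = estimated_hours(1024, 20, seconds_per_call=2.0)
print(f"{calls} evaluations, about {hours:.1f} hours if run sequentially")
```

A budget check of this kind is a quick way to decide whether the analysis is feasible on the model directly or whether a cheaper surrogate model is needed first.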

Resources

Research Papers

Sobol Tensor Trains for Global Sensitivity Analysis

Rafael Ballester-Ripoll, Enrique G. Paredes, and Renato Pajarola • Dec 1, 2017

Sobol indices are a widespread quantitative measure for variance-based global sensitivity analysis, but computing and utilizing them remains challenging for high-dimensional systems. We propose the tensor train decomposition (TT) as a unified framework for surrogate modeling and global sensitivity analysis via Sobol indices. We first overview several strategies to build a TT surrogate of the unknown true model using either an adaptive sampling strategy or a predefined set of samples. We then introduce and derive the Sobol tensor train, which compactly represents the Sobol indices for all possible joint variable interactions which are infeasible to compute and store explicitly. Our formulation allows efficient aggregation and subselection operations: we are able to obtain related indices (closed, total, and superset indices) at negligible cost. Furthermore, we exploit an existing global optimization procedure within the TT framework for variable selection and model analysis tasks. We demonstrate our algorithms with two analytical engineering models and a parallel computing simulation data set.

Software Packages

UQpy

Dec 1, 2017

UQpy (Uncertainty Quantification with Python) is a general-purpose Python toolbox for modeling uncertainty in physical and mathematical systems.

SALib

May 30, 2013

Sensitivity Analysis Library in Python. Contains Sobol, Morris, FAST, and other methods.

Tutorials

Sobol Indices to Measure Feature Importance | Towards Data Science

Valentin Catherine • Jun 20, 2022

Documentation

Sobol indices — UQpy v4.2.0 documentation

Uqpyproject Developers • Jan 1, 2017

Basics — SALib's documentation

SALib Developers • Jan 1, 2025

Related Techniques

Name	Description	Assurance Goals
SHapley Additive exPlanations	SHAP explains model predictions by quantifying how much each input feature contributes to the outcome. It assigns an importance score to every feature, indicating whether it pushes the prediction towards or away from the average. The method systematically evaluates how predictions change as features are included or excluded, drawing on game theory concepts to ensure a fair distribution of contributions.	Explainability, Fairness, Reliability
Permutation Importance	Permutation Importance quantifies a feature's contribution to a model's performance by randomly shuffling its values and measuring the resulting drop in predictive accuracy. If shuffling a feature significantly degrades the model's performance, that feature is considered important. This model-agnostic technique helps identify which inputs are genuinely driving predictions, rather than just being correlated with the outcome.	Explainability, Reliability
Partial Dependence Plots	Partial Dependence Plots show how changing one or two features affects a model's predictions on average. The technique works by varying the selected feature(s) across their full range whilst keeping all other features fixed at their original values, then averaging the predictions. This creates a clear visualisation of whether increasing or decreasing a feature tends to increase or decrease predictions, and reveals patterns like linear trends, plateaus, or threshold effects that help explain model behaviour.	Explainability
Sensitivity Analysis for Fairness	Sensitivity Analysis for Fairness systematically evaluates how model predictions change when sensitive attributes or their proxies are perturbed whilst holding other factors constant. The technique involves creating counterfactual instances by modifying potentially discriminatory features (race, gender, age) or their correlates (zip code, names, education institutions) and measuring the resulting prediction differences. This controlled perturbation approach quantifies the degree to which protected characteristics influence model decisions, helping detect both direct discrimination and indirect bias through proxy variables even when sensitive attributes are not explicitly used as model inputs.	Fairness