Description

Sobol indices are a variance-based global sensitivity analysis method: they quantify how much of the variance in a model's predictions can be attributed to each input feature. The technique computes first-order indices (the share of variance a feature explains on its own) and total-order indices (that share plus all interaction effects involving the feature). By systematically sampling the input space and decomposing the output variance, Sobol indices reveal which features drive uncertainty in the model's output. A large gap between a feature's total-order and first-order index signals that the feature acts mainly through interactions, and second-order indices can pinpoint which pairs of features interact most strongly.
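For a model Y = f(X_1, ..., X_d) with independent inputs, the first-order index is S_i = Var(E[Y | X_i]) / Var(Y) and the total-order index is S_Ti = E[Var(Y | X_~i)] / Var(Y), where X_~i denotes every input except X_i. The sketch below estimates both in plain Python with a "pick-freeze" design, using the Saltelli (2010) estimator for S_i and the Jansen estimator for S_Ti. It is an illustrative implementation, not the code used by SALib or UQpy. The function names, the Ishigami benchmark standing in for the model under study, the uniform inputs on [-π, π] and the sample size are all assumptions made for the example.

```python
import math
import random


def ishigami(x, a=7.0, b=0.1):
    """Ishigami function: a standard sensitivity-analysis benchmark (assumed model)."""
    x1, x2, x3 = x
    return math.sin(x1) + a * math.sin(x2) ** 2 + b * x3 ** 4 * math.sin(x1)


def sobol_indices(f, d, n, sample_input, seed=0):
    """Estimate first-order (S1) and total-order (ST) Sobol indices.

    Draws two independent sample matrices A and B; for each feature i it also
    evaluates AB_i, which is A with column i taken from B ("pick-freeze").
    Total cost: n * (d + 2) model evaluations. Assumes independent inputs.
    """
    rng = random.Random(seed)
    A = [[sample_input(rng) for _ in range(d)] for _ in range(n)]
    B = [[sample_input(rng) for _ in range(d)] for _ in range(n)]
    fA = [f(row) for row in A]
    fB = [f(row) for row in B]

    # Total output variance, estimated from both base samples.
    all_y = fA + fB
    mean = sum(all_y) / len(all_y)
    var = sum((y - mean) ** 2 for y in all_y) / len(all_y)

    s1, st = [], []
    for i in range(d):
        # AB_i: each row of A with its i-th feature replaced by the value from B.
        fAB = [f(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(A, B)]
        # Saltelli (2010): V_i ~ mean of f(B) * (f(AB_i) - f(A)).
        s1.append(sum(yb * (yab - ya) for ya, yb, yab in zip(fA, fB, fAB)) / (n * var))
        # Jansen: V_Ti ~ half the mean squared difference between f(A) and f(AB_i).
        st.append(sum((ya - yab) ** 2 for ya, yab in zip(fA, fAB)) / (2 * n * var))
    return s1, st


if __name__ == "__main__":
    # Assumed input distribution: independent uniforms on [-pi, pi].
    uniform = lambda rng: rng.uniform(-math.pi, math.pi)
    s1, st = sobol_indices(ishigami, d=3, n=20000, sample_input=uniform)
    for i, (first, total) in enumerate(zip(s1, st), start=1):
        print(f"x{i}: S1={first:.3f}  ST={total:.3f}")
```

For the Ishigami function with a = 7 and b = 0.1, the analytic values are roughly S1 = (0.314, 0.442, 0.0) and ST = (0.558, 0.442, 0.244), so the Monte Carlo estimates should land close to these. Note that x3 has no first-order effect but a sizeable total-order index, because it only matters through its interaction with x1. This is exactly the kind of effect that the gap between the two indices exposes.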

Example Use Cases

Explainability

Analysing a climate prediction model to determine which atmospheric parameters (temperature, humidity, pressure) contribute most to rainfall forecast uncertainty, helping meteorologists understand which measurements need the highest precision.

Evaluating a financial risk model to identify which economic indicators (interest rates, inflation, GDP growth) drive the most variability in portfolio value predictions, enabling better risk management strategies.

Fairness

Analysing a credit scoring model to quantify how much prediction variance stems from zip code (a potential proxy for race), helping identify features that may cause disparate impact across demographic groups.

Limitations

  • Computationally expensive, requiring thousands of model evaluations to achieve stable variance estimates, making it impractical for very slow models.
  • Assumes input features are independently distributed, which can lead to misleading results when features are correlated in real data.
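  • Suffers from the curse of dimensionality: the full variance decomposition has 2^d - 1 terms for d features, so only first-, second- and total-order indices are practical to estimate, and even these become increasingly expensive to estimate reliably once the number of input features grows beyond roughly 10-20.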
  • Requires defining appropriate probability distributions for input features, which may not accurately reflect real-world feature distributions.
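The cost and dimensionality limitations can be put into numbers. The Saltelli sampling scheme used by SALib's Sobol analysis needs N(d + 2) model evaluations to obtain first- and total-order indices for d features with base sample size N, or N(2d + 2) if second-order indices are also wanted. The full variance decomposition, meanwhile, has 2^d - 1 terms (one per non-empty subset of features). The helper functions below are hypothetical, and N = 1024 is simply an assumed base sample size.

```python
def saltelli_cost(n_base, d, second_order=False):
    """Model evaluations needed by the Saltelli sampling scheme (hypothetical helper)."""
    return n_base * (2 * d + 2 if second_order else d + 2)


def n_sobol_terms(d):
    """Terms in the full Sobol decomposition: one per non-empty subset of d features."""
    return 2 ** d - 1


N = 1024  # assumed base sample size
for d in (5, 10, 20):
    # e.g. d=10 -> 12288 evaluations (first/total), 22528 (with second order), 1023 terms
    print(d, saltelli_cost(N, d), saltelli_cost(N, d, second_order=True), n_sobol_terms(d))
```

At one second per prediction, the d = 10 case without second-order indices already needs 12,288 s (about 3.4 hours). The full decomposition for 20 features has over a million terms, which is why in practice only low-order and total-order indices are estimated.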

Resources

Research Papers

Sobol Tensor Trains for Global Sensitivity Analysis
Rafael Ballester-Ripoll, Enrique G. Paredes, and Renato Pajarola, Dec 1, 2017

Sobol indices are a widespread quantitative measure for variance-based global sensitivity analysis, but computing and utilizing them remains challenging for high-dimensional systems. We propose the tensor train decomposition (TT) as a unified framework for surrogate modeling and global sensitivity analysis via Sobol indices. We first overview several strategies to build a TT surrogate of the unknown true model using either an adaptive sampling strategy or a predefined set of samples. We then introduce and derive the Sobol tensor train, which compactly represents the Sobol indices for all possible joint variable interactions which are infeasible to compute and store explicitly. Our formulation allows efficient aggregation and subselection operations: we are able to obtain related indices (closed, total, and superset indices) at negligible cost. Furthermore, we exploit an existing global optimization procedure within the TT framework for variable selection and model analysis tasks. We demonstrate our algorithms with two analytical engineering models and a parallel computing simulation data set.

Software Packages

UQpy
Dec 1, 2017

UQpy (Uncertainty Quantification with python) is a general purpose Python toolbox for modeling uncertainty in physical and mathematical systems.

SALib
May 30, 2013

Sensitivity Analysis Library in Python. Contains Sobol, Morris, FAST, and other methods.

Tutorials

Sobol Indices to Measure Feature Importance | Towards Data Science
Valentin Catherine, Jun 20, 2022

Documentation

Sobol indices — UQpy v4.2.0 documentation
UQpy Project Developers, Jan 1, 2017
Basics — SALib's documentation
SALib Developers, Jan 1, 2025
