Integrated Gradients

Description

Integrated Gradients is an attribution technique that explains a model's prediction by quantifying the contribution of each input feature. It works by accumulating gradients along a straight-line path from a user-defined baseline input (e.g. a black image or an all-zero vector) to the actual input. This path integral ensures that the attributions satisfy fundamental axioms such as completeness (attributions sum to the difference between the model's prediction at the input and at the baseline) and sensitivity (a feature that differs from the baseline and changes the prediction receives a non-zero attribution). The output is a set of per-feature importance scores, often visualised as heatmaps, indicating which parts of the input were most influential for the model's decision.
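In practice the path integral is approximated numerically, for example by a Riemann sum over a fixed number of interpolation steps between the baseline and the input. The sketch below is a minimal PyTorch illustration of that approximation; the helper name, the small stand-in classifier in the usage example, and all tensor shapes are assumptions for illustration rather than part of any particular library.

```python
import torch

def integrated_gradients(model, x, baseline, target, steps=50):
    """Approximate IG attributions for one (unbatched) input `x` by a
    Riemann sum along the straight path from `baseline` to `x`."""
    # Interpolation coefficients in [0, 1], shaped so they broadcast over x
    alphas = torch.linspace(0.0, 1.0, steps + 1, device=x.device)
    alphas = alphas.view(-1, *([1] * x.dim()))
    # Inputs along the path: baseline + alpha * (x - baseline)
    path = (baseline + alphas * (x - baseline)).detach().requires_grad_(True)

    # Gradient of the target class score with respect to every path point
    # (assumes `model` maps a batch of inputs to a matrix of class scores)
    scores = model(path)[:, target].sum()
    grads = torch.autograd.grad(scores, path)[0]

    # Trapezoidal average of the gradients, scaled by the input difference
    avg_grads = ((grads[:-1] + grads[1:]) / 2).mean(dim=0)
    return (x - baseline) * avg_grads

# Hypothetical usage on a small stand-in classifier over 8 features
model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, 3)).eval()
x, baseline = torch.randn(8), torch.zeros(8)
attributions = integrated_gradients(model, x, baseline, target=0)
```

Because the attributions should approximately sum to the difference between the model's outputs at the input and at the baseline, the number of steps can be increased until that equality holds to an acceptable tolerance.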

Example Use Cases

Explainability

Analysing a medical image classification model to understand which specific pixels or regions in an X-ray image contribute most to a diagnosis of pneumonia, ensuring the model focuses on relevant pathological features rather than artifacts.

Explaining the sentiment prediction of a natural language processing model by highlighting which words or phrases in a review most strongly influenced its classification as positive or negative, revealing the model's interpretative focus.
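For either use case, the attributions are usually computed with an off-the-shelf implementation rather than written by hand. The sketch below uses the pytorch/captum package listed under Resources; the tiny stand-in classifier, random input and chosen target are placeholder assumptions standing in for a real trained model (such as the X-ray or sentiment classifier above) and a preprocessed example from its input pipeline.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Stand-in classifier and input purely for illustration
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3)).eval()
x = torch.randn(1, 8)                           # one example, batch dimension first
baseline = torch.zeros_like(x)                  # all-zero baseline
predicted_class = int(model(x).argmax(dim=1))   # explain the predicted class

ig = IntegratedGradients(model)
attributions, delta = ig.attribute(
    x,
    baselines=baseline,
    target=predicted_class,
    n_steps=50,
    return_convergence_delta=True,  # delta near zero indicates completeness holds
)
```

The returned `delta` reports how far the attributions are from exactly satisfying completeness, which is a useful diagnostic when tuning `n_steps` or the choice of baseline.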

Limitations

  • Requires a carefully chosen and meaningful baseline input; an inappropriate baseline can lead to misleading or uninformative attributions (the completeness check sketched after this list offers one simple sanity check).
  • The model must be differentiable, which limits its direct application to models with non-differentiable components or discrete inputs without workarounds.
  • Computationally more expensive than simple gradient-based methods, as it requires multiple gradient calculations along the integration path.
  • While satisfying completeness, the attributions can sometimes be visually noisy or difficult for humans to interpret intuitively, especially for complex inputs.
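Several of these limitations can be monitored through the completeness axiom: if the attributions do not sum (approximately) to the difference between the model's prediction at the input and at the baseline, the baseline or the number of integration steps likely needs revisiting. The sketch below reuses the hypothetical `integrated_gradients` helper, `model`, `x` and `baseline` from the Description section.

```python
import torch

# Completeness sanity check: attributions should sum to F(x) - F(baseline)
attributions = integrated_gradients(model, x, baseline, target=0, steps=200)
with torch.no_grad():
    gap = model(x.unsqueeze(0))[0, 0] - model(baseline.unsqueeze(0))[0, 0]
print(f"attribution sum: {attributions.sum().item():.4f}  "
      f"prediction gap: {gap.item():.4f}")
# A large mismatch usually indicates too few integration steps or an
# ill-suited baseline, and the resulting attributions should be treated
# with caution.
```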

Resources

ankurtaly/Integrated-Gradients
Software Package
pytorch/captum
Software Package
Maximum Entropy Baseline for Integrated Gradients
Research Paper (Hanxiao Tan, Apr 12, 2022)
Integrated Gradients from Scratch | Towards Data Science
Tutorial
Understanding Integrated Gradients with SmoothTaylor for Deep Neural Network Attribution
Research Paper (Gary S. W. Goh et al., Apr 22, 2020)

Tags

Applicable Models:
Data Type:
Expertise Needed:
Explanatory Scope:
Lifecycle Stage:
Technique Type: