Taylor Decomposition

Description

Taylor Decomposition is a mathematical technique that explains a neural network's prediction by expanding the network's output function around a chosen root point, typically a nearby reference input at which the output is zero. The first-order terms of this expansion assign each input feature a relevance score indicating how much it contributes to the prediction; higher-order terms capture feature interactions but are usually truncated in practice. In its deep variant (Deep Taylor Decomposition), the expansion is applied layer by layer so that relevance is propagated backwards through the network, which also provides the theoretical foundation for several Layer-wise Relevance Propagation (LRP) rules.
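The first-order idea can be sketched for a single ReLU neuron, where the Taylor expansion at the root point x̃ = 0 is exact and relevance reduces to gradient × input. This is a minimal illustration, not a production implementation; the function and variable names are ours.

```python
import numpy as np

# Toy model: f(x) = max(0, w·x). Because f is piecewise linear and f(0) = 0,
# the first-order Taylor expansion at the root point x̃ = 0 is exact, and the
# relevance of feature i is R_i = x_i * ∂f/∂x_i (gradient × input).
def f(x, w):
    return max(0.0, float(np.dot(w, x)))

def taylor_relevance(x, w):
    grad = w if np.dot(w, x) > 0 else np.zeros_like(w)  # ∂f/∂x for ReLU
    return x * grad  # element-wise relevance scores R_i

w = np.array([1.0, -2.0, 0.5])
x = np.array([2.0, 0.5, 4.0])
R = taylor_relevance(x, w)  # → [2.0, -1.0, 2.0]
# Conservation: relevance scores sum exactly to the prediction.
assert np.isclose(R.sum(), f(x, w))
```

Note that individual relevances may be negative (here the second feature counts against the prediction), while the total is conserved.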

Example Use Cases

Explainability

Analyzing which pixels in an image contribute most to a convolutional neural network's classification decision, showing both positive and negative relevance scores for different regions of the input image.

Understanding how different word embeddings in a sentiment analysis model contribute to the final sentiment score, revealing which terms drive positive versus negative predictions.
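The layer-by-layer backward propagation described above can be sketched for one dense ReLU layer using the z+ rule from the Deep Taylor Decomposition literature, which redistributes a layer's relevance in proportion to each input's positive contribution. This is a hedged sketch with illustrative names, assuming positive input activations:

```python
import numpy as np

# One backward step of z+ relevance propagation (Deep Taylor Decomposition):
#   R_j = sum_k (a_j * w+_jk / sum_j' a_j' * w+_j'k) * R_k
# where a are the layer's input activations, w+ the positive part of the
# weights, and R_next the relevances already assigned to the layer's outputs.
def zplus_backward(a, W, R_next, eps=1e-9):
    Wp = np.maximum(W, 0.0)   # keep only positive weight contributions
    z = a @ Wp + eps          # positive pre-activations z_k (eps avoids /0)
    s = R_next / z            # relevance per unit of contribution
    return a * (Wp @ s)       # redistribute relevance back to the inputs

a = np.array([1.0, 2.0])               # input activations (assumed >= 0)
W = np.array([[0.5, -1.0],
              [1.0,  0.5]])            # layer weights
R_next = np.array([3.0, 1.0])          # relevance of the two output neurons
R = zplus_backward(a, W, R_next)       # → [0.6, 3.4]
assert np.isclose(R.sum(), R_next.sum())  # total relevance is conserved
```

Applying this step to each layer in turn, starting from the output, yields per-pixel or per-token relevance maps like those in the use cases above.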

Limitations

  • Mathematically complex, requiring a solid understanding of calculus and of the network architecture being explained.
  • Computationally intensive, as it requires computing gradients (and, for interaction terms, higher-order derivatives) through the entire network.
  • Approximations used in practice may introduce errors that affect attribution accuracy.
  • Limited tooling availability compared to other explainability methods, with most implementations being research-focused rather than production-ready.

Resources

Explaining Nonlinear Classification Decisions with Deep Taylor Decomposition
Research Paper · Grégoire Montavon et al. · Dec 8, 2015
A Rigorous Study of the Deep Taylor Decomposition
Research Paper · Leon Sixt and Tim Landgraf · Nov 14, 2022
sebastian-lapuschkin/lrp_toolbox
Software Package

Tags

Data Requirements:
Data Type:
Evidence Type:
Expertise Needed:
Explanatory Scope:
Lifecycle Stage:
Technique Type: