Description

DeepLIFT (Deep Learning Important FeaTures) explains neural network predictions by decomposing the difference between the actual output and a reference output into contributions from individual input features. It compares each neuron's activation to a reference activation (obtained by running a baseline input, typically all zeros or the dataset mean, through the network) and propagates these differences backwards using a modified chain rule over "multipliers," which are finite differences in activation rather than gradients. The resulting contributions satisfy summation-to-delta: they add up to the difference between the output on the input and the output on the reference. Unlike plain gradient-based methods, DeepLIFT can assign importance to features even where the local gradient is zero (for example, in saturated regions of an activation), and its use of discrete differences tends to yield more stable attributions.
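The backward pass can be illustrated on a tiny network. The sketch below (hypothetical weights, NumPy only) applies the paper's Rescale rule to a single ReLU layer, computes each input's contribution as multiplier times input-minus-reference, and checks summation-to-delta; it is a minimal illustration under these assumptions, not the full method, which also defines a RevealCancel rule and handles other layer types.

```python
# Minimal sketch of DeepLIFT's Rescale rule on a one-hidden-layer ReLU network.
# Weights, the input, and the all-zeros reference are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # input -> hidden layer
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)   # hidden -> output layer

def forward(x):
    z = W1 @ x + b1            # hidden pre-activations
    h = np.maximum(z, 0.0)     # ReLU activations
    return z, h, (W2 @ h + b2)[0]

x = np.array([1.0, -2.0, 0.5])   # input to explain
ref = np.zeros(3)                # all-zeros reference baseline

z_x, h_x, out_x = forward(x)
z_r, h_r, out_r = forward(ref)

# Rescale rule: the multiplier of each ReLU is delta-activation / delta-pre-activation,
# falling back to the ordinary gradient when the pre-activation difference is (near) zero.
dz = z_x - z_r
m_relu = np.divide(h_x - h_r, dz, out=(z_x > 0).astype(float), where=np.abs(dz) > 1e-8)

# Multipliers obey a chain rule; linear layers contribute their weights.
m_input = W1.T @ (m_relu * W2.flatten())   # multiplier of each input w.r.t. the output

contributions = m_input * (x - ref)        # per-feature DeepLIFT contributions
print("contributions:       ", contributions)
print("sum of contributions:", contributions.sum())
print("output difference:   ", out_x - out_r)   # summation-to-delta: equals the sum above
```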

Example Use Cases

Explainability

Identifying which genomic sequences contribute to a neural network's prediction of protein binding sites, helping biologists understand regulatory mechanisms by comparing to neutral DNA baselines.

Debugging a deep learning image classifier that misclassifies medical scans by attributing the decision to specific image regions, revealing whether the model focuses on irrelevant artifacts rather than pathological features.

Transparency

Providing transparent explanations for automated loan approval decisions by showing which financial features (relative to typical applicant profiles) most influenced the neural network's recommendation.

Limitations

  • Requires careful selection of the reference baseline, as different choices can lead to substantially different attribution scores (see the sketch after this list).
  • Implementation complexity varies significantly across different neural network architectures and layer types.
  • May produce unintuitive results when the chosen reference is not representative of the decision boundary.
  • Originally formulated for feedforward networks and specific layer types; it does not extend straightforwardly to all modern architectures, such as transformers.
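
As a hedged illustration of the baseline-sensitivity limitation, the sketch below runs Captum's DeepLift on a small untrained toy model with two different references; the model, input, and the 0.3 "dataset mean" value are arbitrary stand-ins, and the two calls will generally return different attribution scores for the same input.

```python
# Illustration of how the choice of reference baseline changes DeepLIFT attributions,
# using Captum's DeepLift on a toy (untrained) PyTorch model.
import torch
import torch.nn as nn
from captum.attr import DeepLift

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1)).eval()

x = torch.tensor([[1.0, -2.0, 0.5]])   # input to explain

explainer = DeepLift(model)
attr_zeros = explainer.attribute(x, baselines=torch.zeros_like(x))      # all-zeros reference
attr_mean = explainer.attribute(x, baselines=torch.full_like(x, 0.3))   # "dataset mean" reference

print(attr_zeros)   # attributions relative to the zero baseline
print(attr_mean)    # same input, different reference, generally different scores
```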

Resources

Research Papers

Learning Important Features Through Propagating Activation Differences
Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje
Apr 10, 2017

The purported "black box" nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. DeepLIFT compares the activation of each neuron to its 'reference activation' and assigns contribution scores according to the difference. By optionally giving separate consideration to positive and negative contributions, DeepLIFT can also reveal dependencies which are missed by other approaches. Scores can be computed efficiently in a single backward pass. We apply DeepLIFT to models trained on MNIST and simulated genomic data, and show significant advantages over gradient-based methods. Video tutorial: http://goo.gl/qKb7pL, ICML slides: bit.ly/deeplifticmlslides, ICML talk: https://vimeo.com/238275076, code: http://goo.gl/RM8jvH.

Software Packages

captum
Aug 27, 2019

Model interpretability and understanding for PyTorch

Documentation

Tutorial A3: DeepLIFT/SHAP — tangermeme v0.1.0 documentation
Tangermeme Developers
Jan 1, 2023
DeepLIFT Documentation - Captum
Jan 1, 2019

Tags

Explainability Dimensions

Attribution Methods:
Representation Analysis:
Explanation Target:
Explanatory Scope:

Other Categories

Data Type:
Evidence Type:
Expertise Needed:
Technique Type: