DeepLIFT
Description
DeepLIFT (Deep Learning Important FeaTures) explains neural network predictions by decomposing the difference between the actual output and a reference output and attributing it to individual input features. It compares each neuron's activation to a reference activation (typically computed from a baseline input such as all zeros or the dataset mean) and propagates these differences backwards through the network using modified chain rules. Unlike gradient-based methods, DeepLIFT works with finite differences rather than instantaneous gradients, so it can assign non-zero importance to features even where gradients saturate, and its attributions sum to the difference between the actual and reference outputs (the summation-to-delta property).
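A minimal sketch of how such attributions can be computed in practice, using Captum's DeepLift implementation on a hypothetical feedforward classifier; the model, input shape, baseline, and target class below are illustrative assumptions rather than part of this entry.

```python
# Minimal sketch: DeepLIFT attributions with Captum on a toy feedforward net.
# The model, input shape, baseline, and target class are illustrative assumptions.
import torch
import torch.nn as nn
from captum.attr import DeepLift

torch.manual_seed(0)

# Toy classifier: 10 input features -> 2 classes.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

inputs = torch.randn(1, 10)    # instance to explain
baseline = torch.zeros(1, 10)  # reference input (all zeros here)

dl = DeepLift(model)
# Attributions decompose f(inputs) - f(baseline) for the chosen class;
# the convergence delta reports how far the attribution sum is from that
# difference (the summation-to-delta check).
attributions, delta = dl.attribute(
    inputs, baselines=baseline, target=1, return_convergence_delta=True
)
print(attributions)  # per-feature contributions relative to the baseline
print(delta)         # close to zero when summation-to-delta holds
```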
Example Use Cases
Explainability
- Identifying which genomic sequences contribute to a neural network's prediction of protein binding sites, helping biologists understand regulatory mechanisms by comparing against neutral DNA baselines.
- Debugging a deep learning image classifier that misclassifies medical scans by attributing the decision to specific image regions, revealing whether the model focuses on irrelevant artifacts rather than pathological features.
Transparency
- Providing transparent explanations for automated loan approval decisions by showing which financial features (relative to typical applicant profiles) most influenced the neural network's recommendation.
Limitations
- Requires careful selection of the reference baseline, as different choices can lead to substantially different attribution scores (see the sketch after this list).
- Implementation complexity varies significantly across different neural network architectures and layer types.
- May produce unintuitive results when the chosen reference is not representative of the decision boundary.
- Limited to feedforward networks and specific layer types; it is not well suited to architectures that rely on multiplicative interactions, such as transformers.
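The baseline-sensitivity limitation can be made concrete with a small, hypothetical comparison: running Captum's DeepLift on the same input with an all-zeros reference and with a dataset-mean reference, then inspecting how much the attributions disagree. The model and data below are stand-ins, not drawn from this entry.

```python
# Sketch of baseline sensitivity: same model and input, two different references.
# Model and data are hypothetical stand-ins.
import torch
import torch.nn as nn
from captum.attr import DeepLift

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

data = torch.randn(100, 10) + 1.0               # stand-in for a training set
x = data[:1]                                    # instance to explain

zero_baseline = torch.zeros(1, 10)              # "absence" reference
mean_baseline = data.mean(dim=0, keepdim=True)  # "typical example" reference

dl = DeepLift(model)
attr_zero = dl.attribute(x, baselines=zero_baseline, target=1)
attr_mean = dl.attribute(x, baselines=mean_baseline, target=1)

# Large discrepancies indicate that conclusions drawn from the attributions
# depend heavily on which reference is treated as "neutral".
print((attr_zero - attr_mean).abs().max())
```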