Layer-wise Relevance Propagation

Explainability Transparency

Description

Layer-wise Relevance Propagation (LRP) explains neural network predictions by working backwards through the network to show how much each input feature contributed to the final decision. It follows a simple conservation rule: the total contribution scores always add up to the original prediction. Starting from the output, LRP distributes 'relevance' backwards through each layer using different rules depending on the layer type. This creates a detailed breakdown showing which input features helped or hindered the prediction, making it easier to understand why the network made a particular decision.

Example Use Cases

Explainability

Identifying which pixels in chest X-rays contribute to pneumonia detection, helping radiologists verify AI diagnoses by highlighting anatomical regions the model considers relevant.

Debugging a neural network's misclassification of handwritten digits by tracing relevance through layers to identify which input pixels caused the error and which network layers amplified it.

Transparency

Providing transparent explanations for automated credit scoring decisions by showing which financial features received positive or negative relevance scores, enabling clear regulatory reporting.

Limitations

Requires different propagation rules for each layer type, making implementation complex for new architectures.
Can produce negative relevance scores which may be difficult to interpret intuitively.
Rule selection (LRP-ε, LRP-γ, etc.) significantly affects results and requires domain expertise.
Limited to feedforward networks and may not work well with modern architectures like transformers without substantial modifications.

Resources

rachtibat/LRP-eXplains-Transformers

Software Package

sebastian-lapuschkin/lrp_toolbox

Software Package

Getting started — zennit documentation

Documentation

Layer-wise Relevance Propagation eXplains Transformers (LXT) documentation

Documentation

On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation

Research Paper•Sebastian Bach et al.•Jul 10, 2015

Related Techniques

Name	Description	Assurance Goals
Individual Conditional Expectation Plots	ICE plots display the predicted output for individual instances as a function of a feature, with all other features held fixed for each instance. Each line on an ICE plot represents one instance's prediction trajectory as the feature of interest changes, revealing whether different instances are affected differently by that feature.	Explainability
Model Cards	Model cards are standardised documentation frameworks that systematically document machine learning models through structured templates. The templates cover intended use cases, performance metrics across different demographic groups and operating conditions, training data characteristics, evaluation procedures, limitations, and ethical considerations. They serve as comprehensive technical specifications that enable informed model selection, prevent inappropriate deployment, support regulatory compliance, and facilitate fair assessment by providing transparent reporting of model capabilities and constraints across diverse populations and scenarios.	Transparency Fairness Safety
Occlusion Sensitivity	Occlusion sensitivity tests which parts of the input are important by occluding (masking or removing) them and seeing how the model's prediction changes. For example, portions of an image can be covered up in a sliding window fashion; if the model's confidence drops significantly when a certain region is occluded, that region was important for the prediction.	Explainability
Principal Component Analysis	Principal Component Analysis transforms high-dimensional data into a lower-dimensional representation by finding the directions (principal components) that capture the maximum variance in the data. Each component is a linear combination of original features, with the first component explaining the most variance, the second component the most remaining variance orthogonal to the first, and so on. This technique reveals underlying patterns in data structure, enables visualization of complex datasets, and helps identify which combinations of features drive the most variation in the data.	Explainability
Relabelling	A preprocessing fairness technique that modifies class labels in training data to achieve equal positive outcome rates across protected groups. Also known as 'data massaging', this method identifies instances that contribute to discriminatory patterns and flips their labels (from positive to negative or vice versa) to balance the proportion of positive outcomes between demographic groups. The technique aims to remove historical bias from training data whilst preserving the overall class distribution, enabling standard classifiers to learn from discrimination-free datasets.	Fairness Transparency Reliability
Sobol Indices	Sobol Indices quantify how much each input feature contributes to the total variance in a model's predictions through global sensitivity analysis. The technique calculates first-order indices (individual feature contributions) and total-order indices (including all interaction effects involving that feature). By systematically sampling the input space and decomposing output variance, Sobol Indices reveal which features drive model uncertainty and which interactions between features are most important for predictions.	Explainability Fairness

Tags

Applicable Models:

Data Requirements:

Access To Model Internals

Data Type:

Evidence Type:

Quantitative Metric

Expertise Needed:

Explanatory Scope:

Lifecycle Stage:

Model Development

System Deployment And Use

Technique Type: