Data Requirements

Access to Model Internals

Needs access to model gradients, weights, or activations.

13 techniques
Goals | Models | Data Types | Description
DeepLIFT
Algorithmic
Architecture/Neural Networks
Requirements/White Box
Any
DeepLIFT (Deep Learning Important FeaTures) explains neural network predictions by decomposing the difference between...
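For the DeepLIFT entry above, a minimal sketch of the Rescale rule for a single ReLU unit, assuming an all-zeros reference input (the reference choice is an assumption left to the user). `deeplift_rescale` is a hypothetical helper, not the API of any DeepLIFT library. Each input's contribution is its share of the change in pre-activation, scaled by the ratio of output change to pre-activation change, so the contributions sum to the change in output relative to the reference.

```python
def relu(z):
    return max(0.0, z)


def deeplift_rescale(w, b, x, x_ref):
    """DeepLIFT Rescale-rule contributions for a single unit y = relu(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    z_ref = sum(wi * ri for wi, ri in zip(w, x_ref)) + b
    dz = z - z_ref
    dy = relu(z) - relu(z_ref)
    # Rescale multiplier: change in output divided by change in pre-activation.
    # If dz is (near) zero, fall back to the ReLU gradient at z.
    m = dy / dz if abs(dz) > 1e-12 else (1.0 if z > 0 else 0.0)
    # Each input's contribution is its share of dz, scaled by the multiplier.
    return [wi * (xi - ri) * m for wi, xi, ri in zip(w, x, x_ref)]


contribs = deeplift_rescale([2.0, -1.0, 0.5], -1.0, [1.0, 0.5, 2.0], [0.0, 0.0, 0.0])
print(contribs, sum(contribs))  # contributions sum to relu(z) - relu(z_ref) = 1.5
```

Deeper networks chain such multipliers layer by layer; this sketch covers only one unit.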
Layer-wise Relevance Propagation
Algorithmic
Architecture/Neural Networks
Paradigm/Parametric
Any
Layer-wise Relevance Propagation (LRP) explains neural network predictions by working backwards through the network to...
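For the LRP entry above, a sketch of the LRP-epsilon rule applied to a toy two-layer ReLU network in plain Python. The network, its weights, and the helper names (`dense`, `lrp_epsilon`) are illustrative assumptions. Biases are omitted so that total relevance is conserved, apart from the small stabiliser `eps`.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def dense(a, W):
    """a: inputs (length n_in); W[j][k]: weight from input j to output k (no bias)."""
    return [sum(a[j] * W[j][k] for j in range(len(a))) for k in range(len(W[0]))]


def lrp_epsilon(a, W, R_out, eps=1e-6):
    """LRP-epsilon: redistribute the relevance R_out of a dense layer onto its inputs a."""
    z = dense(a, W)
    R_in = [0.0] * len(a)
    for k, r in enumerate(R_out):
        denom = z[k] + (eps if z[k] >= 0 else -eps)  # stabiliser avoids division by zero
        for j in range(len(a)):
            R_in[j] += a[j] * W[j][k] / denom * r
    return R_in


# Toy two-layer network: x -> ReLU(x W1) -> h W2 -> y
x = [1.0, 2.0]
W1 = [[1.0, -1.0], [0.5, 1.0]]
W2 = [[1.0], [2.0]]
h = relu(dense(x, W1))
y = dense(h, W2)

R_h = lrp_epsilon(h, W2, y)    # start from the output score itself
R_x = lrp_epsilon(x, W1, R_h)  # the ReLU passes relevance through unchanged
print(R_x, sum(R_x), y[0])     # relevance is (approximately) conserved
```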
Taylor Decomposition
Algorithmic
Architecture/Neural Networks
Requirements/Gradient Access
Any
Taylor Decomposition is a mathematical technique that explains neural network predictions by computing first-order and...
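For the Taylor Decomposition entry above, a minimal sketch of the first-order term only, assuming a smooth scalar output `f` and a user-chosen expansion point `x0`. The gradient comes from central differences rather than backpropagation, higher-order terms are omitted, and `f` is a hypothetical stand-in for a network. The relevance of input i is the partial derivative at `x0` times that input's displacement from `x0`.

```python
import math


def numerical_grad(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def first_order_taylor(f, x, x0):
    """Relevance R_i = df/dx_i(x0) * (x_i - x0_i); sum(R) approximates f(x) - f(x0)."""
    g = numerical_grad(f, x0)
    return [gi * (xi - x0i) for gi, xi, x0i in zip(g, x, x0)]


def f(v):
    # Hypothetical smooth "network output" used only for illustration.
    return math.tanh(0.5 * v[0] - 0.25 * v[1])


x, x0 = [0.2, 0.1], [0.0, 0.0]
R = first_order_taylor(f, x, x0)
print(R, sum(R), f(x) - f(x0))  # first-order relevances vs. the true change
```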
Saliency Maps
Algorithmic
Architecture/Neural Networks
Requirements/Differentiable
Image
Saliency maps are visual explanations for image classification models that highlight which pixels in an image most...
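For the Saliency Maps entry above, a sketch of a vanilla gradient saliency map, assuming a scalar class-score function over a small 2-D image. `class_score` is a hypothetical stand-in for a classifier's logit, and the gradient is approximated by central differences; with a real network one would backpropagate instead. Each pixel's saliency is the absolute value of the score's gradient at that pixel.

```python
def saliency_map(score_fn, image, h=1e-4):
    """Vanilla gradient saliency: |d score / d pixel| for each pixel of a 2-D image."""
    rows, cols = len(image), len(image[0])
    sal = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            plus = [row[:] for row in image]
            minus = [row[:] for row in image]
            plus[r][c] += h
            minus[r][c] -= h
            # Central-difference estimate of the partial derivative at pixel (r, c).
            sal[r][c] = abs(score_fn(plus) - score_fn(minus)) / (2 * h)
    return sal


def class_score(img):
    # Hypothetical class score standing in for a classifier's logit.
    return 2 * img[0][0] - 3 * img[1][1] + img[0][1] * img[1][0]


image = [[0.5, 1.0], [2.0, 0.0]]
print(saliency_map(class_score, image))  # approximately [[2, 2], [1, 3]]
```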
Gradient-weighted Class Activation Mapping
Algorithmic
Architecture/Neural Networks/Convolutional
Requirements/Architecture Specific
Image
Grad-CAM creates visual heatmaps showing which regions of an image a convolutional neural network focuses on when making...
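For the Grad-CAM entry above, a sketch of the combination step, assuming the last convolutional layer's feature maps and the class score's gradients with respect to them have already been extracted from a framework (not shown). Each channel is weighted by its spatially averaged gradient, the weighted maps are summed, and a ReLU keeps only regions that push the class score up. The toy arrays and the `grad_cam` helper are illustrative; in practice the heatmap is then upsampled to the input resolution.

```python
def grad_cam(feature_maps, grads):
    """Grad-CAM heatmap from K feature maps A^k (each H x W) and the gradients
    dy_c/dA^k of the class score with respect to those maps."""
    H, W = len(feature_maps[0]), len(feature_maps[0][0])
    # alpha_k: global-average-pooled gradient for channel k
    alphas = [sum(sum(row) for row in g) / (H * W) for g in grads]
    cam = [[0.0] * W for _ in range(H)]
    for a, A in zip(alphas, feature_maps):
        for i in range(H):
            for j in range(W):
                cam[i][j] += a * A[i][j]
    cam = [[max(0.0, v) for v in row] for row in cam]  # ReLU keeps positive evidence only
    peak = max(max(row) for row in cam)
    # Normalise to [0, 1] for display when any region is positive.
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


A = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
G = [[[0.5, 0.5], [0.5, 0.5]], [[-0.25, -0.25], [-0.25, -0.25]]]
print(grad_cam(A, G))  # [[1.0, 0.0], [0.0, 1.0]]
```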
Classical Attention Analysis in Neural Networks
Algorithmic
Architecture/Neural Networks/Recurrent
Requirements/Architecture Specific
Any
Classical attention mechanisms in RNNs and CNNs create alignment matrices and temporal attention patterns that show how...
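For the classical attention entry above, a sketch of how an alignment matrix arises from additive (Bahdanau-style) attention between decoder and encoder states. The parameters `W`, `U`, `v` and all state values are toy assumptions; in a trained RNN they come from the model. Row i of the result is a probability distribution over encoder positions for output step i, which is what alignment plots display.

```python
import math


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_alignment(S, H, W, U, v):
    """Alignment matrix A[i][j] = softmax over j of v . tanh(W s_i + U h_j),
    where S holds decoder states and H holds encoder states."""
    UH = [matvec(U, h) for h in H]  # encoder projections are reused for every i
    A = []
    for s in S:
        Ws = matvec(W, s)
        scores = [sum(vk * math.tanh(a + b) for vk, a, b in zip(v, Ws, uh)) for uh in UH]
        A.append(softmax(scores))
    return A


# Toy 2-dimensional states and parameters (illustrative values only).
S = [[1.0, 0.0], [0.0, 1.0]]
H = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
W = U = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, -1.0]
for row in additive_alignment(S, H, W, U, v):
    print([round(p, 3) for p in row])
```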
Temperature Scaling
Algorithmic
Architecture/Neural Networks
Paradigm/Discriminative
Any
Temperature scaling adjusts a model's confidence by applying a single parameter (temperature) to its predictions. When a...
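For the Temperature Scaling entry above, a minimal sketch assuming logits from an already trained classifier and a held-out validation set. The temperature is chosen by minimising negative log-likelihood over a coarse grid, a simple stand-in for the gradient-based optimisation usually used; `fit_temperature` and the toy data are hypothetical.

```python
import math


def softmax(logits, T=1.0):
    """Softmax of logits divided by temperature T (T > 1 softens, T < 1 sharpens)."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, T):
    """Mean negative log-likelihood of the true labels under temperature T."""
    return -sum(math.log(softmax(z, T)[y]) for z, y in zip(logit_rows, labels)) / len(labels)


def fit_temperature(logit_rows, labels, grid=None):
    """Pick T on held-out data by minimising NLL over a simple grid."""
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(91)]  # 0.5 to 5.0
    return min(grid, key=lambda T: nll(logit_rows, labels, T))


# Toy validation set: three confident correct predictions, one confident mistake.
val_logits = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 0.0]]
val_labels = [0, 1, 1, 0]
T = fit_temperature(val_logits, val_labels)
print(T, softmax(val_logits[0]), softmax(val_logits[0], T))
```

Dividing every logit by the same positive T never changes the arg-max, so accuracy is unaffected and only the confidence values move. On this toy data the fitted T is above 1, softening the overconfident predictions.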
Model Pruning
Algorithmic
Architecture/Neural Networks
Paradigm/Parametric
Any
Model pruning systematically removes less important weights, neurons, or entire layers from neural networks to create...
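For the Model Pruning entry above, a sketch of global unstructured magnitude pruning, which uses absolute weight size as one common criterion for "less important" weights (the criterion is an assumption; others exist). `magnitude_prune` is a hypothetical helper operating on a plain weight matrix.

```python
def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest absolute value
    across the whole matrix. Ties at the threshold are all pruned, so slightly
    more weights than requested may be removed."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


W = [[0.1, -0.8, 0.05], [0.4, -0.02, 0.9]]
print(magnitude_prune(W, 0.5))  # [[0.0, -0.8, 0.0], [0.4, 0.0, 0.9]]
```

Pruned models are typically fine-tuned afterwards to recover accuracy; structured variants remove whole neurons or channels instead of individual weights.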
Neuron Activation Analysis
Algorithmic
Architecture/Neural Networks
Requirements/Model Internals
Text
Neuron activation analysis examines the firing patterns of individual neurons in neural networks by probing them with...
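For the Neuron Activation Analysis entry above, a sketch of two basic probes, assuming neuron activations have already been recorded for each probe text: ranking the examples that most strongly activate one neuron, and a simple selectivity score comparing mean activation with and without a property. The probe sentences, activation values, and helper names are made up for illustration.

```python
def top_activating(examples, activations, neuron, k=3):
    """Rank probe examples by how strongly one neuron fires on them.

    activations[i][n] is the precomputed activation of neuron n on examples[i]."""
    order = sorted(range(len(examples)), key=lambda i: activations[i][neuron], reverse=True)
    return [(examples[i], activations[i][neuron]) for i in order[:k]]


def selectivity(activations, neuron, has_property):
    """Mean activation on examples with a property minus mean activation on the rest."""
    on = [a[neuron] for a, p in zip(activations, has_property) if p]
    off = [a[neuron] for a, p in zip(activations, has_property) if not p]
    return sum(on) / len(on) - sum(off) / len(off)


# Toy probe set; activation values are made up for illustration.
probes = ["The cat sat.", "Paris is in France.", "I visited Berlin.", "2 + 2 = 4"]
acts = [[0.1, 0.7], [0.9, 0.2], [0.8, 0.1], [0.0, 0.9]]
mentions_place = [False, True, True, False]
print(top_activating(probes, acts, neuron=0, k=2))
print(selectivity(acts, 0, mentions_place))
```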
Causal Mediation Analysis in Language Models
Mechanistic Interpretability
Architecture/Neural Networks/Transformer
Architecture/Neural Networks/Transformer/LLM
Text
Causal mediation analysis in language models is a mechanistic interpretability technique that systematically...
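For the causal mediation entry above, a toy sketch of the activation-patching style of analysis, under stated assumptions: a tiny ReLU model stands in for a language model, and each hidden unit stands in for a neuron or attention-head output at some position. The indirect effect of a unit is the change in output when the corrupted run has that single unit restored to its value from the clean run. All weights and inputs are hypothetical.

```python
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 hidden units x 2 inputs
W2 = [2.0, -1.0, 0.5]                        # linear read-out


def forward(x, patch=None):
    """Toy model; `patch` maps a hidden-unit index to a value that overwrites it."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    for j, value in (patch or {}).items():
        h[j] = value
    y = sum(w * hj for w, hj in zip(W2, h))
    return y, h


clean, corrupt = [1.0, 0.0], [0.0, 1.0]
y_clean, h_clean = forward(clean)
y_corrupt, _ = forward(corrupt)
total_effect = y_clean - y_corrupt

# Indirect effect of unit j: run the corrupted input but restore unit j to its clean value.
indirect = []
for j in range(len(W1)):
    y_patched, _ = forward(corrupt, patch={j: h_clean[j]})
    indirect.append(y_patched - y_corrupt)
print(total_effect, indirect)  # 3.0 [2.0, 1.0, 0.0]
```

Because the read-out here is linear, the indirect effects add up to the total effect; in real networks with nonlinear downstream layers they generally do not.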
Concept Activation Vectors
Algorithmic
Architecture/Neural Networks
Requirements/Gradient Access
Any
Concept Activation Vectors (CAVs), also known as Testing with Concept Activation Vectors (TCAV), identify mathematical...
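For the Concept Activation Vectors entry above, a sketch of a concept direction plus a TCAV-style score, assuming layer activations for concept and random examples, and the class score's gradients with respect to that layer, are already available. One simplification is named plainly: TCAV fits a linear classifier and uses its normal vector, whereas this sketch uses the normalised difference of class means. Helper names and values are hypothetical.

```python
import math


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def concept_vector(concept_acts, random_acts):
    """Unit-length concept direction in a layer's activation space
    (difference of means, standing in for a trained linear classifier)."""
    v = [c - r for c, r in zip(mean(concept_acts), mean(random_acts))]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def tcav_score(layer_grads, cav):
    """Fraction of inputs whose class-score gradient (w.r.t. the layer's
    activations) has a positive directional derivative along the CAV."""
    positive = sum(1 for g in layer_grads if sum(gi * ci for gi, ci in zip(g, cav)) > 0)
    return positive / len(layer_grads)


# Toy 2-D activations and gradients (illustrative values only).
cav = concept_vector([[2.0, 0.0], [3.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
grads = [[0.5, 1.0], [-0.2, 0.3], [1.0, -1.0], [0.1, 0.0]]
print(cav, tcav_score(grads, cav))  # [1.0, 0.0] 0.75
```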
Attention Visualisation in Transformers
Algorithmic
Architecture/Neural Networks/Transformer
Requirements/Architecture Specific
Image
Text
Attention Visualisation in Transformers analyses the multi-head self-attention mechanisms that enable transformers to...
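For the Attention Visualisation entry above, a sketch that computes per-head scaled dot-product attention weights and prints them as a rough text heatmap, assuming queries and keys have already been projected and that each head uses an equal contiguous slice of the model dimension. The tokens and vectors are toy values, and `attention_maps` is a hypothetical helper; real tooling would read these weights out of the model.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_maps(Q, K, n_heads):
    """Per-head self-attention weights softmax(q_h . k_h / sqrt(d_head)).

    Q and K are seq_len x d_model lists of already-projected queries and keys;
    head h uses the h-th contiguous slice of d_model // n_heads dimensions."""
    d_head = len(Q[0]) // n_heads
    maps = []
    for h in range(n_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        head = []
        for q in Q:
            scores = [sum(q[i] * k[i] for i in range(lo, hi)) / math.sqrt(d_head) for k in K]
            head.append(softmax(scores))
        maps.append(head)
    return maps


tokens = ["the", "cat", "sat"]
Q = K = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]  # toy values
for h, head in enumerate(attention_maps(Q, K, n_heads=2)):
    print(f"head {h}")
    for tok, row in zip(tokens, head):
        print(f"  {tok:>4}", " ".join(f"{p:.2f}" for p in row))
```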
Adaptive Sensitive Reweighting
Algorithmic
Architecture/Model Agnostic
Paradigm/Parametric
Any
Adaptive Sensitive Reweighting dynamically adjusts the importance of training examples during model training based on...