data requirements

access to model internals

Needs access to model gradients, weights, or activations

13 techniques
Technique | Goals | Models | Data Types | Description
DeepLIFT
Algorithmic
Neural Network
Any
DeepLIFT (Deep Learning Important FeaTures) explains neural network predictions by decomposing the difference between...
Layer-wise Relevance Propagation
Algorithmic
Neural Network
Any
Layer-wise Relevance Propagation (LRP) explains neural network predictions by working backwards through the network to...
Taylor Decomposition
Algorithmic
Neural Network
CNN
Any
Taylor Decomposition is a mathematical technique that explains neural network predictions by computing first-order and...
Saliency Maps
Algorithmic
Neural Network
Image
Saliency maps are visual explanations for image classification models that highlight which pixels in an image most...
Gradient-weighted Class Activation Mapping
Algorithmic
CNN
Image
Grad-CAM creates visual heatmaps showing which regions of an image a convolutional neural network focuses on when making...
Classical Attention Analysis in Neural Networks
Algorithmic
RNN
CNN
Any
Classical attention mechanisms in RNNs and CNNs create alignment matrices and temporal attention patterns that show how...
Temperature Scaling
Algorithmic
Neural Network
Any
Temperature scaling adjusts a model's confidence by dividing its logits by a single scalar parameter (the temperature) before the softmax. When a...
Model Pruning
Algorithmic
Neural Network
Any
Model pruning systematically removes less important weights, neurons, or entire layers from neural networks to create...
Neuron Activation Analysis
Algorithmic
Neural Network
LLM
+1
Text
Neuron activation analysis examines the firing patterns of individual neurons in neural networks by probing them with...
Causal Mediation Analysis in Language Models
Mechanistic Interpretability
LLM
Transformer
Text
Causal mediation analysis in language models is a mechanistic interpretability technique that systematically...
Concept Activation Vectors
Algorithmic
Neural Network
Transformer
+1
Any
Concept Activation Vectors (CAVs), the basis of Testing with Concept Activation Vectors (TCAV), identify mathematical...
Attention Visualisation in Transformers
Algorithmic
Transformer
Text
Image
Attention Visualisation in Transformers analyses the multi-head self-attention mechanisms that enable transformers to...
Adaptive Sensitive Reweighting
Algorithmic
Model Agnostic
Any
Adaptive Sensitive Reweighting dynamically adjusts the importance of training examples during model training based on...
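
As an illustration of one entry in the table, temperature scaling can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not a reference implementation: the function names (`softmax`, `nll`, `fit_temperature`), the grid of candidate temperatures, and the toy validation logits are all hypothetical, and the grid search stands in for the gradient-based optimisation a real implementation would more likely use.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of the logits after dividing them by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, candidates=None):
    """Choose the temperature with the lowest validation NLL.

    Assumption: a simple grid search over 0.25 .. 5.0 is used here
    purely for illustration.
    """
    if candidates is None:
        candidates = [0.25 + 0.05 * i for i in range(96)]
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))

# Toy held-out logits (hypothetical): the third example is confidently wrong,
# so the uncalibrated model is overconfident.
val_logits = [[4.0, 0.0, 0.0], [0.0, 5.0, 0.0], [6.0, 0.0, 0.0], [0.0, 0.0, 4.0]]
val_labels = [0, 1, 1, 2]

T = fit_temperature(val_logits, val_labels)
calibrated = [softmax(row, T) for row in val_logits]
```

Because every logit in a row is divided by the same positive number, the predicted class never changes. Only the confidence attached to it does, which is why the temperature can be fitted on held-out data after training without affecting accuracy.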