Data requirements: access to model internals
Needs access to model gradients, weights, or activations
13 techniques
Technique | Goals | Models | Data Types | Description
---|---|---|---|---
DeepLIFT | Algorithmic | Neural Network | Any | DeepLIFT (Deep Learning Important FeaTures) explains neural network predictions by decomposing the difference between...
Layer-wise Relevance Propagation | Algorithmic | Neural Network | Any | Layer-wise Relevance Propagation (LRP) explains neural network predictions by working backwards through the network to...
Taylor Decomposition | Algorithmic | Neural Network, CNN | Any | Taylor Decomposition is a mathematical technique that explains neural network predictions by computing first-order and...
Saliency Maps | Algorithmic | Neural Network | Image | Saliency maps are visual explanations for image classification models that highlight which pixels in an image most...
Gradient-weighted Class Activation Mapping | Algorithmic | CNN | Image | Grad-CAM creates visual heatmaps showing which regions of an image a convolutional neural network focuses on when making...
Classical Attention Analysis in Neural Networks | Algorithmic | RNN, CNN | Any | Classical attention mechanisms in RNNs and CNNs create alignment matrices and temporal attention patterns that show how...
Temperature Scaling | Algorithmic | Neural Network | Any | Temperature scaling adjusts a model's confidence by applying a single parameter (temperature) to its predictions. When a...
Model Pruning | Algorithmic | Neural Network | Any | Model pruning systematically removes less important weights, neurons, or entire layers from neural networks to create...
Neuron Activation Analysis | Algorithmic | Neural Network, LLM, +1 | Text | Neuron activation analysis examines the firing patterns of individual neurons in neural networks by probing them with...
Causal Mediation Analysis in Language Models | Mechanistic Interpretability | LLM, Transformer | Text | Causal mediation analysis in language models is a mechanistic interpretability technique that systematically...
Concept Activation Vectors | Algorithmic | Neural Network, Transformer, +1 | Any | Concept Activation Vectors (CAVs), also known as Testing with Concept Activation Vectors (TCAV), identify mathematical...
Attention Visualisation in Transformers | Algorithmic | Transformer | Text, Image | Attention Visualisation in Transformers analyses the multi-head self-attention mechanisms that enable transformers to...
Adaptive Sensitive Reweighting | Algorithmic | Model Agnostic | Any | Adaptive Sensitive Reweighting dynamically adjusts the importance of training examples during model training based on...
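
To make one of the listed techniques concrete, the following is a minimal sketch of temperature scaling as the table describes it: a single parameter applied to a model's predictions to adjust its confidence. It assumes the model exposes raw logits and that a small labelled validation set is available. The function names (`softmax`, `nll`, `fit_temperature`), the toy data, and the grid search are illustrative assumptions, not part of the catalogue; in practice the temperature is usually fitted with a gradient-based optimiser rather than a grid.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature T > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, candidates=None):
    """Pick the candidate temperature with the lowest validation NLL.

    Assumption: a plain grid search stands in for the optimiser
    that a real calibration pipeline would use.
    """
    if candidates is None:
        candidates = [0.05 * k for k in range(1, 201)]  # 0.05 .. 10.0
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))

# Hypothetical validation logits from an overconfident 3-class model.
val_logits = [[8.0, 0.0, 0.0], [0.0, 7.0, 0.0], [6.0, 0.0, 0.0], [0.0, 0.0, 9.0]]
val_labels = [0, 1, 1, 2]  # the third example is confidently misclassified

T = fit_temperature(val_logits, val_labels)
calibrated = softmax(val_logits[0], T)
```

Because every logit is divided by the same positive constant, the predicted class never changes; only the confidence attached to it does. On this toy data the fitted temperature comes out above 1, which softens the overconfident probabilities.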