data requirements

access to model internals

Techniques in this category need access to model gradients, weights, or activations.

13 techniques

Technique	Goals		Models	Data Types	Description
DeepLIFT		Algorithmic	Neural Network	Any	DeepLIFT (Deep Learning Important FeaTures) explains neural network predictions by decomposing the difference between...
Layer-wise Relevance Propagation		Algorithmic	Neural Network	Any	Layer-wise Relevance Propagation (LRP) explains neural network predictions by working backwards through the network to...
Taylor Decomposition		Algorithmic	Neural Network CNN	Any	Taylor Decomposition is a mathematical technique that explains neural network predictions by computing first-order and...
Saliency Maps		Algorithmic	Neural Network	Image	Saliency maps are visual explanations for image classification models that highlight which pixels in an image most...
Gradient-weighted Class Activation Mapping		Algorithmic	CNN	Image	Grad-CAM creates visual heatmaps showing which regions of an image a convolutional neural network focuses on when making...
Classical Attention Analysis in Neural Networks		Algorithmic	RNN CNN	Any	Classical attention mechanisms in RNNs and CNNs create alignment matrices and temporal attention patterns that show how...
Temperature Scaling		Algorithmic	Neural Network	Any	Temperature scaling adjusts a model's confidence by applying a single parameter (temperature) to its predictions. When a...
Model Pruning		Algorithmic	Neural Network	Any	Model pruning systematically removes less important weights, neurons, or entire layers from neural networks to create...
Neuron Activation Analysis		Algorithmic	Neural Network LLM +1	Text	Neuron activation analysis examines the firing patterns of individual neurons in neural networks by probing them with...
Causal Mediation Analysis in Language Models		Mechanistic Interpretability	LLM Transformer	Text	Causal mediation analysis in language models is a mechanistic interpretability technique that systematically...
Concept Activation Vectors		Algorithmic	Neural Network Transformer +1	Any	Concept Activation Vectors (CAVs), also known as Testing with Concept Activation Vectors (TCAV), identify mathematical...
Attention Visualisation in Transformers		Algorithmic	Transformer	Text Image	Attention Visualisation in Transformers analyses the multi-head self-attention mechanisms that enable transformers to...
Adaptive Sensitive Reweighting		Algorithmic	Model Agnostic	Any	Adaptive Sensitive Reweighting dynamically adjusts the importance of training examples during model training based on...
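
As an illustration of one entry above, the Temperature Scaling row describes applying a single temperature parameter to a model's predictions. The sketch below is a minimal, hedged example in plain Python: it divides the logits by a temperature before the softmax. The function name `softmax_with_temperature` and the example logits are assumptions made for illustration, not taken from the catalogue. In practice the temperature is usually fitted on held-out validation data, which this sketch does not do.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax probabilities of `logits` after dividing by `temperature`.

    Hypothetical helper for illustration: temperature > 1 flattens the
    distribution (lower confidence), temperature < 1 sharpens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative logits (assumed values, not from any real model).
logits = [2.0, 1.0, 0.1]
unscaled = softmax_with_temperature(logits, temperature=1.0)
softened = softmax_with_temperature(logits, temperature=2.0)
```

Dividing by a positive temperature never changes which class has the highest probability, so accuracy is unaffected; only the reported confidence changes.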
