applicable models

model internals

Requires access to weights, neurons, or internal representations

12 techniques

Technique	Goals		Models	Data Types	Description
Classical Attention Analysis in Neural Networks		Algorithmic	Architecture/neural Networks/recurrent Requirements/architecture Specific +1	Any	Classical attention mechanisms in RNNs and CNNs create alignment matrices and temporal attention patterns that show how...
Monte Carlo Dropout		Algorithmic	Architecture/neural Networks Paradigm/probabilistic +4	Any	Monte Carlo Dropout estimates prediction uncertainty by applying dropout (randomly setting neural network weights to...
Monotonicity Constraints		Algorithmic	Architecture/neural Networks Architecture/probabilistic/gaussian Processes +3	Tabular	Monotonicity constraints enforce consistent directional relationships between input features and model predictions,...
Model Pruning		Algorithmic	Architecture/neural Networks Paradigm/parametric +4	Any	Model pruning systematically removes less important weights, neurons, or entire layers from neural networks to create...
Neuron Activation Analysis		Algorithmic	Architecture/neural Networks Requirements/model Internals +1	Text	Neuron activation analysis examines the firing patterns of individual neurons in neural networks by probing them with...
Causal Mediation Analysis in Language Models		Mechanistic Interpretability	Architecture/neural Networks/transformer Architecture/neural Networks/transformer/llm +3	Text	Causal mediation analysis in language models is a mechanistic interpretability technique that systematically...
Concept Activation Vectors		Algorithmic	Architecture/neural Networks Requirements/gradient Access +2	Any	Concept Activation Vectors (CAVs), the basis of Testing with Concept Activation Vectors (TCAV), identify mathematical...
Attention Visualisation in Transformers		Algorithmic	Architecture/neural Networks/transformer Requirements/architecture Specific +1	Image Text	Attention Visualisation in Transformers analyses the multi-head self-attention mechanisms that enable transformers to...
Fair Adversarial Networks		Algorithmic	Architecture/neural Networks Paradigm/discriminative +6	Any	An in-processing fairness technique that employs adversarial training with dual neural networks to learn fair...
Fair Transfer Learning		Algorithmic	Architecture/neural Networks Paradigm/parametric +4	Any	An in-processing fairness technique that adapts pre-trained models from one domain to another whilst explicitly...
Embedding Bias Analysis		Algorithmic	Architecture/neural Networks Architecture/neural Networks/transformer +3	Text Image	Embedding bias analysis examines learned representations to identify biases, spurious correlations, and problematic...
Machine Unlearning		Algorithmic	Architecture/model Agnostic Architecture/neural Networks +2	Any	Machine unlearning enables removal of specific training data's influence from trained models without complete...
