applicable models

model internals

Requires access to weights, neurons, or internal representations

12 techniques
| Technique | Goals | Models | Data Types | Description |
| --- | --- | --- | --- | --- |
| Classical Attention Analysis in Neural Networks | Algorithmic | Architecture/neural Networks/recurrent, Requirements/architecture Specific, +1 | Any | Classical attention mechanisms in RNNs and CNNs create alignment matrices and temporal attention patterns that show how... |
| Monte Carlo Dropout | Algorithmic | Architecture/neural Networks, Paradigm/probabilistic, +4 | Any | Monte Carlo Dropout estimates prediction uncertainty by applying dropout (randomly setting neural network weights to... |
| Monotonicity Constraints | Algorithmic | Architecture/probabilistic/gaussian Processes, Architecture/tree Based, +2 | Tabular | Monotonicity constraints enforce consistent directional relationships between input features and model predictions,... |
| Model Pruning | Algorithmic | Architecture/neural Networks, Paradigm/parametric, +4 | Any | Model pruning systematically removes less important weights, neurons, or entire layers from neural networks to create... |
| Neuron Activation Analysis | Algorithmic | Architecture/neural Networks, Requirements/model Internals, +1 | Text | Neuron activation analysis examines the firing patterns of individual neurons in neural networks by probing them with... |
| Causal Mediation Analysis in Language Models | Mechanistic Interpretability | Architecture/neural Networks/transformer, Architecture/neural Networks/transformer/llm, +3 | Text | Causal mediation analysis in language models is a mechanistic interpretability technique that systematically... |
| Concept Activation Vectors | Algorithmic | Architecture/neural Networks, Requirements/gradient Access, +2 | Any | Concept Activation Vectors (CAVs), also known as Testing with Concept Activation Vectors (TCAV), identify mathematical... |
| Attention Visualisation in Transformers | Algorithmic | Architecture/neural Networks/transformer, Requirements/architecture Specific, +1 | Image, Text | Attention Visualisation in Transformers analyses the multi-head self-attention mechanisms that enable transformers to... |
| Fair Adversarial Networks | Algorithmic | Architecture/neural Networks, Paradigm/discriminative, +6 | Any | An in-processing fairness technique that employs adversarial training with dual neural networks to learn fair... |
| Fair Transfer Learning | Algorithmic | Architecture/neural Networks, Paradigm/parametric, +4 | Any | An in-processing fairness technique that adapts pre-trained models from one domain to another whilst explicitly... |
| Embedding Bias Analysis | Algorithmic | Architecture/neural Networks, Architecture/neural Networks/transformer, +3 | Text, Image | Embedding bias analysis examines learned representations to identify biases, spurious correlations, and problematic... |
| Machine Unlearning | Algorithmic | Architecture/model Agnostic, Architecture/neural Networks, +2 | Any | Machine unlearning enables removal of specific training data's influence from trained models without complete... |
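
As an illustration of why these techniques need access to model internals, the Monte Carlo Dropout entry above can be sketched in a few lines. This is a minimal sketch and not a reference implementation: the two-layer network, its random weights, the function name `mc_dropout_predict`, and the parameters `drop_rate` and `n_samples` are all assumptions made for the example. It also assumes the common formulation in which dropout masks hidden activations and stays switched on at prediction time, so that the spread across repeated forward passes serves as an uncertainty estimate.

```python
import numpy as np


def mc_dropout_predict(x, w1, b1, w2, b2, drop_rate=0.5, n_samples=100, seed=0):
    """Run several stochastic forward passes with dropout left on.

    Returns the per-input mean prediction and the standard deviation
    across passes (used here as a rough uncertainty estimate).
    """
    rng = np.random.default_rng(seed)
    keep = 1.0 - drop_rate
    preds = []
    for _ in range(n_samples):
        h = np.maximum(0.0, x @ w1 + b1)      # hidden layer with ReLU
        mask = rng.random(h.shape) < keep     # fresh dropout mask per pass
        h = h * mask / keep                   # inverted-dropout scaling
        preds.append(h @ w2 + b2)             # linear output layer
    preds = np.stack(preds)                   # (n_samples, n_inputs, n_outputs)
    return preds.mean(axis=0), preds.std(axis=0)


# Hypothetical toy network: 3 inputs, 16 hidden units, 1 output.
init = np.random.default_rng(1)
w1, b1 = init.normal(size=(3, 16)), np.zeros(16)
w2, b2 = init.normal(size=(16, 1)), np.zeros(1)
x = init.normal(size=(4, 3))                  # four example inputs

mean, std = mc_dropout_predict(x, w1, b1, w2, b2)
print("mean prediction:", mean.ravel())
print("uncertainty (std):", std.ravel())
```

Setting `drop_rate=0.0` makes every pass identical, so the reported standard deviation collapses to zero; this is a quick sanity check that the spread comes from the dropout masks alone.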