applicable models
model internals
Requires access to weights, neurons, or internal representations
12 techniques
| Technique | Goals | Models | Data Types | Description | | |
|---|---|---|---|---|---|---|
| Classical Attention Analysis in Neural Networks | Algorithmic | Architecture/neural Networks/recurrent Requirements/architecture Specific +1 | Any | Classical attention mechanisms in RNNs and CNNs create alignment matrices and temporal attention patterns that show how... | ||
| Monte Carlo Dropout | Algorithmic | Architecture/neural Networks Paradigm/probabilistic +4 | Any | Monte Carlo Dropout estimates prediction uncertainty by applying dropout (randomly setting neural network weights to... | ||
| Monotonicity Constraints | Algorithmic | Architecture/probabilistic/gaussian Processes Architecture/tree Based +2 | Tabular | Monotonicity constraints enforce consistent directional relationships between input features and model predictions,... | ||
| Model Pruning | Algorithmic | Architecture/neural Networks Paradigm/parametric +4 | Any | Model pruning systematically removes less important weights, neurons, or entire layers from neural networks to create... | ||
| Neuron Activation Analysis | Algorithmic | Architecture/neural Networks Requirements/model Internals +1 | Text | Neuron activation analysis examines the firing patterns of individual neurons in neural networks by probing them with... | ||
| Causal Mediation Analysis in Language Models | Mechanistic Interpretability | Architecture/neural Networks/transformer Architecture/neural Networks/transformer/llm +3 | Text | Causal mediation analysis in language models is a mechanistic interpretability technique that systematically... | ||
| Concept Activation Vectors | Algorithmic | Architecture/neural Networks Requirements/gradient Access +2 | Any | Concept Activation Vectors (CAVs), also known as Testing with Concept Activation Vectors (TCAV), identify mathematical... | ||
| Attention Visualisation in Transformers | Algorithmic | Architecture/neural Networks/transformer Requirements/architecture Specific +1 | Image Text | Attention Visualisation in Transformers analyses the multi-head self-attention mechanisms that enable transformers to... | ||
| Fair Adversarial Networks | Algorithmic | Architecture/neural Networks Paradigm/discriminative +6 | Any | An in-processing fairness technique that employs adversarial training with dual neural networks to learn fair... | ||
| Fair Transfer Learning | Algorithmic | Architecture/neural Networks Paradigm/parametric +4 | Any | An in-processing fairness technique that adapts pre-trained models from one domain to another whilst explicitly... | ||
| Embedding Bias Analysis | Algorithmic | Architecture/neural Networks Architecture/neural Networks/transformer +3 | Text Image | Embedding bias analysis examines learned representations to identify biases, spurious correlations, and problematic... | ||
| Machine Unlearning | Algorithmic | Architecture/model Agnostic Architecture/neural Networks +2 | Any | Machine unlearning enables removal of specific training data's influence from trained models without complete... |
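
As an illustration of one entry in the table, the Monte Carlo Dropout row can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not a reference implementation: the tiny one-hidden-layer ReLU network, the names `forward` and `mc_dropout_predict`, the random toy weights, and the choice to mask hidden units with inverted-dropout rescaling are all hypothetical and chosen only for illustration. The idea it shows is that dropout is left switched on at prediction time, the same input is passed through the network many times, and the spread of the outputs is read as an uncertainty estimate.

```python
import random
import statistics

def forward(x, w1, w2, drop_rate, rng):
    """One stochastic forward pass through a toy 1-hidden-layer ReLU network."""
    keep = 1.0 - drop_rate  # assumes 0 <= drop_rate < 1
    hidden = []
    for weights in w1:  # one row of weights per hidden unit
        act = max(0.0, sum(w * xi for w, xi in zip(weights, x)))
        # Dropout stays ON at prediction time: each unit is dropped with
        # probability drop_rate, and survivors are rescaled by 1/keep so the
        # expected activation is unchanged (inverted dropout, an assumption).
        hidden.append(act / keep if rng.random() < keep else 0.0)
    return sum(w * h for w, h in zip(w2, hidden))

def mc_dropout_predict(x, w1, w2, drop_rate=0.5, n_samples=200, seed=0):
    """Return (mean, standard deviation) over n_samples stochastic passes."""
    rng = random.Random(seed)
    samples = [forward(x, w1, w2, drop_rate, rng) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pstdev(samples)

# Hypothetical toy weights and input, fixed by a seed for repeatability.
init = random.Random(42)
w1 = [[init.gauss(0, 1) for _ in range(4)] for _ in range(16)]
w2 = [init.gauss(0, 1) for _ in range(16)]
x = [0.5, -1.2, 0.3, 2.0]

mean, spread = mc_dropout_predict(x, w1, w2)
print(f"prediction {mean:.3f} +/- {spread:.3f}")
```

With `drop_rate=0.0` every pass is identical and the reported spread collapses to zero, which is a quick sanity check that the variation comes from the dropout masks alone.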