Explainability

Internal Mechanisms

Reveals how the model processes information internally

18 techniques in this subcategory

Goals | Models | Data Types | Description
Layer-wise Relevance Propagation
Algorithmic
Architecture/neural Networks
Paradigm/parametric
+2
Any
Layer-wise Relevance Propagation (LRP) explains neural network predictions by working backwards through the network to...
Contextual Decomposition
Algorithmic
Architecture/neural Networks/recurrent
Requirements/white Box
+1
Text
Contextual Decomposition explains LSTM and RNN predictions by decomposing the final hidden state into contributions from...
Taylor Decomposition
Algorithmic
Architecture/neural Networks
Requirements/gradient Access
+2
Any
Taylor Decomposition is a mathematical technique that explains neural network predictions by computing first-order and...
Gradient-weighted Class Activation Mapping
Algorithmic
Architecture/neural Networks/convolutional
Requirements/architecture Specific
+2
Image
Grad-CAM creates visual heatmaps showing which regions of an image a convolutional neural network focuses on when making...
Classical Attention Analysis in Neural Networks
Algorithmic
Architecture/neural Networks/recurrent
Requirements/architecture Specific
+1
Any
Classical attention mechanisms in RNNs and CNNs create alignment matrices and temporal attention patterns that show how...
Factor Analysis
Algorithmic
Architecture/model Agnostic
Paradigm/unsupervised
+1
Tabular
Factor analysis is a statistical technique that identifies latent variables (hidden factors) underlying observed...
t-SNE
Visualization
Architecture/model Agnostic
Requirements/black Box
Any
t-SNE (t-Distributed Stochastic Neighbour Embedding) is a non-linear dimensionality reduction technique that creates 2D...
Model Distillation
Algorithmic
Architecture/neural Networks
Paradigm/parametric
+3
Any
Model distillation transfers knowledge from a large, complex model (teacher) to a smaller, more efficient model...
Model Pruning
Algorithmic
Architecture/neural Networks
Paradigm/parametric
+4
Any
Model pruning systematically removes less important weights, neurons, or entire layers from neural networks to create...
Neuron Activation Analysis
Algorithmic
Architecture/neural Networks
Requirements/model Internals
+1
Text
Neuron activation analysis examines the firing patterns of individual neurons in neural networks by probing them with...
Prompt Sensitivity Analysis
Experimental
Architecture/neural Networks/transformer/llm
Paradigm/generative
+1
Text
Prompt Sensitivity Analysis systematically evaluates how variations in input prompts affect large language model...
Causal Mediation Analysis in Language Models
Mechanistic Interpretability
Architecture/neural Networks/transformer
Architecture/neural Networks/transformer/llm
+3
Text
Causal mediation analysis in language models is a mechanistic interpretability technique that systematically...
Concept Activation Vectors
Algorithmic
Architecture/neural Networks
Requirements/gradient Access
+2
Any
Concept Activation Vectors (CAVs), the basis of Testing with Concept Activation Vectors (TCAV), identify mathematical...
Attention Visualisation in Transformers
Algorithmic
Architecture/neural Networks/transformer
Requirements/architecture Specific
+1
Image
Text
Attention Visualisation in Transformers analyses the multi-head self-attention mechanisms that enable transformers to...
Chain-of-Thought Faithfulness Evaluation
Testing
Architecture/neural Networks/transformer/llm
Paradigm/generative
+1
Text
Chain-of-thought faithfulness evaluation assesses the quality and faithfulness of step-by-step reasoning produced by...
Embedding Bias Analysis
Algorithmic
Architecture/neural Networks
Architecture/neural Networks/transformer
+3
Text
Image
Embedding bias analysis examines learned representations to identify biases, spurious correlations, and problematic...
Multimodal Alignment Evaluation
Testing
Architecture/neural Networks
Architecture/neural Networks/transformer
+1
Image
Text
Multimodal alignment evaluation assesses whether different modalities (vision, language, audio) are synchronised and...
Retrieval-Augmented Generation Evaluation
Testing
Architecture/neural Networks/transformer/llm
Paradigm/generative
+1
Text
RAG evaluation assesses systems combining retrieval and generation by measuring retrieval quality, generation...