Explainability
Internal Mechanisms
Reveals how the model processes information internally
18 techniques in this subcategory
| Technique | Goals | Models | Data Types | Description | | |
|---|---|---|---|---|---|---|
| Layer-wise Relevance Propagation | Algorithmic | Architecture/neural Networks Paradigm/parametric +2 | Any | Layer-wise Relevance Propagation (LRP) explains neural network predictions by working backwards through the network to... | ||
| Contextual Decomposition | Algorithmic | Architecture/neural Networks/recurrent Requirements/white Box +1 | Text | Contextual Decomposition explains LSTM and RNN predictions by decomposing the final hidden state into contributions from... | ||
| Taylor Decomposition | Algorithmic | Architecture/neural Networks Requirements/gradient Access +2 | Any | Taylor Decomposition is a mathematical technique that explains neural network predictions by computing first-order and... | ||
| Gradient-weighted Class Activation Mapping | Algorithmic | Architecture/neural Networks/convolutional Requirements/architecture Specific +2 | Image | Grad-CAM creates visual heatmaps showing which regions of an image a convolutional neural network focuses on when making... | ||
| Classical Attention Analysis in Neural Networks | Algorithmic | Architecture/neural Networks/recurrent Requirements/architecture Specific +1 | Any | Classical attention mechanisms in RNNs and CNNs create alignment matrices and temporal attention patterns that show how... | ||
| Factor Analysis | Algorithmic | Architecture/model Agnostic Paradigm/unsupervised +1 | Tabular | Factor analysis is a statistical technique that identifies latent variables (hidden factors) underlying observed... | ||
| t-SNE | Visualization | Architecture/model Agnostic Requirements/black Box | Any | t-SNE (t-Distributed Stochastic Neighbour Embedding) is a non-linear dimensionality reduction technique that creates 2D... | ||
| Model Distillation | Algorithmic | Architecture/neural Networks Paradigm/parametric +3 | Any | Model distillation transfers knowledge from a large, complex model (teacher) to a smaller, more efficient model... | ||
| Model Pruning | Algorithmic | Architecture/neural Networks Paradigm/parametric +4 | Any | Model pruning systematically removes less important weights, neurons, or entire layers from neural networks to create... | ||
| Neuron Activation Analysis | Algorithmic | Architecture/neural Networks Requirements/model Internals +1 | Text | Neuron activation analysis examines the firing patterns of individual neurons in neural networks by probing them with... | ||
| Prompt Sensitivity Analysis | Experimental | Architecture/neural Networks/transformer/llm Paradigm/generative +1 | Text | Prompt Sensitivity Analysis systematically evaluates how variations in input prompts affect large language model... | ||
| Causal Mediation Analysis in Language Models | Mechanistic Interpretability | Architecture/neural Networks/transformer Architecture/neural Networks/transformer/llm +3 | Text | Causal mediation analysis in language models is a mechanistic interpretability technique that systematically... | ||
| Concept Activation Vectors | Algorithmic | Architecture/neural Networks Requirements/gradient Access +2 | Any | Concept Activation Vectors (CAVs), also known as Testing with Concept Activation Vectors (TCAV), identify mathematical... | ||
| Attention Visualisation in Transformers | Algorithmic | Architecture/neural Networks/transformer Requirements/architecture Specific +1 | Image Text | Attention Visualisation in Transformers analyses the multi-head self-attention mechanisms that enable transformers to... | ||
| Chain-of-Thought Faithfulness Evaluation | Testing | Architecture/neural Networks/transformer/llm Paradigm/generative +1 | Text | Chain-of-thought faithfulness evaluation assesses the quality and faithfulness of step-by-step reasoning produced by... | ||
| Embedding Bias Analysis | Algorithmic | Architecture/neural Networks Architecture/neural Networks/transformer +3 | Text Image | Embedding bias analysis examines learned representations to identify biases, spurious correlations, and problematic... | ||
| Multimodal Alignment Evaluation | Testing | Architecture/neural Networks Architecture/neural Networks/transformer +1 | Image Text | Multimodal alignment evaluation assesses whether different modalities (vision, language, audio) are synchronised and... | ||
| Retrieval-Augmented Generation Evaluation | Testing | Architecture/neural Networks/transformer/llm Paradigm/generative +1 | Text | RAG evaluation assesses systems combining retrieval and generation by measuring retrieval quality, generation... |
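
To make the first entry in the table concrete, the following is a minimal sketch of how Layer-wise Relevance Propagation can work backwards through a network. It assumes the epsilon rule (one common LRP variant) and a tiny bias-free ReLU network with made-up weights. The names `matvec`, `lrp_epsilon`, `explain`, `X`, `W1`, and `W2` are hypothetical and do not come from the catalogue or from any library.

```python
def matvec(weights, x):
    """Return pre-activations z_k = sum_j x[j] * weights[j][k]."""
    n_out = len(weights[0])
    return [sum(x[j] * weights[j][k] for j in range(len(x))) for k in range(n_out)]


def lrp_epsilon(activations, weights, relevance_out, eps=1e-9):
    """Redistribute relevance from a layer's outputs to its inputs (epsilon rule).

    Each input j receives a share of output k's relevance proportional to its
    contribution activations[j] * weights[j][k] to the pre-activation z_k.
    """
    z = matvec(weights, activations)
    # Small stabiliser with the sign of z_k, so the division never hits zero.
    stabilised = [zk + (eps if zk >= 0 else -eps) for zk in z]
    return [
        sum(
            activations[j] * weights[j][k] / stabilised[k] * relevance_out[k]
            for k in range(len(z))
        )
        for j in range(len(activations))
    ]


def explain(x, w1, w2):
    """Forward pass through a two-layer ReLU net, then propagate relevance back."""
    hidden = [max(0.0, z) for z in matvec(w1, x)]
    output = matvec(w2, hidden)
    # Start with the output value itself as the relevance to be explained.
    r_hidden = lrp_epsilon(hidden, w2, output)
    r_input = lrp_epsilon(x, w1, r_hidden)
    return output, r_input


# Illustrative values only (assumed, not taken from any real model).
X = [1.0, 2.0, -1.0]
W1 = [[1.0, -1.0], [0.5, 0.5], [0.5, 1.0]]
W2 = [[2.0], [-1.0]]

OUTPUT, RELEVANCE = explain(X, W1, W2)
print("output:", OUTPUT)
print("input relevance:", RELEVANCE)
```

In this bias-free setting the input relevances should sum (up to the stabiliser) to the network output, which is the conservation property that LRP rules are designed around. Real networks with biases, convolutions, or attention layers need additional rules that this sketch does not cover.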