Explainability

Causality

Identifies causal rather than correlational relationships

3 techniques in this subcategory

Influence Functions
Goals/Models: Algorithmic, Architecture/linear Models, Architecture/neural Networks, +6
Data Types: Any
Description: Influence functions quantify how much each training example influenced a model's predictions by computing the change in...

Causal Mediation Analysis in Language Models
Goals/Models: Mechanistic Interpretability, Architecture/neural Networks/transformer, Architecture/neural Networks/transformer/llm, +3
Data Types: Text
Description: Causal mediation analysis in language models is a mechanistic interpretability technique that systematically...

Concept Activation Vectors
Goals/Models: Algorithmic, Architecture/neural Networks, Requirements/gradient Access, +2
Data Types: Any
Description: Concept Activation Vectors (CAVs), also known as Testing with Concept Activation Vectors (TCAV), identify mathematical...
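
To make the Influence Functions entry above more concrete, here is a minimal sketch. It assumes the common first-order formulation, in which the influence of a training example on a test loss is the negative product of the test-loss gradient, the inverse Hessian of the training objective, and the training-example gradient. The one-parameter ridge regression model, the function names, and the penalty `lam` are illustrative assumptions chosen so that the Hessian is a single number; they are not taken from the catalogue entry.

```python
def fit_ridge_1d(xs, ys, lam):
    """Minimise (1/n) * sum 0.5*(w*x_i - y_i)^2 + 0.5*lam*w^2 over a scalar w."""
    n = len(xs)
    # Second derivative of the objective; constant in w for squared error.
    hessian = sum(x * x for x in xs) / n + lam
    w = (sum(x * y for x, y in zip(xs, ys)) / n) / hessian
    return w, hessian


def influence_on_test_loss(xs, ys, x_test, y_test, lam=1e-2):
    """Return one score per training example.

    Each score approximates d(test loss)/d(epsilon) when that example's loss
    is upweighted by epsilon in the training objective (assumed formulation).
    """
    w, hessian = fit_ridge_1d(xs, ys, lam)
    # Gradient of the test loss 0.5*(w*x_test - y_test)^2 with respect to w.
    test_grad = (w * x_test - y_test) * x_test
    scores = []
    for x, y in zip(xs, ys):
        # Gradient of this training example's loss with respect to w.
        train_grad = (w * x - y) * x
        # Scalar case of: -test_grad^T * H^{-1} * train_grad
        scores.append(-test_grad * train_grad / hessian)
    return scores
```

Under this formulation, removing example i corresponds to an upweighting of -1/n, so `-scores[i] / n` is a first-order estimate of how the test loss would change if the model were refit without that example. For models with many parameters the Hessian inverse is usually approximated rather than computed exactly.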