Concept Activation Vectors
Description
Concept Activation Vectors (CAVs), applied through Testing with Concept Activation Vectors (TCAV), identify mathematical directions in a neural network's representation space that correspond to human-understandable concepts such as 'stripes', 'young', or 'medical equipment'. The technique works by finding linear directions that separate activations of concept examples from non-concept examples, then measuring how much these concept directions influence the model's predictions. This provides quantitative answers to questions like 'How much does the concept of youth affect this model's hiring decisions?', enabling systematic bias detection and model understanding.
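The sketch below illustrates the two steps described above, under the assumption that layer activations (and gradients of the target-class logit with respect to those activations) have already been extracted from the model; all variable names, shapes, and the placeholder data are illustrative, and a logistic regression classifier stands in for whichever linear separator is used in practice.

```python
# Minimal sketch: fit a CAV from concept vs. non-concept activations, then
# compute a TCAV score as the fraction of class inputs whose prediction is
# positively sensitive to the concept direction.
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Fit a linear classifier separating concept from non-concept activations.

    concept_acts, random_acts: (n_examples, n_units) activations from one layer.
    Returns the unit-norm concept activation vector (normal to the boundary,
    pointing towards the concept class).
    """
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_.ravel()
    return cav / np.linalg.norm(cav)


def tcav_score(logit_grads: np.ndarray, cav: np.ndarray) -> float:
    """Fraction of class examples whose target-class logit increases along the CAV.

    logit_grads: (n_examples, n_units) gradients of the target-class logit with
    respect to the same layer's activations, one row per input of that class.
    """
    directional_derivs = logit_grads @ cav
    return float(np.mean(directional_derivs > 0))


# Illustrative usage with random placeholder arrays standing in for real
# activations and gradients extracted from a trained model.
rng = np.random.default_rng(0)
concept_acts = rng.normal(0.5, 1.0, size=(100, 64))   # e.g. 'stripes' examples
random_acts = rng.normal(0.0, 1.0, size=(100, 64))    # random counterexamples
class_grads = rng.normal(0.1, 1.0, size=(200, 64))    # grads for 'zebra' inputs

cav = fit_cav(concept_acts, random_acts)
print(f"TCAV score: {tcav_score(class_grads, cav):.2f}")
```

In practice the procedure is usually repeated with several different random counterexample sets, and a statistical test across the resulting scores is used to check that the concept's influence is distinguishable from chance; the layer at which activations are taken also needs to be chosen and justified, as noted under Limitations.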
Example Use Cases
Explainability
Auditing a medical imaging model to verify it focuses on diagnostic features (like 'tumour characteristics') rather than irrelevant concepts (like 'scanner type' or 'patient positioning') when classifying chest X-rays, ensuring clinical decisions rely on medically relevant information.
Fairness
Testing whether a hiring algorithm's resume screening decisions are influenced by concepts related to protected characteristics such as 'gender-associated names', 'prestigious universities', or 'employment gaps', enabling systematic bias detection and compliance verification.
Transparency
Providing regulatory-compliant explanations for financial lending decisions by quantifying how concepts like 'debt-to-income ratio', 'employment stability', and 'credit history length' influence loan approval models, with precise sensitivity scores for audit documentation.
Limitations
- Requires clearly defined concept examples and non-concept examples, which can be challenging to obtain for abstract or subjective concepts.
- Assumes that meaningful concepts are represented as linearly separable directions in the model's internal representation space, which may not hold for all concepts.
- Results depend heavily on which network layer is examined, as different layers capture different levels of abstraction and concept representation.
- Computational cost grows significantly with model size and number of concepts tested, though recent advances like FastCAV address this limitation.
- Interpretation requires domain expertise to define meaningful concepts and understand the significance of sensitivity scores in practical contexts.