Description

Influence functions quantify how much each training example influenced a model's predictions by estimating the change in prediction that would occur if that example were removed and the model retrained. Using calculus and the implicit function theorem, they approximate this 'leave-one-out' effect without actually retraining: gradients of the loss and the Hessian of the training objective stand in for the retraining. This reveals which specific training examples were most responsible for pushing the model toward or away from particular predictions, enabling practitioners to trace problematic outputs back to their root causes in the training data.
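
In the notation of Koh and Liang's paper (listed under Resources), with trained parameters \hat{\theta} and per-example loss L, the influence of upweighting a training point z on the loss at a test point z_test is

    \mathcal{I}_{\mathrm{up,loss}}(z, z_{\mathrm{test}})
        = -\nabla_\theta L(z_{\mathrm{test}}, \hat{\theta})^\top \, H_{\hat{\theta}}^{-1} \, \nabla_\theta L(z, \hat{\theta}),
    \qquad
    H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat{\theta}),

and removing z is approximated by upweighting it with weight -1/n, so no retraining is required.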

Example Use Cases

Explainability

Investigating why a medical diagnosis model misclassified a patient by identifying which specific training cases most influenced the incorrect prediction, revealing potential mislabelled examples or problematic patterns in the training data.

Analysing a spam detection system that falsely flagged legitimate emails by tracing the prediction back to influential training examples, discovering that certain training emails contained misleading patterns that caused the model to overfit.
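
Both investigations follow the same recipe: score every training example's influence on the flagged prediction, then inspect the top-ranked examples. The following is a minimal sketch of that recipe, assuming a toy logistic-regression model small enough to form and invert the Hessian explicitly; the data, damping constant, and loss_fn helper are illustrative, not taken from any cited implementation.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    n, d = 200, 5
    X_train = torch.randn(n, d)
    y_train = (X_train[:, 0] > 0).float()
    theta = torch.zeros(d, requires_grad=True)

    def loss_fn(t, x, y):
        return F.binary_cross_entropy_with_logits(x @ t, y)

    # Train to (approximate) convergence; influence functions assume the
    # parameters sit at a minimum of the training loss.
    opt = torch.optim.LBFGS([theta], max_iter=100)
    def closure():
        opt.zero_grad()
        loss = loss_fn(theta, X_train, y_train)
        loss.backward()
        return loss
    opt.step(closure)

    # Hessian of the mean training loss at the trained parameters;
    # a small damping term keeps it invertible.
    H = torch.autograd.functional.hessian(
        lambda t: loss_fn(t, X_train, y_train), theta.detach())
    H = H + 0.01 * torch.eye(d)

    # Gradient of the loss on the flagged (misclassified) test point.
    x_test, y_test = torch.randn(1, d), torch.tensor([1.0])
    g_test = torch.autograd.grad(loss_fn(theta, x_test, y_test), theta)[0]
    v = torch.linalg.solve(H, g_test)  # H^{-1} grad L(z_test)

    # Koh & Liang's I_up,loss(z_i, z_test) = -grad L(z_i)^T H^{-1} grad L(z_test).
    # Large positive scores mark training examples whose presence most raises
    # the test loss, i.e. those most responsible for the misclassification.
    scores = []
    for i in range(n):
        g_i = torch.autograd.grad(
            loss_fn(theta, X_train[i:i+1], y_train[i:i+1]), theta)[0]
        scores.append(-(g_i @ v).item())
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:5]
    print("Most influential training examples:", top)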

Fairness

Auditing a loan approval model for discriminatory patterns by identifying which training examples most influenced rejections of minority applicants, revealing whether biased historical decisions are driving current unfair outcomes.
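
One hedged way to operationalise such an audit, assuming per-applicant influence scores have already been computed as in the sketch above (the tensor of scores below is a random placeholder, not real audit output):

    import torch

    torch.manual_seed(0)
    n_rejected, n_train = 50, 200
    # Placeholder: each row should hold the influence of every training
    # example on one rejected applicant's prediction, computed as above.
    influence = torch.randn(n_rejected, n_train)

    # Training examples with consistently high mean influence across the
    # group's rejections are candidate drivers of the disparity.
    mean_influence = influence.mean(dim=0)
    drivers = torch.topk(mean_influence, k=10).indices
    print("Training examples to audit for historical bias:", drivers.tolist())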

Privacy

Assessing membership inference risks in a medical model by identifying whether certain patient records have disproportionate influence on predictions, indicating potential data leakage vulnerabilities.
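
A common proxy here is self-influence: the influence of a training record on the model's loss at that same record, where unusually high values suggest memorisation. A minimal sketch under the same toy-model assumptions as the earlier examples:

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    n, d = 200, 5
    X = torch.randn(n, d)
    y = (X[:, 0] > 0).float()
    theta = torch.zeros(d, requires_grad=True)  # assume already trained, as before

    def loss_fn(t, x, labels):
        return F.binary_cross_entropy_with_logits(x @ t, labels)

    H = torch.autograd.functional.hessian(
        lambda t: loss_fn(t, X, y), theta.detach()) + 0.01 * torch.eye(d)

    # Self-influence of each record: grad L(z_i)^T H^{-1} grad L(z_i).
    self_influence = []
    for i in range(n):
        g = torch.autograd.grad(loss_fn(theta, X[i:i+1], y[i:i+1]), theta)[0]
        self_influence.append((g @ torch.linalg.solve(H, g)).item())

    # Records with outlying self-influence warrant a membership-inference review.
    risky = sorted(range(n), key=lambda i: self_influence[i], reverse=True)[:5]
    print("Highest self-influence records:", risky)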

Limitations

  • Computationally intensive: exact influence requires forming and inverting the Hessian of the training loss, which is intractable for models with millions of parameters; practical implementations avoid the explicit Hessian via Hessian-vector products (see the sketch after this list).
  • Requires access to the full training dataset and the trained model's parameters, so it cannot be applied to pre-trained models whose original training data is unavailable.
  • Accuracy degrades for highly non-convex models where the linear approximation underlying influence functions breaks down.
  • Results can be sensitive to hyperparameter choices and may not generalise well across different model architectures or training procedures.
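
On the first point, implementations at scale replace the explicit Hessian with Hessian-vector products computed by double backpropagation; this is the building block behind the stochastic estimator in Koh and Liang's paper and the Arnoldi-based method of Schioppa et al. (both under Resources). A minimal sketch of the trick, with an illustrative quadratic test function:

    import torch

    def hvp(loss, params, vec):
        # First backward pass, kept differentiable via create_graph=True.
        grads = torch.autograd.grad(loss, params, create_graph=True)
        flat_grad = torch.cat([g.reshape(-1) for g in grads])
        # Differentiating (grad . vec) yields H @ vec without forming H.
        hv = torch.autograd.grad(flat_grad @ vec, params)
        return torch.cat([h.reshape(-1) for h in hv])

    # Sanity check on the quadratic x^T A x, whose Hessian is 2A.
    A = torch.diag(torch.tensor([1.0, 2.0, 3.0]))
    x = torch.randn(3, requires_grad=True)
    v = torch.tensor([1.0, 0.0, 0.0])
    print(hvp(x @ A @ x, [x], v))  # approximately tensor([2., 0., 0.])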

Resources

Understanding Black-box Predictions via Influence Functions
Research Paper, Pang Wei Koh and Percy Liang
nimarb/pytorch_influence_functions
Software Package
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
Research Paper, Sang Keun Choe et al.
Scaling Up Influence Functions
Research Paper, Andrea Schioppa et al.
torch-influence API Reference
Documentation
