Sensitivity Analysis for Fairness

Description

Sensitivity Analysis for Fairness systematically evaluates how model predictions change when sensitive attributes or their proxies are perturbed whilst all other factors are held constant. The technique creates counterfactual instances by modifying potentially discriminatory features (race, gender, age) or their correlates (zip code, names, educational institutions) and measures the resulting differences in predictions. This controlled perturbation quantifies how strongly protected characteristics influence model decisions. It helps detect both direct discrimination and indirect bias through proxy variables, even when sensitive attributes are not explicitly used as model inputs.
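The perturbation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `toy_model`, the `zip_group` feature, the coefficients, and the 0.5 decision threshold are all hypothetical stand-ins for a real model and dataset.

```python
from statistics import mean


def toy_model(row):
    """Hypothetical scoring model (assumption): approval probability in [0, 1]."""
    score = 0.5 + 0.004 * (row["income"] - 50) - 0.2 * (row["zip_group"] == "B")
    return min(1.0, max(0.0, score))


def perturbation_sensitivity(model, rows, feature, value, threshold=0.5):
    """Set `feature` to `value` in every row, hold everything else fixed,
    and summarise how the predictions move."""
    deltas, flips = [], 0
    for row in rows:
        counterfactual = {**row, feature: value}  # copy with one feature changed
        before, after = model(row), model(counterfactual)
        deltas.append(after - before)
        # Count decisions that cross the (assumed) approval threshold.
        flips += (before >= threshold) != (after >= threshold)
    return {
        "mean_change": mean(deltas),
        "max_abs_change": max(abs(d) for d in deltas),
        "flip_rate": flips / len(rows),
    }


# Made-up applicants, for illustration only.
applicants = [
    {"income": 40, "zip_group": "A"},
    {"income": 60, "zip_group": "A"},
    {"income": 80, "zip_group": "B"},
    {"income": 55, "zip_group": "A"},
]

result = perturbation_sensitivity(toy_model, applicants, "zip_group", "B")
print(result)
```

The mean change captures the average shift in score, whilst the flip rate shows how often the perturbation alone would change the final decision. Which summary matters more depends on how the model's output is used.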

Example Use Cases

Fairness

Testing whether a lending model's decisions change significantly when only the applicant's zip code (which may correlate with race) is altered, whilst keeping all other factors constant.

Evaluating a recruitment algorithm by systematically changing candidate names from stereotypically male to stereotypically female (whilst keeping qualifications identical) to measure whether gender bias affects hiring recommendations, which can reveal discrimination through name-based proxies.

Assessing a healthcare resource allocation model by varying patient zip codes across different socioeconomic areas to determine whether geographic proxies for race and income inappropriately influence treatment recommendations.
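For text inputs such as the recruitment example, the same idea becomes a paired comparison: score each document, swap only the name, and score it again. The sketch below assumes a hypothetical `toy_cv_scorer` that is deliberately biased so the effect is visible. The name pairs and CV snippets are illustrative inventions, not data from any real audit.

```python
import re

# Illustrative name pairs (assumption); a real audit would need a vetted list.
NAME_PAIRS = [("James", "Emily"), ("Michael", "Sarah")]


def swap_names(text, pairs):
    """Replace each first name in `pairs` with its counterpart, whole words only."""
    mapping = dict(pairs)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


def toy_cv_scorer(text):
    """Hypothetical, deliberately biased scorer used only for demonstration."""
    score = 0.5 + 0.1 * text.count("Python")
    if "Emily" in text or "Sarah" in text:
        score -= 0.05  # injected bias that the audit should surface
    return score


cvs = [
    "James Smith, 5 years Python",
    "Michael Lee, Python and SQL, Python tutor",
]

# Score gap attributable to the name change alone (qualifications unchanged).
gaps = [toy_cv_scorer(swap_names(cv, NAME_PAIRS)) - toy_cv_scorer(cv) for cv in cvs]
print(gaps)
```

A consistently negative gap across many paired documents would suggest that the name is influencing the score. A gap near zero on one name list does not rule out bias through other proxies.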

Limitations

Requires domain expertise to identify relevant proxy variables for sensitive attributes; proxies may not be obvious, and any list of them may be incomplete.
Computationally intensive for complex models when testing many feature combinations or perturbation ranges.
Choice of perturbation ranges and comparison points involves subjective decisions that can significantly affect results and conclusions.
May miss subtle or interaction-based forms of discrimination that only manifest under specific combinations of features.
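
The interaction and cost limitations above can be made concrete with a small sketch. It assumes a hypothetical `interaction_model` that penalises only one specific combination of two features, so perturbing either feature alone shows no effect. The feature names and values are invented for illustration.

```python
from itertools import product


def interaction_model(row):
    """Hypothetical model (assumption) that penalises one combination only."""
    penalised = row["zip_group"] == "B" and row["age_band"] == "older"
    return 0.7 - (0.3 if penalised else 0.0)


def joint_sensitivity(model, row, grid):
    """Evaluate the model on every combination of the values in `grid`
    and return each combination's change relative to the original row."""
    baseline = model(row)
    features = list(grid)
    changes = {}
    for values in product(*(grid[f] for f in features)):
        counterfactual = {**row, **dict(zip(features, values))}
        changes[values] = model(counterfactual) - baseline
    return changes


applicant = {"zip_group": "A", "age_band": "younger"}
grid = {"zip_group": ["A", "B"], "age_band": ["younger", "older"]}

changes = joint_sensitivity(interaction_model, applicant, grid)
for combo, change in changes.items():
    print(combo, round(change, 3))
```

Here each single-feature change leaves the score untouched, and only the joint change reveals the penalty. The number of combinations is the product of the grid sizes, which is why exhaustive joint testing quickly becomes expensive as features and perturbation values are added.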

Resources

The Fragility of Fairness: Causal Sensitivity Analysis for Fair Machine Learning

Research Paper • Jake Fawkes et al. • Oct 12, 2024

Fair SA: Sensitivity Analysis for Fairness in Face Recognition

Research Paper • Aparna R. Joshi et al. • Feb 8, 2022

fairlearn/fairlearn

Software Package

Aequitas: Bias Audit Toolkit

Software Package

Fairness Through Sensitivity Analysis - Towards Data Science

Tutorial

User Guide - Fairlearn documentation

Documentation

Related Techniques

Name	Description	Assurance Goals
SHapley Additive exPlanations	SHAP explains model predictions by quantifying how much each input feature contributes to the outcome. It assigns an importance score to every feature, indicating whether it pushes the prediction towards or away from the average. The method systematically evaluates how predictions change as features are included or excluded, drawing on game theory concepts to ensure a fair distribution of contributions.	Explainability Fairness Reliability
Equalised Odds Post-Processing	A post-processing fairness technique based on Hardt et al.'s seminal work that adjusts classification thresholds after model training to achieve equal true positive rates and false positive rates across demographic groups. The method uses group-specific decision thresholds, potentially with randomisation, to satisfy the equalised odds constraint whilst preserving model utility. This approach enables fairness mitigation without retraining, making it applicable to existing deployed models or when training data access is restricted.	Fairness Transparency Reliability
Feature Attribution with Integrated Gradients in NLP	Applies Integrated Gradients to natural language processing models to attribute prediction importance to individual input tokens, words, or subword units. This technique computes gradients along a straight-line path from a baseline input (typically all-zeros, padding tokens, or neutral text) to the actual input, integrating these gradients to obtain attribution scores. Unlike vanilla gradient methods, Integrated Gradients satisfies axioms of sensitivity and implementation invariance, making it particularly valuable for understanding transformer-based language models where token interactions are complex.	Explainability Fairness Safety
Disparate Impact Remover	Disparate Impact Remover is a preprocessing technique that transforms feature values in a dataset to reduce statistical dependence between features and protected attributes (like race or gender). The method modifies non-protected features through mathematical transformations that preserve the utility of the data whilst reducing correlations that could lead to discriminatory outcomes. This approach specifically targets the '80% rule' disparate impact threshold by adjusting feature distributions to ensure more equitable treatment across demographic groups in downstream model predictions.	Fairness Transparency Reliability
Red Teaming	Red teaming involves systematic adversarial testing of AI/ML systems by dedicated specialists who attempt to identify flaws, vulnerabilities, harmful outputs, and ways to circumvent safety measures. Drawing from cybersecurity practices, red teams employ diverse attack vectors including prompt injection, adversarial examples, edge case exploitation, social engineering scenarios, and goal misalignment probes. Unlike standard testing that validates expected behaviour, red teaming specifically seeks to break systems through creative and adversarial approaches, revealing non-obvious risks and failure modes that could be exploited maliciously or cause harm in deployment.	Safety Reliability Fairness Security
Calibration with Equality of Opportunity	A post-processing fairness technique that adjusts model predictions to achieve equal true positive rates across protected groups whilst maintaining calibration within each group. The method addresses fairness by ensuring that qualified individuals from different demographic groups have equal chances of receiving positive predictions, whilst preserving the meaning of probability scores within each group. This technique attempts to balance the competing objectives of group fairness and accurate probability estimation.	Fairness Transparency Reliability

Tags

Applicable Models:

Data Requirements:

Sensitive Attributes

Data Type:

Evidence Type:

Quantitative Metric

Expertise Needed:

Explanatory Scope:

Fairness Approach:

Lifecycle Stage:

Model Development

Technique Type: