Sensitivity Analysis for Fairness
Description
Sensitivity Analysis for Fairness systematically evaluates how model predictions change when sensitive attributes or their proxies are perturbed whilst all other factors are held constant. The technique creates counterfactual instances by modifying potentially discriminatory features (race, gender, age) or their correlates (zip code, names, educational institutions) and measures the resulting differences in predictions. This controlled perturbation quantifies how strongly protected characteristics influence model decisions, helping to detect both direct discrimination and indirect bias through proxy variables, even when sensitive attributes are not explicitly used as model inputs.
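The perturbation step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the logistic `predict_proba` model, its weights, the synthetic data, and the helper name `attribute_sensitivity` are all assumptions made for the example.

```python
import numpy as np

def predict_proba(X, weights, bias):
    """Hypothetical stand-in model: logistic scores in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-(X @ weights + bias)))

def attribute_sensitivity(X, column, new_values, predict, threshold=0.5):
    """Perturb one column, hold every other column constant, compare predictions."""
    X_cf = X.copy()
    X_cf[:, column] = new_values              # counterfactual instances
    original = predict(X)
    counterfactual = predict(X_cf)
    diff = counterfactual - original
    flipped = (original >= threshold) != (counterfactual >= threshold)
    return {
        "mean_abs_change": float(np.mean(np.abs(diff))),
        "max_abs_change": float(np.max(np.abs(diff))),
        "decision_flip_rate": float(np.mean(flipped)),
    }

# Synthetic data (assumed): income, years employed, binary group membership.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.normal(size=n),
    rng.normal(size=n),
    rng.integers(0, 2, size=n),
]).astype(float)

weights = np.array([1.2, 0.8, -0.6])          # illustrative; non-zero weight on group
bias = 0.1

def predict(X_):
    return predict_proba(X_, weights, bias)

# Flip the group column (0 <-> 1) for every row and measure the effect.
report = attribute_sensitivity(X, column=2, new_values=1.0 - X[:, 2], predict=predict)
print(report)
```

A model that ignores the perturbed column would report zero for all three metrics; larger values indicate that the attribute, or whatever it stands in for, is driving predictions.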
Example Use Cases
Fairness
Testing whether a lending model's decisions change significantly when only the applicant's zip code (which may correlate with race) is altered, whilst keeping all other factors constant.
Evaluating a recruitment algorithm by systematically swapping stereotypically male candidate names for stereotypically female ones (whilst keeping qualifications identical) to measure whether gender bias affects hiring recommendations; any change in outcome would reveal discrimination through name-based proxies.
Assessing a healthcare resource allocation model by varying patient zip codes across different socioeconomic areas to determine whether geographic proxies for race and income inappropriately influence treatment recommendations.
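The zip-code scenarios above can be approximated with a proxy sweep: each individual is scored under every candidate zip code whilst their other features stay fixed. The sketch below is hedged accordingly; the `score` function, the `ZIP_EFFECT` values, and the zip labels "A", "B", "C" are hypothetical placeholders rather than anything taken from a real model.

```python
import numpy as np

# Hypothetical per-zip adjustments a model might have learned (assumed values).
ZIP_EFFECT = {"A": 0.0, "B": -0.4, "C": -1.1}

def score(income, zip_code):
    """Hypothetical lending model: approval probability from income and zip code."""
    logit = 0.9 * income + ZIP_EFFECT[zip_code]
    return 1.0 / (1.0 + np.exp(-logit))

def zip_sweep(incomes, zip_codes):
    """Score every applicant under every zip code, with income held fixed."""
    table = np.array([[score(inc, z) for z in zip_codes] for inc in incomes])
    spread = table.max(axis=1) - table.min(axis=1)   # per-applicant range
    return table, spread

rng = np.random.default_rng(1)
incomes = rng.normal(size=200)                       # synthetic, standardised incomes
zips = list(ZIP_EFFECT)

table, spread = zip_sweep(incomes, zips)
for z, mean_p in zip(zips, table.mean(axis=0)):
    print(f"zip {z}: mean approval probability {mean_p:.3f}")
print(f"mean per-applicant spread: {spread.mean():.3f}")
```

The per-applicant spread isolates the effect of the proxy alone. Whether a given spread is acceptable remains a judgement for domain and legal experts.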
Limitations
- Requires domain expertise to identify relevant proxy variables for sensitive attributes; proxies are often non-obvious, and any list of them may be incomplete.
- Computationally intensive for complex models when testing many feature combinations or perturbation ranges.
- Choice of perturbation ranges and comparison points involves subjective decisions that can significantly affect results and conclusions.
- May miss subtle or interaction-based forms of discrimination that only manifest under specific combinations of features.
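The last limitation can be illustrated with a toy case in which an attribute only matters in combination with another feature. In this sketch the `predict` model, its interaction term, and the 10% part-time share are assumptions chosen to make the effect visible. Averaged over everyone, the perturbation looks mild, yet it is concentrated entirely in one subgroup.

```python
import numpy as np

def predict(X):
    """Hypothetical model: penalises only when group == 1 AND part_time == 1."""
    skill, group, part_time = X[:, 0], X[:, 1], X[:, 2]
    logit = 1.0 * skill - 2.0 * group * part_time
    return 1.0 / (1.0 + np.exp(-logit))

def flip_effect(X, column):
    """Absolute prediction change per row when a binary column is flipped."""
    X_cf = X.copy()
    X_cf[:, column] = 1.0 - X_cf[:, column]
    return np.abs(predict(X_cf) - predict(X))

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([
    rng.normal(size=n),             # skill
    rng.integers(0, 2, size=n),     # group (sensitive attribute)
    rng.random(n) < 0.1,            # part_time, roughly 10% of rows
]).astype(float)

effect = flip_effect(X, column=1)
part_time = X[:, 2] == 1.0

print(f"overall mean change:   {effect.mean():.4f}")
print(f"part-time mean change: {effect[part_time].mean():.4f}")
print(f"full-time mean change: {effect[~part_time].mean():.4f}")
```

Reporting sensitivity per subgroup, as here, is one way to surface such cases. It also multiplies the number of model evaluations, which ties back to the computational cost noted above.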