Reweighing
Description
Reweighing is a pre-processing technique that mitigates bias by assigning different weights to training examples based on their group membership and class label. The weights are calculated to ensure that privileged and unprivileged groups have equal influence on the model's training process, effectively balancing the dataset without altering the feature values themselves. This helps to train fairer models by correcting for historical imbalances in how different groups are represented in the data.
Example Use Cases
Fairness
In a loan application system, if historical data shows that a higher proportion of applicants from a minority group were denied loans (negative outcome), reweighing would assign higher weights to these instances. This forces the model to pay more attention to correctly classifying the underrepresented group, aiming to correct for historical bias and improve fairness metrics like equal opportunity.
When developing a hiring model, if the training data contains fewer female applicants for senior roles, reweighing can be applied to increase the importance of these instances. This helps to prevent the model from learning a spurious correlation between gender and seniority, ensuring that female candidates are evaluated more equitably during the screening process.
Transparency
In a medical diagnosis system, reweighing provides transparency by explicitly showing which demographic groups required adjustment for balanced representation. The computed weights serve as documentation of historical bias patterns in medical data, helping clinicians understand potential disparities and ensuring the model's decisions are based on medical evidence rather than demographic correlations.
Reliability
For a credit scoring model deployed across different regions, reweighing improves reliability by ensuring consistent performance across demographic groups. By balancing the training data representation, the model maintains stable accuracy metrics across different population segments, reducing the risk of performance degradation when deployed in areas with different demographic compositions.
Limitations
- The technique only adjusts the overall influence of demographic groups and does not address biases that may be encoded within the features themselves.
- Assigning very high weights to a small number of instances from an underrepresented group can increase the model's variance and make it sensitive to outliers, potentially harming generalisation.
- The effectiveness of reweighing depends on the assumption that the labels in the training data are accurate; it cannot correct for label bias where outcomes were themselves the result of historical discrimination.
- It may not be effective if the feature distributions for different groups are fundamentally different, as it cannot change the underlying data relationships.
Resources
Achieving Fairness at No Utility Cost via Data Reweighing with Influence
Presents a novel reweighing approach that computes individual sample weights based on influence functions to achieve fairness without sacrificing model utility
aif360.sklearn.preprocessing.Reweighing — aif360 0.6.1 ...
Documentation for scikit-learn compatible implementation of the reweighing preprocessing technique in the AI Fairness 360 library