[Paper] FairRF: Multi-Objective Search for Single and Intersectional Software Fairness
Source: arXiv - 2601.07537v1
Overview
The paper introduces FairRF, a multi‑objective evolutionary search technique that simultaneously tunes a Random Forest classifier’s hyper‑parameters and mutates its training data to improve both fairness (reducing bias) and effectiveness (prediction accuracy). By returning a Pareto front of trade‑off solutions, FairRF lets product owners, data scientists, and engineers pick the model that best matches their fairness‑vs‑performance priorities.
Key Contributions
- Multi‑objective evolutionary search for fairness – combines fairness and effectiveness as first‑class optimization goals rather than applying a single post‑hoc bias‑mitigation step.
- Hyper‑parameter + data mutation search – simultaneously explores Random Forest settings (e.g., number of trees, max depth) and systematic data transformations (re‑sampling, label flipping) that can reduce bias.
- Pareto‑optimal solution set – delivers a portfolio of models, each representing a different fairness‑effectiveness trade‑off, enabling stakeholder‑driven selection.
- Comprehensive empirical evaluation – compared against 26 baselines (including state‑of‑the‑art bias‑mitigation methods) across 11 classification scenarios, using five effectiveness metrics and three fairness metrics together with their intersectional variants (six fairness definitions in total).
- Superior performance on intersectional bias – outperforms the previous best method at mitigating bias that affects overlapping protected groups (e.g., race + gender); the sketch after this list illustrates what such an intersectional metric measures.
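The paper itself does not ship a code listing, but as a rough illustration of what an intersectional fairness metric computes, the sketch below measures statistical parity difference over the groups defined by one protected attribute, or by a combination of attributes such as race × gender. The max‑minus‑min formulation, the pandas layout, and all column names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): statistical parity difference
# generalized to several groups as max minus min positive-prediction rate.
import pandas as pd

def statistical_parity_difference(df: pd.DataFrame, group_cols, pred_col="pred"):
    """Largest gap in positive-prediction rates across the groups defined by
    `group_cols`: one column gives single-attribute fairness, several columns
    give intersectional fairness (e.g., every race x gender subgroup)."""
    rates = df.groupby(list(group_cols))[pred_col].mean()
    return rates.max() - rates.min()

# Hypothetical usage on a dataframe of binary predictions:
# spd_gender       = statistical_parity_difference(df, ["gender"])
# spd_intersection = statistical_parity_difference(df, ["race", "gender"])
```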
Methodology
- Base learner – Random Forest (RF) is chosen for its popularity and flexibility.
- Search space (see the candidate‑encoding sketch after this list)
  - RF hyper‑parameters: number of trees, max depth, min samples split, etc.
  - Data mutation operators: oversampling/undersampling of minority groups, label smoothing, synthetic example generation.
- Evolutionary algorithm – a multi‑objective genetic algorithm (e.g., NSGA‑II) evolves a population of candidate configurations (see the pymoo sketch after this list). Each candidate is evaluated on:
  - Effectiveness – accuracy, F1‑score, AUC, etc.
  - Fairness – statistical parity difference, equalized odds, and their intersectional extensions.
- Pareto front extraction – after a fixed number of generations, the non‑dominated solutions (those for which no other candidate improves one objective without worsening the other) are returned.
- Benchmarking – the authors run FairRF and 26 baseline methods (pre‑processing, in‑processing, post‑processing techniques) on publicly available datasets (e.g., Adult, COMPAS) and report average ranks across all metrics.
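As referenced in the “Search space” bullet above, the sketch below shows one plausible candidate encoding: a bundle of RF hyper‑parameters plus a data‑mutation setting, with oversampling of a protected minority group as an example operator. The specific genes, the dataclass layout, and the column names are assumptions for illustration, not the paper's exact genome.

```python
# Illustrative candidate encoding (assumed, not the paper's exact genome):
# each individual pairs RF hyper-parameters with a data-mutation choice.
from dataclasses import dataclass
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

@dataclass
class Candidate:
    n_estimators: int        # number of trees
    max_depth: int           # maximum tree depth
    min_samples_split: int   # minimum samples required to split a node
    oversample_ratio: float  # how aggressively to oversample the minority group

def apply_mutation(train: pd.DataFrame, cand: Candidate,
                   protected: str = "gender", minority=0) -> pd.DataFrame:
    """Oversample rows of the minority protected group (one possible mutation
    operator; the paper also lists undersampling and label edits)."""
    minority_rows = train[train[protected] == minority]
    n_extra = int(len(minority_rows) * cand.oversample_ratio)
    if n_extra == 0:
        return train
    extra = resample(minority_rows, n_samples=n_extra, replace=True, random_state=0)
    return pd.concat([train, extra], ignore_index=True)

def build_model(cand: Candidate) -> RandomForestClassifier:
    return RandomForestClassifier(
        n_estimators=cand.n_estimators,
        max_depth=cand.max_depth,
        min_samples_split=cand.min_samples_split,
        random_state=0,
    )
```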
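Likewise, as referenced in the “Evolutionary algorithm” bullet, the two‑objective search could be driven by an off‑the‑shelf NSGA‑II. The pymoo‑based sketch below minimizes (1 − accuracy) and statistical parity difference, reusing Candidate, apply_mutation, build_model, and statistical_parity_difference from the earlier sketches; the choice of pymoo, the genome bounds, and the train/validation interface are tooling assumptions, not the authors' setup.

```python
# Illustrative NSGA-II wiring with pymoo (a tooling assumption, not the
# paper's exact setup); reuses the helpers defined in the sketches above.
import numpy as np
from sklearn.metrics import accuracy_score
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize

class FairnessProblem(ElementwiseProblem):
    def __init__(self, train, valid, features, label, protected):
        # Genome: [n_estimators, max_depth, min_samples_split, oversample_ratio]
        super().__init__(n_var=4, n_obj=2,
                         xl=np.array([10, 2, 2, 0.0]),
                         xu=np.array([300, 30, 20, 1.0]))
        self.train, self.valid = train, valid
        self.features, self.label, self.protected = features, label, protected

    def _evaluate(self, x, out, *args, **kwargs):
        cand = Candidate(int(x[0]), int(x[1]), int(x[2]), float(x[3]))
        mutated = apply_mutation(self.train, cand, protected=self.protected)
        model = build_model(cand).fit(mutated[self.features], mutated[self.label])
        preds = model.predict(self.valid[self.features])
        acc = accuracy_score(self.valid[self.label], preds)
        groups = self.valid[[self.protected]].copy()
        groups["pred"] = preds
        spd = statistical_parity_difference(groups, [self.protected])
        out["F"] = [1.0 - acc, spd]  # both objectives are minimized

# Hypothetical run: res.X holds the non-dominated configurations and res.F
# their (error, disparity) scores, i.e. the Pareto front returned to users.
# res = minimize(FairnessProblem(train, valid, features, "income", "gender"),
#                NSGA2(pop_size=40), ("n_gen", 30), seed=1, verbose=False)
```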
Results & Findings
| Aspect | FairRF vs. Baselines |
|---|---|
| Fairness improvement | Up to a 30% reduction in disparity compared with the raw RF, and a 12% improvement over the previous best intersectional method. |
| Effectiveness retention | Prediction accuracy stays within 1–2% of the best accuracy‑only baseline, indicating a minimal trade‑off cost. |
| Stability across definitions | FairRF consistently yields better or comparable fairness scores across all six fairness definitions, whereas many baselines excel only on a single metric. |
| Pareto diversity | The generated front contains 8–12 distinct models per run, giving developers concrete options rather than a single “one‑size‑fits‑all” model. |
In short, FairRF not only makes models fairer but does so without sacrificing the core predictive power that production systems rely on.
Practical Implications
- Developer‑friendly fairness tuning – Instead of manually fiddling with re‑sampling or adding fairness constraints, teams can plug FairRF into their CI pipeline and let the evolutionary search surface a set of ready‑to‑deploy models.
- Stakeholder negotiation – Product managers can visualize the fairness‑vs‑accuracy trade‑off curve and make data‑driven decisions (e.g., “we accept a 0.5% drop in accuracy for a 15% drop in gender bias”); the sketch after this list shows one way to encode such a rule.
- Intersectional compliance – Regulations increasingly require proof that systems treat combined protected attributes fairly. FairRF’s built‑in support for intersectional metrics helps teams prepare for GDPR, EEOC, or sector‑specific fairness audits.
- Extensible to other learners – While the paper focuses on Random Forests, the same evolutionary framework can wrap other tree‑based ensembles (XGBoost, LightGBM) or even neural nets, making it a versatile addition to any ML toolbox.
- Reduced engineering overhead – By automating both hyper‑parameter tuning and bias mitigation, teams save time compared to running separate fairness‑specific pipelines.
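As noted in the “Stakeholder negotiation” bullet above, a trade‑off rule of that kind can be applied mechanically to the returned front. The sketch below, assuming the (error, disparity) objective matrix res.F from the earlier pymoo sketch, keeps only candidates within a tolerated accuracy drop and picks the least‑biased one; the 0.5% threshold is a placeholder.

```python
# Illustrative selection policy over a Pareto front: F[:, 0] is 1 - accuracy,
# F[:, 1] is disparity, matching the objective order assumed above.
import numpy as np

def pick_model(F: np.ndarray, max_accuracy_drop: float = 0.005) -> int:
    """Among candidates whose accuracy is within `max_accuracy_drop` of the
    most accurate one, return the index of the candidate with least disparity."""
    error, disparity = F[:, 0], F[:, 1]
    tolerated = np.where(error <= error.min() + max_accuracy_drop)[0]
    return tolerated[np.argmin(disparity[tolerated])]

# chosen = pick_model(res.F)    # e.g., accept at most a 0.5% accuracy drop
# config = res.X[chosen]        # hyper-parameters + mutation settings to deploy
```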
Limitations & Future Work
- Computational cost – Multi‑objective evolutionary search can be expensive on large datasets; the authors note longer runtimes compared to single‑objective tuning.
- Random Forest focus – The current implementation is tied to RF; extending to deep learning models may require redesigning mutation operators.
- Metric selection – FairRF optimizes the metrics you feed it; choosing appropriate fairness definitions for a given domain remains a non‑trivial, domain‑expert task.
- Scalability to streaming data – The approach assumes a static training set; future work could explore incremental or online versions for real‑time systems.
FairRF demonstrates that fairness need not be a bolted‑on afterthought. By treating fairness as a first‑class optimization objective and delivering a portfolio of trade‑off models, it gives developers the practical levers they need to build responsible AI systems at scale.
Authors
- Giordano d’Aloisio
- Max Hort
- Rebecca Moussa
- Federica Sarro
Paper Information
- arXiv ID: 2601.07537v1
- Categories: cs.SE
- Published: January 12, 2026