[Paper] FairRF: Multi-Objective Search for Single and Intersectional Software Fairness

Published: January 12, 2026 at 08:42 AM EST
4 min read
Source: arXiv - 2601.07537v1

Overview

The paper introduces FairRF, a multi‑objective evolutionary search technique that simultaneously tunes a Random Forest classifier’s hyper‑parameters and mutates its training data to improve both fairness (reducing bias) and effectiveness (prediction accuracy). By returning a Pareto front of trade‑off solutions, FairRF lets product owners, data scientists, and engineers pick the model that best matches their fairness‑vs‑performance priorities.

Key Contributions

  • Multi‑objective evolutionary search for fairness – combines fairness and effectiveness as first‑class optimization goals rather than applying a single post‑hoc bias‑mitigation step.
  • Hyper‑parameter + data mutation search – simultaneously explores Random Forest settings (e.g., number of trees, max depth) and systematic data transformations (re‑sampling, label flipping) that can reduce bias.
  • Pareto‑optimal solution set – delivers a portfolio of models, each representing a different fairness‑effectiveness trade‑off, enabling stakeholder‑driven selection.
  • Comprehensive empirical evaluation – compared against 26 baselines (including state‑of‑the‑art bias‑mitigation methods) across 11 classification scenarios, using five effectiveness metrics and six fairness definitions, including intersectional variants.
  • Superior performance on intersectional bias – outperforms the previous best method for mitigating bias that affects overlapping protected groups (e.g., race + gender); a sketch of such an intersectional metric follows this list.
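
To make the intersectional notion concrete, the sketch below computes a statistical‑parity‑style gap over subgroups formed by crossing two protected attributes (e.g., race × gender). It is an illustrative formulation only: the subgroup construction and the max‑difference aggregation are assumptions, not necessarily the exact definitions used in the paper.

```python
# Illustrative intersectional parity gap: the largest difference in
# positive-prediction rate across subgroups formed by crossing two protected
# attributes. The max-gap aggregation is an assumption for illustration.
import numpy as np

def intersectional_parity_gap(y_pred, attr_a, attr_b):
    """Max difference in P(y_hat = 1) across all (attr_a, attr_b) subgroups."""
    y_pred, attr_a, attr_b = map(np.asarray, (y_pred, attr_a, attr_b))
    rates = []
    for a in np.unique(attr_a):
        for b in np.unique(attr_b):
            mask = (attr_a == a) & (attr_b == b)
            if mask.any():                       # skip empty subgroups
                rates.append(y_pred[mask].mean())
    return max(rates) - min(rates)               # 0.0 = identical rates everywhere

# Toy example: eight predictions, two binary protected attributes.
y_hat  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
race   = np.array([0, 0, 0, 0, 1, 1, 1, 1])
gender = np.array([0, 0, 1, 1, 0, 0, 1, 1])
print(intersectional_parity_gap(y_hat, race, gender))  # gap across the 4 subgroups
```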

Methodology

  1. Base learner – Random Forest (RF) is chosen for its popularity and flexibility.
  2. Search space
    • RF hyper‑parameters: number of trees, max depth, min samples split, etc.
    • Data mutation operators: oversampling/undersampling of minority groups, label smoothing, synthetic example generation.
  3. Evolutionary algorithm – a multi‑objective genetic algorithm (e.g., NSGA‑II) evolves a population of candidate configurations. Each candidate is evaluated on:
    • Effectiveness – accuracy, F1‑score, AUC, etc.
    • Fairness – statistical parity difference, equalized odds, and their intersectional extensions.
  4. Pareto front extraction – after a fixed number of generations, the non‑dominated solutions (those for which no other candidate is at least as good on both objectives and strictly better on one) are returned; a minimal code sketch of this loop follows the list.
  5. Benchmarking – the authors run FairRF and 26 baseline methods (pre‑processing, in‑processing, post‑processing techniques) on publicly available datasets (e.g., Adult, COMPAS) and report average ranks across all metrics.
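
The sketch below illustrates the overall loop under simplifying assumptions: scikit‑learn’s RandomForestClassifier, synthetic data with a synthetic protected attribute, classification error and statistical parity difference as the two objectives, and random sampling of configurations in place of NSGA‑II’s crossover and mutation operators. The hyper‑parameter ranges are illustrative, not the paper’s exact search space.

```python
# Multi-objective search sketch: sample RF configurations, score each on
# (error, statistical parity difference), and keep the Pareto-optimal ones.
# Random sampling stands in for a genetic algorithm's evolve step.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
s = rng.integers(0, 2, size=len(y))              # synthetic protected attribute
X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(
    X, y, s, test_size=0.3, random_state=0)

def evaluate(params):
    clf = RandomForestClassifier(**params, random_state=0).fit(X_tr, y_tr)
    y_hat = clf.predict(X_te)
    error = 1.0 - (y_hat == y_te).mean()                            # minimize
    spd = abs(y_hat[s_te == 0].mean() - y_hat[s_te == 1].mean())    # minimize
    return error, spd

# Candidate configurations (a GA would evolve these across generations).
candidates = [
    {"n_estimators": int(rng.integers(10, 200)),
     "max_depth": int(rng.integers(2, 20)),
     "min_samples_split": int(rng.integers(2, 10))}
    for _ in range(30)
]
scores = [evaluate(p) for p in candidates]

# Pareto front: keep configurations not dominated on both objectives.
front = [
    (p, sc) for p, sc in zip(candidates, scores)
    if not any(o[0] <= sc[0] and o[1] <= sc[1] and o != sc for o in scores)
]
for params, (err, spd) in front:
    print(f"error={err:.3f}  spd={spd:.3f}  {params}")
```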

Results & Findings

FairRF vs. the baselines:

  • Fairness improvement – up to a 30% reduction in disparity compared to the raw RF, and a 12% improvement over the previous best intersectional method.
  • Effectiveness retention – prediction accuracy stays within 1–2% of the accuracy‑only baseline, showing minimal trade‑off cost.
  • Stability across definitions – FairRF consistently yields better or comparable fairness scores across all six fairness definitions, whereas many baselines excel only on a single metric.
  • Pareto diversity – the generated front contains 8–12 distinct models per run, giving developers concrete options rather than a single “one‑size‑fits‑all” model.

In short, FairRF not only makes models fairer but does so without sacrificing the core predictive power that production systems rely on.

Practical Implications

  • Developer‑friendly fairness tuning – Instead of manually fiddling with re‑sampling or adding fairness constraints, teams can plug FairRF into their CI pipeline and let the evolutionary search surface a set of ready‑to‑deploy models.
  • Stakeholder negotiation – Product managers can visualize the fairness‑vs‑accuracy trade‑off curve and make data‑driven decisions (e.g., “we accept a 0.5% drop in accuracy for a 15% drop in gender bias”); see the selection sketch after this list.
  • Intersectional compliance – Regulations increasingly require proof that systems treat combined protected attributes fairly. FairRF’s built‑in support for intersectional metrics helps meet GDPR, EEOC, or sector‑specific fairness audits.
  • Extensible to other learners – While the paper focuses on Random Forests, the same evolutionary framework can wrap other tree‑based ensembles (XGBoost, LightGBM) or even neural nets, making it a versatile addition to any ML toolbox.
  • Reduced engineering overhead – By automating both hyper‑parameter tuning and bias mitigation, teams save time compared to running separate fairness‑specific pipelines.
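
As a concrete illustration of the stakeholder‑driven selection step, the sketch below picks, from a Pareto front of (accuracy, bias) pairs, the fairest model whose accuracy loss relative to the most accurate model stays within an agreed budget. The `front` structure, model names, and the 0.5% budget are assumptions for illustration, not output of the paper’s tool.

```python
# Illustrative selection policy: "deploy the fairest model that loses at most
# `budget` accuracy versus the most accurate model on the Pareto front".
def select_model(front, budget=0.005):
    """front: list of dicts like {"name": ..., "accuracy": ..., "bias": ...}."""
    best_acc = max(m["accuracy"] for m in front)
    eligible = [m for m in front if best_acc - m["accuracy"] <= budget]
    return min(eligible, key=lambda m: m["bias"])    # fairest model within budget

front = [
    {"name": "rf_a", "accuracy": 0.861, "bias": 0.09},
    {"name": "rf_b", "accuracy": 0.858, "bias": 0.04},
    {"name": "rf_c", "accuracy": 0.842, "bias": 0.01},
]
print(select_model(front))   # rf_b: within 0.5% of the best accuracy, lowest bias
```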

Limitations & Future Work

  • Computational cost – Multi‑objective evolutionary search can be expensive on large datasets; the authors note longer runtimes compared to single‑objective tuning.
  • Random Forest focus – The current implementation is tied to RF; extending to deep learning models may require redesigning mutation operators.
  • Metric selection – FairRF optimizes the metrics you feed it; choosing appropriate fairness definitions for a given domain remains a non‑trivial, domain‑expert task.
  • Scalability to streaming data – The approach assumes a static training set; future work could explore incremental or online versions for real‑time systems.

FairRF demonstrates that fairness need not be a bolt‑on afterthought. By treating fairness as a first‑class optimization objective and delivering a portfolio of trade‑off models, it gives developers the practical levers they need to build responsible AI systems at scale.

Authors

  • Giordano d’Aloisio
  • Max Hort
  • Rebecca Moussa
  • Federica Sarro

Paper Information

  • arXiv ID: 2601.07537v1
  • Categories: cs.SE
  • Published: January 12, 2026
  • PDF: Download PDF