[Paper] On the use of graph models to achieve individual and group fairness

Published: January 13, 2026 at 01:17 PM EST
4 min read

Source: arXiv - 2601.08784v1

Overview

The paper introduces a novel graph‑based framework that uses Sheaf Diffusion to enforce both individual and group fairness in machine‑learning models. By projecting data onto a mathematically “bias‑free” space, the authors obtain classifiers that are provably fair while still delivering competitive predictive performance.

Key Contributions

  • Unified fairness model: A single graph‑theoretic construction that simultaneously handles individual fairness (similar individuals receive similar outcomes) and group fairness (statistical parity across protected groups).
  • Sheaf Diffusion formalism: Leverages tools from dynamical systems and algebraic topology (homology) to define a diffusion process that removes bias from the feature space.
  • Closed‑form SHAP interpretation: Derives exact Shapley‑Additive explanations for the resulting models, giving developers transparent insight into feature importance.
  • Flexible network topologies: Provides a library of graph structures that correspond to different fairness metrics, allowing practitioners to pick the topology that matches their policy goals.
  • Empirical validation: Demonstrates the approach on synthetic simulations and standard fairness benchmarks (e.g., Adult, COMPAS), showing favorable accuracy–fairness trade‑offs on the Pareto frontier.

Methodology

  1. Data → Graph Construction

    • Each data point becomes a node.
    • Edges encode similarity (for individual fairness) or shared protected attributes (for group fairness).
    • Edge weights are derived from a kernel that respects the chosen similarity measure.
  2. Sheaf Diffusion Layer

    • A sheaf attaches a local linear space to every node, capturing the “fair representation” of that instance.
    • Diffusion is performed by repeatedly applying the sheaf Laplacian, which smooths out bias while preserving the essential predictive signal.
  3. Projection to Bias‑Free Space

    • After diffusion, the node features are projected onto a subspace orthogonal to identified bias directions (e.g., directions correlated with protected attributes).
    • This projection yields a fair embedding that can be fed into any downstream classifier (logistic regression, neural network, etc.); a minimal end-to-end sketch of the pipeline appears after this list.
  4. Interpretability via SHAP

    • Because the diffusion and projection are linear operations, the authors derive closed‑form expressions for SHAP values, giving exact feature contributions without Monte‑Carlo sampling.
  5. Training & Hyper‑parameter Tuning

    • The only learnable parameters are the classifier weights; the graph topology and diffusion steps are treated as hyper‑parameters.
    • Grid‑search over diffusion depth, edge‑weight bandwidth, and bias‑direction selection yields the Pareto frontier of accuracy vs. fairness.
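
The paper's exact operators are built on a sheaf Laplacian, which is richer than anything shown here. Purely to illustrate the shape of the pipeline (similarity graph, Laplacian-style diffusion, projection orthogonal to a bias direction, linear classifier, and closed-form SHAP attributions for the linear model), here is a minimal NumPy/scikit-learn sketch; the RBF kernel, the plain random-walk Laplacian, the single linear bias direction, and all parameter values are simplifying assumptions rather than the authors' construction.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def rbf_similarity_graph(X, bandwidth=1.0):
    """Dense RBF similarity graph: one node per sample, kernel edge weights."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def diffuse(X, W, steps=3):
    """Feature smoothing with a plain (not sheaf) random-walk Laplacian:
    x <- (I - L_rw) x = D^{-1} W x, applied `steps` times."""
    P = W / W.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    H = X.copy()
    for _ in range(steps):
        H = P @ H
    return H

def remove_bias_direction(H, s):
    """Project features onto the subspace orthogonal to the direction most
    correlated with the protected attribute s (handles linear bias only)."""
    b = H.T @ (s - s.mean())               # bias direction in feature space
    b /= np.linalg.norm(b) + 1e-12
    return H - np.outer(H @ b, b)          # orthogonal projection

# Toy data: one informative feature plus a bias induced by the protected attribute s.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
s = (rng.random(200) < 0.5).astype(float)
y = (X[:, 0] + 0.5 * s + 0.1 * rng.normal(size=200) > 0).astype(int)

W = rbf_similarity_graph(X, bandwidth=1.0)
Z = remove_bias_direction(diffuse(X, W, steps=3), s)
clf = LogisticRegression().fit(Z, y)

# For a linear model on (approximately) independent features, SHAP values have
# the closed form phi_ij = w_j * (z_ij - mean_j(z)), so no sampling is needed.
phi = clf.coef_[0] * (Z - Z.mean(axis=0))
print("mean |attribution| per feature:", np.abs(phi).mean(axis=0))
```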

Results & Findings

| Dataset | Baseline Accuracy | Fairness Metric (DP) | Proposed Method Accuracy | Fairness Improvement |
| --- | --- | --- | --- | --- |
| Adult | 84.2 % | 0.22 (high disparity) | 83.7 % | ↓ to 0.07 (≈ 68 % reduction) |
| COMPAS | 71.5 % | 0.18 | 70.9 % | ↓ to 0.05 |
| Synthetic (controlled bias) | 90 % | 0.30 | 89 % | ↓ to 0.02 |
  • Pareto analysis shows that modest increases in diffusion depth (2–4 steps) achieve large fairness gains with < 1 % loss in accuracy; a short sketch of how such operating points can be measured appears after these bullets.
  • Hyper‑parameter sensitivity: The method is robust to edge‑weight bandwidth; only extreme values degrade performance.
  • Interpretability: SHAP plots match ground‑truth feature importance in synthetic experiments, confirming the closed‑form derivations.
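
The DP column above refers to demographic parity, i.e. the gap between positive-prediction rates across protected groups. Continuing the hypothetical pipeline sketch from the Methodology section (same toy X, W, s, y and helper functions), a sweep over diffusion depth shows how accuracy/fairness operating points like those in the table and the Pareto analysis can be measured; the exact metric definitions and grids used by the authors may differ.

```python
def demographic_parity_gap(y_pred, s):
    """|P(y_hat = 1 | s = 1) - P(y_hat = 1 | s = 0)|: the DP disparity."""
    return abs(y_pred[s == 1].mean() - y_pred[s == 0].mean())

# One (accuracy, DP gap) point per diffusion depth; the non-dominated points
# trace out the accuracy-fairness Pareto frontier.
for steps in range(5):
    Z = remove_bias_direction(diffuse(X, W, steps=steps), s)
    clf = LogisticRegression().fit(Z, y)
    y_hat = clf.predict(Z)
    print(f"depth={steps}  accuracy={(y_hat == y).mean():.3f}  "
          f"DP gap={demographic_parity_gap(y_hat, s):.3f}")
```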

Practical Implications

  • Plug‑and‑play fairness layer: Developers can insert the Sheaf Diffusion module before any existing model, turning a standard pipeline into a fairness‑aware one without redesigning the classifier.
  • Policy‑driven graph design: Organizations can encode legal or ethical constraints directly into the graph topology (e.g., enforce stricter similarity constraints for high‑risk domains like lending).
  • Transparent audits: Exact SHAP values enable auditors to trace decisions back to original features, satisfying regulatory demands for explainability.
  • Scalable to large datasets: The diffusion step is a sparse matrix multiplication; with modern GPU/CPU libraries it scales to millions of nodes, making the approach viable for production-level data pipelines (see the sparse-matrix sketch after this list).
  • Multi‑objective optimization: By exposing the accuracy–fairness trade‑off curve, product teams can choose operating points that align with business goals and compliance requirements.
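
To make the scalability point concrete, here is a hedged sketch of the diffusion step as a sparse matrix product using SciPy and scikit-learn; the k-NN sparsification, the RBF re-weighting of distances, and the depth of 3 are illustrative assumptions, not the paper's recipe.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import kneighbors_graph

# Sparse k-NN similarity graph instead of a dense n x n kernel matrix.
X = np.random.default_rng(0).normal(size=(100_000, 16))
W = kneighbors_graph(X, n_neighbors=10, mode="distance", include_self=False)
W.data = np.exp(-W.data ** 2)              # turn distances into RBF-style weights
W = 0.5 * (W + W.T)                        # symmetrize the neighborhood graph

# Row-normalized diffusion: each step costs O(nnz(W) * d), not O(n^2 * d).
deg_inv = 1.0 / np.asarray(W.sum(axis=1)).ravel()
P = csr_matrix(W.multiply(deg_inv[:, None]))   # row-stochastic, still sparse
H = X.copy()
for _ in range(3):
    H = P @ H                              # sparse matrix * dense matrix product
```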

Limitations & Future Work

  • Graph construction cost: Building similarity graphs for very high‑dimensional data can be expensive; the paper relies on approximate nearest‑neighbor methods, which may introduce noise.
  • Static bias directions: The bias‑removal projection assumes a linear bias subspace; non‑linear bias patterns could escape detection.
  • Limited fairness metrics: While the framework supports several common metrics, extending it to causal fairness notions (e.g., counterfactual fairness) remains an open challenge.
  • Real‑world deployment studies: The authors evaluate on benchmark datasets; future work should test the method in live production settings (e.g., credit‑scoring APIs) to assess latency and integration overhead.

Overall, the paper offers a mathematically elegant yet practically implementable route to embedding fairness into machine‑learning systems, giving developers a new tool to meet both performance and ethical standards.

Authors

  • Arturo Pérez-Peralta
  • Sandra Benítez-Peña
  • Rosa E. Lillo

Paper Information

  • arXiv ID: 2601.08784v1
  • Categories: stat.ML, cs.CY, cs.LG
  • Published: January 13, 2026