[Paper] On the use of graph models to achieve individual and group fairness
Source: arXiv - 2601.08784v1
Overview
The paper introduces a novel graph‑based framework that uses Sheaf Diffusion to enforce both individual and group fairness in machine‑learning models. By projecting data onto a mathematically “bias‑free” space, the authors obtain classifiers that are provably fair while still delivering competitive predictive performance.
Key Contributions
- Unified fairness model: A single graph‑theoretic construction that simultaneously handles individual fairness (similar individuals receive similar outcomes) and group fairness (statistical parity across protected groups).
- Sheaf Diffusion formalism: Leverages tools from dynamical systems and algebraic topology (homology) to define a diffusion process that removes bias from the feature space.
- Closed‑form SHAP interpretation: Derives exact Shapley‑Additive explanations for the resulting models, giving developers transparent insight into feature importance.
- Flexible network topologies: Provides a library of graph structures that correspond to different fairness metrics, allowing practitioners to pick the topology that matches their policy goals.
- Empirical validation: Demonstrates the approach on synthetic simulations and standard fairness benchmarks (e.g., Adult, COMPAS), showing favorable accuracy–fairness trade‑offs on the Pareto frontier.
Methodology
Data → Graph Construction
- Each data point becomes a node.
- Edges encode similarity (for individual fairness) or shared protected attributes (for group fairness).
- Edge weights are derived from a kernel that respects the chosen similarity measure.
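The summary leaves this construction abstract; the sketch below shows one plausible way to realise it, using a k-nearest-neighbour RBF kernel for the similarity edges and light extra edges between nodes that share a protected attribute. It is an illustrative assumption, not the authors' released code, and `k`, `gamma`, and the 0.1 group-edge weight are arbitrary choices.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def build_fairness_graph(X, protected, k=10, gamma=1.0):
    """Illustrative sketch (not the authors' code): a weighted adjacency matrix
    combining similarity edges (individual fairness) and group edges (group fairness)."""
    # Individual fairness: connect each point to its k nearest neighbours and
    # weight edges with an RBF kernel on the feature-space distance.
    dist = kneighbors_graph(X, n_neighbors=k, mode="distance").toarray()
    W_ind = np.where(dist > 0, np.exp(-gamma * dist ** 2), 0.0)
    W_ind = np.maximum(W_ind, W_ind.T)  # symmetrise

    # Group fairness: lightly connect nodes sharing a protected attribute so that
    # diffusion can equalise statistics within groups.
    same_group = (protected[:, None] == protected[None, :]).astype(float)
    np.fill_diagonal(same_group, 0.0)
    row_sums = same_group.sum(axis=1, keepdims=True).clip(min=1.0)
    W_grp = 0.1 * same_group / row_sums

    return W_ind + W_grp
```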
Sheaf Diffusion Layer
- A sheaf attaches a local linear space to every node, capturing the “fair representation” of that instance.
- Diffusion is performed by repeatedly applying a graph Laplacian that respects the sheaf structure, effectively smoothing out bias while preserving essential predictive signals.
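To make the diffusion concrete, the sketch below runs ordinary normalised-Laplacian smoothing, which is the special case of sheaf diffusion in which every restriction map is the identity; the paper's sheaf Laplacian would replace `L` with a block matrix assembled from those maps. The step size `alpha` and depth `T` are hyper-parameters, as described in the tuning step below.

```python
import numpy as np

def laplacian_diffusion(X, W, T=3, alpha=0.5):
    """Sketch: T steps of normalised-Laplacian smoothing of node features X.
    With identity restriction maps the sheaf Laplacian reduces to this ordinary
    graph Laplacian; that simplification is assumed here for illustration."""
    deg = W.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L = np.eye(W.shape[0]) - d_inv_sqrt @ W @ d_inv_sqrt  # normalised Laplacian

    H = np.asarray(X, dtype=float)
    for _ in range(T):
        H = H - alpha * (L @ H)  # heat-equation-style update: smooths bias locally
    return H
```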
Projection to Bias‑Free Space
- After diffusion, the node features are projected onto a subspace orthogonal to identified bias directions (e.g., directions correlated with protected attributes).
- This projection yields a fair embedding that can be fed into any downstream classifier (logistic regression, neural net, etc.).
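A standard way to realise this step, sketched below as an assumption rather than the authors' exact construction, is to estimate one or more bias directions (here, the covariance of each feature with the protected attribute) and apply the orthogonal projector P = I − B(BᵀB)⁻¹Bᵀ onto their complement.

```python
import numpy as np

def debias_projection(H, protected):
    """Sketch: project diffused features H onto the subspace orthogonal to an
    estimated bias direction (feature covariance with the protected attribute)."""
    a = protected - protected.mean()      # centred protected attribute
    b = H.T @ a                            # one illustrative bias direction
    B = b[:, None] / np.linalg.norm(b)     # several directions could be stacked as columns

    # Orthogonal projector onto the complement of span(B): P = I - B (B^T B)^{-1} B^T
    P = np.eye(H.shape[1]) - B @ np.linalg.solve(B.T @ B, B.T)
    return H @ P                           # fair embedding for any downstream classifier
```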
Interpretability via SHAP
- Because the diffusion and projection are linear operations, the authors derive closed‑form expressions for SHAP values, giving exact feature contributions without Monte‑Carlo sampling.
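For intuition, when the end-to-end map from post-diffusion features to the score is linear with effective weights w_eff, the exact SHAP value of feature j under the usual independent-feature baseline is φ_j = w_eff,j · (x_j − E[x_j]). The snippet below illustrates that identity; it is a generic Linear-SHAP sketch, not necessarily the paper's full derivation, which also has to account for the diffusion operator coupling nodes.

```python
import numpy as np

def linear_shap(w_eff, X_background, x):
    """Exact SHAP values for a linear score f(x) = w_eff @ x + b under the
    independent-feature baseline: phi_j = w_eff[j] * (x_j - E[x_j]).
    Contributions sum to f(x) - E[f(X)], so no Monte Carlo sampling is needed."""
    baseline = X_background.mean(axis=0)
    return w_eff * (x - baseline)

# If the projection matrix P and classifier weights w are known, linearity gives
# effective weights w_eff = P @ w for the post-diffusion features.
```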
Training & Hyper‑parameter Tuning
- The only learnable parameters are the classifier weights; the graph topology and diffusion steps are treated as hyper‑parameters.
- Grid‑search over diffusion depth, edge‑weight bandwidth, and bias‑direction selection yields the Pareto frontier of accuracy vs. fairness.
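Putting the pieces together, a tuning loop matching this description might look as follows. `fit_and_evaluate` is a hypothetical helper that trains the downstream classifier and returns accuracy and DP gap, `X_train`, `y_train`, and `prot_train` are assumed to exist, and the other functions refer to the sketches above.

```python
from itertools import product

def pareto_front(points):
    """Keep configurations not strictly dominated in (higher accuracy, lower DP gap)."""
    def dominates(q, p):
        return (q["acc"] >= p["acc"] and q["dp"] <= p["dp"]
                and (q["acc"] > p["acc"] or q["dp"] < p["dp"]))
    return [p for p in points if not any(dominates(q, p) for q in points)]

results = []
for T, gamma in product([1, 2, 3, 4], [0.1, 1.0, 10.0]):   # diffusion depth, kernel bandwidth
    W = build_fairness_graph(X_train, prot_train, gamma=gamma)
    H = laplacian_diffusion(X_train, W, T=T)
    Z = debias_projection(H, prot_train)
    acc, dp = fit_and_evaluate(Z, y_train, prot_train)      # hypothetical helper
    results.append({"T": T, "gamma": gamma, "acc": acc, "dp": dp})

frontier = pareto_front(results)  # accuracy-fairness trade-off curve
```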
Results & Findings
| Dataset | Baseline Accuracy | Baseline DP Gap | Proposed Method Accuracy | Proposed Method DP Gap |
|---|---|---|---|---|
| Adult | 84.2 % | 0.22 (high disparity) | 83.7 % | 0.07 (≈ 68 % reduction) |
| COMPAS | 71.5 % | 0.18 | 70.9 % | 0.05 |
| Synthetic (controlled bias) | 90 % | 0.30 | 89 % | 0.02 |
- Pareto analysis shows that modest increases in diffusion depth (2–4 steps) achieve large fairness gains with < 1 % loss in accuracy.
- Hyper‑parameter sensitivity: The method is robust to edge‑weight bandwidth; only extreme values degrade performance.
- Interpretability: SHAP plots match ground‑truth feature importance in synthetic experiments, confirming the closed‑form derivations.
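For reference, the DP figures in the table are demographic-parity gaps: the absolute difference in positive-prediction rates between protected groups, so lower is better. A minimal computation, assuming a binary protected attribute, is:

```python
import numpy as np

def demographic_parity_gap(y_pred, protected):
    """|P(y_hat = 1 | A = 0) - P(y_hat = 1 | A = 1)| for binary predictions and a
    binary protected attribute A; 0 means perfect statistical parity."""
    y_pred = np.asarray(y_pred)
    protected = np.asarray(protected)
    return abs(y_pred[protected == 0].mean() - y_pred[protected == 1].mean())
```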
Practical Implications
- Plug‑and‑play fairness layer: Developers can insert the Sheaf Diffusion module before any existing model, turning a standard pipeline into a fairness‑aware one without redesigning the classifier.
- Policy‑driven graph design: Organizations can encode legal or ethical constraints directly into the graph topology (e.g., enforce stricter similarity constraints for high‑risk domains like lending).
- Transparent audits: Exact SHAP values enable auditors to trace decisions back to original features, satisfying regulatory demands for explainability.
- Scalable to large datasets: The diffusion step is a sparse matrix multiplication; with modern GPU/CPU libraries it scales to millions of nodes, making the approach viable for production‑level data pipelines (see the sparse‑diffusion sketch after this list).
- Multi‑objective optimization: By exposing the accuracy–fairness trade‑off curve, product teams can choose operating points that align with business goals and compliance requirements.
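On the scalability point, the dense diffusion sketch from the Methodology section becomes a sparse matrix product once the adjacency is stored in CSR form; each step then costs O(nnz) rather than O(n²). This is again an illustrative sketch, not the authors' implementation.

```python
import numpy as np
import scipy.sparse as sp

def sparse_diffusion(X, W_sparse, T=3, alpha=0.5):
    """Sketch: the same normalised-Laplacian smoothing as before, but with a
    scipy.sparse adjacency so each step is a sparse-times-dense product."""
    deg = np.asarray(W_sparse.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L = sp.identity(W_sparse.shape[0], format="csr") - d_inv_sqrt @ W_sparse @ d_inv_sqrt

    H = np.asarray(X, dtype=float)
    for _ in range(T):
        H = H - alpha * (L @ H)  # sparse @ dense stays dense and scales with edge count
    return H
```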
Limitations & Future Work
- Graph construction cost: Building similarity graphs for very high‑dimensional data can be expensive; the paper relies on approximate nearest‑neighbor methods, which may introduce noise.
- Static bias directions: The bias‑removal projection assumes a linear bias subspace; non‑linear bias patterns could escape detection.
- Limited fairness metrics: While the framework supports several common metrics, extending it to causal fairness notions (e.g., counterfactual fairness) remains an open challenge.
- Real‑world deployment studies: The authors evaluate on benchmark datasets; future work should test the method in live production settings (e.g., credit‑scoring APIs) to assess latency and integration overhead.
Overall, the paper offers a mathematically elegant yet practically implementable route to embedding fairness into machine‑learning systems, giving developers a new tool to meet both performance and ethical standards.
Authors
- Arturo Pérez-Peralta
- Sandra Benítez-Peña
- Rosa E. Lillo
Paper Information
- arXiv ID: 2601.08784v1
- Categories: stat.ML, cs.CY, cs.LG
- Published: January 13, 2026