[Paper] In-context Inverse Optimality for Fair Digital Twins: A Preference-based approach
Source: arXiv - 2512.01650v1
Overview
The paper tackles a growing tension in the world of Digital Twins (DTs): while these virtual replicas can compute mathematically optimal actions, those actions often clash with what humans consider “fair.” By treating fairness as a learnable objective, the authors present a preference‑based framework that lets DTs infer what fairness means to people and then embed that notion directly into their optimization routines.
Key Contributions
- Preference‑driven fairness learning: Introduces a pipeline that extracts latent fairness objectives from pairwise human preferences over feasible decisions.
- Context‑aware Siamese network: Proposes a novel Siamese neural architecture that, given contextual features (e.g., hospital load, regional demographics), outputs convex quadratic cost functions representing the inferred fairness objective.
- Convex surrogate integration: Shows how the learned quadratic surrogate can be plugged into existing optimization models without sacrificing tractability or speed.
- Real‑world validation: Demonstrates the approach on a COVID‑19 hospital resource allocation case study, highlighting alignment between algorithmic recommendations and stakeholder notions of fairness.
- Generalizable framework: Provides a blueprint for embedding human‑centered fairness into any optimization‑based DT, not just healthcare scenarios.
Methodology
- Data Collection – Pairwise Preferences:
  - Decision makers (e.g., hospital administrators) are presented with two feasible allocation plans.
  - They indicate which plan feels “fairer.” This yields a dataset of preference pairs \((\mathbf{x}_i, \mathbf{x}_j)\) with a binary label.
- Siamese Neural Network Design (a minimal PyTorch sketch follows this list):
  - Two identical subnetworks process each plan together with its context vector (e.g., current ICU occupancy, regional infection rates).
  - The network outputs a parameter vector \(\mathbf{w}\) that defines a convex quadratic cost \(f_{\mathbf{w}}(\mathbf{x}) = \mathbf{x}^\top \mathbf{Q}_{\mathbf{w}} \mathbf{x} + \mathbf{c}_{\mathbf{w}}^\top \mathbf{x}\).
  - Training minimizes a pairwise ranking loss (e.g., hinge loss) so that the network assigns lower cost to the plan preferred by the human.
- Surrogate Objective Integration (see the cvxpy sketch after this list):
  - The learned quadratic cost replaces or augments the original objective in the DT’s optimization problem:
    \[ \min_{\mathbf{x}\in\mathcal{X}} \; \underbrace{g(\mathbf{x})}_{\text{original goal}} + \lambda \, f_{\mathbf{w}}(\mathbf{x}) \]
  - Because the surrogate is convex quadratic, standard solvers (QP, interior‑point) handle the problem efficiently.
- Iterative Refinement (Optional):
  - After deployment, new preference data can be collected to fine‑tune the network, enabling the DT to adapt to evolving fairness expectations.
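The sketch below illustrates the Siamese design in PyTorch: one shared network maps the context to the parameters of a convex quadratic cost, both plans of a pair are scored with the same weights, and a hinge ranking loss pushes the preferred plan’s cost lower. The layer sizes, the factorization that keeps \(\mathbf{Q}_{\mathbf{w}}\) positive semidefinite, and all names (`FairnessCostNet`, `hinge_ranking_loss`) are illustrative assumptions, not the authors’ exact implementation.

```python
# Illustrative sketch only: architecture, dimensions, and the PSD
# parameterization Q = L L^T + eps*I are assumptions, not the paper's code.
import torch
import torch.nn as nn

class FairnessCostNet(nn.Module):
    """Maps a context vector to a convex quadratic cost f_w(x) = x'Q x + c'x."""

    def __init__(self, ctx_dim: int, x_dim: int, hidden: int = 64):
        super().__init__()
        self.x_dim = x_dim
        n_tril = x_dim * (x_dim + 1) // 2  # entries of a lower-triangular factor of Q
        self.backbone = nn.Sequential(
            nn.Linear(ctx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_tril + x_dim),  # factor of Q, then linear term c
        )
        self.tril_idx = torch.tril_indices(x_dim, x_dim)

    def cost(self, x: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        """Score a batch of plans x (B, x_dim) under contexts ctx (B, ctx_dim)."""
        params = self.backbone(ctx)
        n_tril = self.tril_idx.shape[1]
        L = params.new_zeros(x.shape[0], self.x_dim, self.x_dim)
        L[:, self.tril_idx[0], self.tril_idx[1]] = params[:, :n_tril]
        c = params[:, n_tril:]
        # Q = L L^T + eps*I is positive semidefinite, so f_w is convex in x.
        Q = L @ L.transpose(1, 2) + 1e-4 * torch.eye(self.x_dim, device=x.device)
        return torch.einsum("bi,bij,bj->b", x, Q, x) + (c * x).sum(dim=1)

def hinge_ranking_loss(model, x_pref, x_other, ctx, margin=1.0):
    # Siamese part: both plans are scored by the *same* weights;
    # the human-preferred plan should receive the lower cost.
    gap = model.cost(x_other, ctx) - model.cost(x_pref, ctx)
    return torch.clamp(margin - gap, min=0).mean()

# Toy usage: one gradient step on a batch of 8 preference pairs.
model = FairnessCostNet(ctx_dim=4, x_dim=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_pref, x_other, ctx = torch.rand(8, 3), torch.rand(8, 3), torch.rand(8, 4)
opt.zero_grad()
hinge_ranking_loss(model, x_pref, x_other, ctx).backward()
opt.step()
```

And a minimal sketch of the surrogate integration step, assuming a polyhedral feasible set \(\mathcal{X} = \{\mathbf{x} : A\mathbf{x} \le \mathbf{b},\ \mathbf{x} \ge 0\}\) and a linear original goal \(g\); the cvxpy formulation below is one standard way to solve the resulting convex QP, not necessarily the authors’ setup.

```python
# Sketch: plug the learned quadratic surrogate into the DT's optimization.
# The polyhedral feasible set and linear goal g are assumptions.
import cvxpy as cp

def solve_fair_allocation(g_coef, Q, c, A, b, lam=1.0):
    """min_x  g'x + lam * (x'Q x + c'x)  s.t.  A x <= b, x >= 0."""
    x = cp.Variable(len(g_coef))
    fairness = cp.quad_form(x, Q) + c @ x  # learned surrogate f_w
    prob = cp.Problem(cp.Minimize(g_coef @ x + lam * fairness),
                      [A @ x <= b, x >= 0])
    prob.solve()  # convex QP: standard interior-point/OSQP solvers apply
    return x.value
```

Because \(\mathbf{Q}\) is built as \(LL^\top + \epsilon I\), the combined problem stays a convex QP regardless of the context fed to the network.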
Results & Findings
- Alignment Metric: In the COVID‑19 allocation experiment, the DT’s recommendations matched human‑chosen “fair” plans in ≈87 % of test cases, a substantial jump from the baseline (≈55 %).
- Computational Overhead: Adding the learned quadratic term increased solve time by < 5 % on typical mixed‑integer linear programming (MILP) formulations (the quadratic term turns these into MIQPs, which off‑the‑shelf solvers still handle), confirming the method’s practicality.
- Robustness to Context Shifts: When simulated pandemic waves altered demand patterns, the context‑aware network automatically adjusted the quadratic coefficients, preserving fairness alignment without retraining from scratch.
- Interpretability: The learned \(\mathbf{Q}_{\mathbf{w}}\) matrices revealed that the model penalized allocations that disproportionately favored already‑well‑served hospitals, echoing an “equity‑of‑outcome” intuition expressed by participants.
Practical Implications
- Human‑Centric DT Deployment: Engineers can now embed a learned fairness layer into any DT that already solves an optimization problem, ensuring outputs respect stakeholder values without hand‑crafting complex fairness constraints.
- Rapid Prototyping: The preference‑based data collection is lightweight (simple pairwise comparisons) and can be run as a short survey with domain experts, dramatically shortening the time from concept to fairness‑aware system.
- Regulatory Compliance: In sectors where fairness is mandated (healthcare, finance, transportation), the framework offers a defensible, data‑driven way to demonstrate that algorithmic decisions align with human‑defined fairness criteria.
- Scalable to Edge Devices: Because the surrogate is a quadratic form, the final optimization can run on modest hardware (e.g., hospital servers, edge gateways), making it suitable for real‑time DT applications.
- Continuous Learning Loop: Organizations can set up a feedback portal where operators flag “unfair” decisions, feeding new preference pairs back into the model and keeping the DT in sync with evolving norms (a minimal fine‑tuning sketch follows this list).
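A minimal sketch of that loop, reusing the `hinge_ranking_loss` helper from the methodology sketch; the batching, epoch count, and learning rate below are assumptions, not part of the paper.

```python
# Sketch of the feedback loop: replay operator-flagged preference pairs
# to fine-tune the deployed model. Hyperparameters are illustrative.
import torch

def refine(model, new_pairs, epochs=5, lr=1e-4):
    """new_pairs: iterable of (x_pref, x_other, ctx) tensor triples."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x_pref, x_other, ctx in new_pairs:
            opt.zero_grad()
            hinge_ranking_loss(model, x_pref, x_other, ctx).backward()
            opt.step()
    return model
```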
Limitations & Future Work
- Preference Quality: The approach assumes that pairwise preferences are consistent and reflect a coherent fairness notion; noisy or contradictory feedback can degrade the learned surrogate.
- Expressiveness of Quadratics: While convex quadratics are computationally convenient, they may not capture highly non‑linear fairness concepts (e.g., threshold effects). Extending to richer function families is an open direction.
- Scalability of Data Collection: For very high‑dimensional decision spaces, the number of required preference queries may grow; active‑learning strategies could reduce this burden.
- Cross‑Domain Transfer: The current study focuses on a single healthcare scenario; future work will test transferability of learned fairness representations across domains (e.g., energy grid management, autonomous logistics).
Authors
- Daniele Masti
- Francesco Basciani
- Arianna Fedeli
- Giorgio Gnecco
- Francesco Smarra
Paper Information
- arXiv ID: 2512.01650v1
- Categories: cs.LG, cs.SE, math.OC
- Published: December 1, 2025
- PDF: https://arxiv.org/pdf/2512.01650v1