[Paper] MPD²‑Router: Mask‑aware Multi‑expert Prior‑regularized Dual‑head Deferral Router in Glaucoma Screening and Diagnosis
Source: arXiv - 2605.08024v1
Overview
The paper presents MPD²‑Router, a novel “learning‑to‑defer” system that decides, for each retinal image, whether an AI model should make the glaucoma diagnosis or hand the case off to a human specialist – and if so, which specialist (e.g., general ophthalmologist, glaucoma expert). By explicitly modeling expert availability, workload balance, and the asymmetric costs of false positives vs. false negatives, the router makes triage safer and more efficient in real‑world screening pipelines.
Key Contributions
- Mask‑aware multi‑expert routing: Introduces a gating mechanism that respects per‑sample expert availability (the “mask”) while still learning optimal deferral decisions.
- Dual‑head architecture: One head predicts the clinical label, the other predicts who (which expert) should receive the case if deferral is chosen.
- Cost‑sensitive training objective: Combines asymmetric clinical costs, a deferral‑budget constraint (via augmented Lagrangian), and a group‑specific prior that reflects realistic expert skill distributions.
- Rank‑majorization JS regularizer: Prevents the model from collapsing onto a single expert (expert collapse) without forcing a perfectly uniform load, yielding a balanced but data‑driven allocation.
- Robust cross‑domain evaluation: Demonstrates consistent gains on three geographically diverse glaucoma datasets (REFUGE, CHAKSU, ORIGA) using a frozen backbone trained only on REFUGE, highlighting resilience to domain shift.
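To make the expert‑collapse problem concrete, here is a minimal sketch of a load‑balancing penalty built on the plain Jensen‑Shannon divergence between the observed share of deferrals per expert and a reference load. It is not the paper's rank‑majorization variant, whose exact form is not given in this summary. The function names and the `reference_load` values are illustrative assumptions.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Symmetric divergence between two distributions, bounded by ln 2."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)

def expert_load(assignments, num_experts):
    """Fraction of deferred cases sent to each expert index."""
    counts = [0] * num_experts
    for expert in assignments:
        counts[expert] += 1
    return [c / len(assignments) for c in counts]

# Hypothetical, deliberately non-uniform reference load (assumption).
reference_load = [0.5, 0.3, 0.2]

collapsed = expert_load([0] * 10, num_experts=3)                   # every case to expert 0
spread = expert_load([0] * 5 + [1] * 3 + [2] * 2, num_experts=3)   # matches the reference

print("collapsed penalty:", js_divergence(collapsed, reference_load))
print("spread penalty:   ", js_divergence(spread, reference_load))
```

A collapsed allocation is penalized while an allocation that follows the non‑uniform reference is not, which is the qualitative behavior described above: balance without forcing a uniform load.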
Methodology
- Backbone feature extractor: A standard convolutional network (e.g., ResNet) pretrained on the REFUGE glaucoma dataset extracts image embeddings; this part remains frozen during routing experiments.
- Dual‑head router:
  - Classification head predicts the glaucoma label (healthy vs. diseased).
  - Allocation head outputs a probability vector over the set of available experts.
- Mask‑aware Gumbel‑Sigmoid gating: For each image, a binary mask indicates which experts are on‑call. The gating layer applies a Gumbel‑Sigmoid trick to sample a hard deferral decision while still allowing gradient‑based learning; the mask forces the sampled expert to be one of the available ones.
- Signal fusion: To gauge case difficulty, the router ingests multiple cues: model uncertainty (e.g., entropy), morphological features (optic‑disc shape), image quality scores, and out‑of‑distribution detector outputs.
- Training loss:
  - Asymmetric cost term penalizes false negatives more heavily than false positives (reflecting clinical harm).
  - Deferral budget term (augmented Lagrangian) enforces a target overall deferral rate (e.g., 20 %).
  - Group prior term encourages the allocation distribution to match expected expert expertise levels.
  - Rank‑majorization Jensen‑Shannon regularizer spreads the load across experts while allowing the model to favor higher‑skill clinicians for the hardest cases.
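The mask‑aware gating step can be sketched as a forward pass in plain Python. This is an illustration under stated assumptions rather than the paper's implementation: the defer gate uses a Gumbel‑Sigmoid (logistic‑noise) relaxation thresholded at 0.5, the expert is picked with a masked Gumbel‑max draw, and the fallback to the AI when no expert is available is an assumed policy. All function names are hypothetical, and gradient handling (e.g., a straight‑through estimator) is omitted.

```python
import math
import random

def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def _uniform(rng):
    """Uniform sample clamped away from 0 and 1 so the logs below are finite."""
    return min(max(rng.random(), 1e-12), 1.0 - 1e-12)

def gumbel_sigmoid_sample(logit, rng, temperature=1.0):
    """Relaxed Bernoulli sample: sigmoid((logit + logistic noise) / temperature)."""
    u = _uniform(rng)
    noise = math.log(u) - math.log(1.0 - u)  # difference of two Gumbel draws
    return _sigmoid((logit + noise) / temperature)

def masked_expert_sample(expert_logits, mask, rng):
    """Gumbel-max draw restricted to experts whose mask entry is truthy."""
    best, best_score = None, -math.inf
    for idx, (logit, available) in enumerate(zip(expert_logits, mask)):
        if not available:
            continue  # unavailable experts can never be selected
        score = logit - math.log(-math.log(_uniform(rng)))
        if score > best_score:
            best, best_score = idx, score
    return best

def route(defer_logit, expert_logits, mask, rng, temperature=1.0):
    """Return ("ai", None) or ("defer", expert_index)."""
    if not any(mask):
        return "ai", None  # assumed fallback: nobody is on call
    if gumbel_sigmoid_sample(defer_logit, rng, temperature) <= 0.5:
        return "ai", None
    return "defer", masked_expert_sample(expert_logits, mask, rng)

rng = random.Random(0)
# Expert 1 has the lowest logit but is the only one on call.
print(route(defer_logit=2.0, expert_logits=[3.0, -1.0, 3.0], mask=[0, 1, 0], rng=rng))
```

Because masked experts are skipped before the noisy scores are compared, a deferred case can only land on someone who is on call, whatever the allocation logits say.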
Results & Findings
| Dataset | AI‑only MCC | MPD²‑Router MCC | Clinical cost reduction vs. AI‑only | Deferral rate |
|---|---|---|---|---|
| REFUGE | 0.71 | 0.78 | 22 % | 18 % |
| CHAKSU | 0.68 | 0.75 | 19 % | 20 % |
| ORIGA | 0.66 | 0.73 | 21 % | 19 % |
- Higher MCC (Matthews Correlation Coefficient) across all cohorts, indicating better overall diagnostic quality.
- Reduced clinical cost (a weighted sum of false‑negative and false‑positive harms) by roughly one‑fifth compared with a pure AI system.
- Favorable trade‑offs: Across F1, MCC, and cost, MPD²‑Router Pareto‑dominates the AI‑only baseline, meaning it is at least as good on every metric and strictly better on at least one.
- Balanced expert utilization: No single expert receives >35 % of deferred cases; workload is spread according to expertise and availability.
- Robustness to domain shift: Even when the backbone is frozen (no fine‑tuning on CHAKSU/ORIGA), the router still yields consistent gains, showing that the routing logic generalizes.
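For readers who want to reproduce the two headline metrics on their own predictions, the following sketch computes MCC from confusion counts and a weighted clinical cost. The MCC formula is standard; the cost weights (`cost_fp=1.0`, `cost_fn=5.0`) and the toy labels are illustrative assumptions, not values from the paper.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = glaucoma."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; defined as 0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def clinical_cost(fp, fn, n, cost_fp=1.0, cost_fn=5.0):
    """Average per-case cost with false negatives weighted more heavily."""
    return (cost_fp * fp + cost_fn * fn) / n

def relative_reduction(baseline, improved):
    """Fractional cost reduction relative to the baseline."""
    return (baseline - improved) / baseline

# Toy labels (assumption): 3 glaucomatous eyes, 5 healthy eyes.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("MCC: ", mcc(tp, tn, fp, fn))
print("cost:", clinical_cost(fp, fn, n=len(y_true)))
```

With these weights a single missed glaucomatous eye costs as much as five false alarms, which is why a router that removes even a few false negatives can cut the cost noticeably while MCC moves only modestly.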
Practical Implications
- Safer screening pipelines: Clinics can deploy a high‑throughput AI detector for the bulk of cases while automatically routing ambiguous or high‑risk images to the right human, reducing missed glaucomatous eyes.
- Dynamic staffing: Because the mask respects on‑call schedules, the system can be used in tele‑ophthalmology networks where specialist availability varies by time zone.
- Cost‑effective scaling: By limiting deferrals to ~20 % of cases, hospitals can keep specialist time focused on the most valuable cases, potentially lowering per‑screening costs.
- Plug‑and‑play integration: The router sits on top of any existing glaucoma classifier; developers only need to supply expert availability masks and cost parameters.
- Regulatory friendliness: The explicit cost‑sensitive objective and transparent routing decisions align with emerging AI‑medical device guidelines that demand clear human‑in‑the‑loop safeguards.
Limitations & Future Work
- Static expert pool: The current formulation assumes a fixed set of experts; extending to a continuously changing roster (e.g., on‑demand crowdsourced graders) would require more dynamic masking.
- Reliance on handcrafted cues: While uncertainty and quality metrics improve performance, they are manually engineered; learning these signals end‑to‑end could further boost robustness.
- Evaluation on limited datasets: The study uses three public glaucoma cohorts; real‑world deployment in larger, more heterogeneous health systems remains to be validated.
- Deferral budget rigidity: The augmented‑Lagrangian term enforces a fixed budget; future work could explore adaptive budgets that respond to daily workload fluctuations.
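As background for the budget point, here is a minimal sketch of how an augmented‑Lagrangian term can hold a deferral rate at or below a fixed budget. It assumes the inequality form of the constraint (deferral rate minus budget at most zero) with the standard penalty and projected multiplier update; the paper may instead target the rate as an equality. The names `budget_penalty` and `update_multiplier` and the numbers used are hypothetical.

```python
def budget_penalty(multiplier, defer_rate, budget, rho):
    """Augmented-Lagrangian penalty for the constraint defer_rate - budget <= 0."""
    violation = defer_rate - budget
    shifted = max(0.0, multiplier + rho * violation)
    return (shifted ** 2 - multiplier ** 2) / (2.0 * rho)

def update_multiplier(multiplier, defer_rate, budget, rho):
    """Dual ascent step, projected so the multiplier stays non-negative."""
    return max(0.0, multiplier + rho * (defer_rate - budget))

# Toy trajectory (assumption): the router over-defers, then settles under budget.
multiplier, budget, rho = 0.0, 0.20, 2.0
for defer_rate in [0.30, 0.30, 0.25, 0.18, 0.15]:
    penalty = budget_penalty(multiplier, defer_rate, budget, rho)
    multiplier = update_multiplier(multiplier, defer_rate, budget, rho)
    print(f"rate={defer_rate:.2f} penalty={penalty:.4f} multiplier={multiplier:.4f}")
```

The multiplier grows while the rate exceeds the budget and shrinks back toward zero once it falls below. An adaptive scheme of the kind suggested above would make `budget` itself a function of the day's workload.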
Overall, MPD²‑Router offers a compelling blueprint for integrating AI and human expertise in ophthalmic triage, turning “learning‑to‑defer” from a theoretical curiosity into a practical, deployable safety net.
Authors
- Wenxin Zhan
Paper Information
- arXiv ID: 2605.08024v1
- Categories: cs.AI
- Published: May 8, 2026