[Paper] RANGER: Sparsely-Gated Mixture-of-Experts with Adaptive Retrieval Re-ranking for Pathology Report Generation

Published: March 4, 2026 at 01:12 PM EST
5 min read
Source: arXiv - 2603.04348v1

Overview

Pathology report generation—automatically turning gigapixel whole‑slide images (WSIs) into coherent diagnostic text—has lagged behind other medical AI tasks because of the sheer image size and the visual complexity of tissue samples. The new RANGER framework tackles this bottleneck by marrying a sparsely‑gated Mixture‑of‑Experts (MoE) decoder with an adaptive retrieval‑re‑ranking module, enabling the model to specialize its language generation while filtering out noisy external knowledge.

Key Contributions

  • Sparsely‑gated MoE decoder – introduces dynamic expert routing (top‑k gating + load‑balancing) so different “experts” focus on distinct diagnostic patterns (e.g., tumor morphology, stromal reaction).
  • Noisy top‑k routing – deliberately allows a small amount of “noise” in expert selection, encouraging robustness and better generalization across heterogeneous slides.
  • Adaptive retrieval re‑ranking – refines knowledge‑base snippets retrieved for a given slide by re‑scoring them with visual feature similarity, reducing irrelevant or contradictory guidance.
  • End‑to‑end training on PathText‑BRCA – demonstrates that the combined MoE + retrieval pipeline outperforms prior transformer‑only baselines on all major NLG metrics.
  • Scalable design – MoE gating keeps inference cost low (only a few experts activated per token) while still leveraging a large expert pool for specialization.

Methodology

  1. Feature Extraction – A CNN backbone (e.g., ResNet‑50) processes the WSI at a manageable resolution, producing a set of visual embeddings that capture tissue morphology.
  2. Knowledge Retrieval – A pre‑built textual knowledge base (e.g., prior pathology reports, medical ontologies) is queried with the visual embeddings, returning a ranked list of candidate sentences.
  3. Adaptive Re‑ranking – The retrieved candidates are re‑scored using a similarity network that aligns visual embeddings with textual embeddings, keeping only the most semantically aligned snippets.
  4. Mixture‑of‑Experts Decoder – The language model’s decoder is replaced by a sparsely‑gated MoE layer. For each generation step:
    • A lightweight router computes scores for all experts based on the current hidden state.
    • The top‑k experts (k ≈ 2–4) are activated; a small amount of stochastic noise is added to the scores to avoid over‑fitting to a single expert.
    • Load‑balancing regularization ensures all experts receive sufficient training data.
  5. Fusion – The refined retrieved text is concatenated with the visual context and fed into the MoE decoder, which produces the final report token‑by‑token.
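The adaptive re-ranking in step 3 can be sketched as a simple cosine-similarity re-scorer. This is an illustrative reconstruction, not the paper's exact similarity network: the function name `rerank` and the choice of plain cosine similarity over jointly embedded vectors are assumptions.

```python
import numpy as np

def rerank(visual_emb, cand_embs, top_m=3):
    """Re-score retrieved text candidates by cosine similarity to the
    slide's visual embedding and keep the top-m most aligned snippets."""
    v = visual_emb / np.linalg.norm(visual_emb)
    c = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    scores = c @ v                        # cosine similarity per candidate
    order = np.argsort(-scores)           # best-aligned candidates first
    return order[:top_m], scores[order[:top_m]]
```

In the paper, the visual and textual embeddings are produced by a learned alignment network, so the similarity itself is trained end-to-end rather than fixed as above.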

All components are differentiable, allowing joint optimization of visual encoding, retrieval re‑ranking, and expert routing.
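The noisy top-k routing and load-balancing regularization of step 4 can be sketched as follows. This is a minimal NumPy illustration of the general technique (in the style of Shazeer et al.'s sparsely-gated MoE), not the paper's implementation; the function names and the squared-coefficient-of-variation form of the balancing loss are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_topk_gate(hidden, w_gate, k=2, noise_std=1.0, train=True):
    """Router: score every expert from the current hidden state, add
    Gaussian noise during training, keep only the top-k scores, and
    renormalize them with a softmax into sparse gate weights."""
    logits = hidden @ w_gate                       # one score per expert
    if train:
        logits = logits + rng.normal(0.0, noise_std, logits.shape)
    topk = np.argsort(-logits)[:k]                 # indices of active experts
    gates = np.zeros_like(logits)
    exp = np.exp(logits[topk] - logits[topk].max())
    gates[topk] = exp / exp.sum()                  # sparse, sums to 1
    return gates, topk

def load_balance_loss(gate_batch):
    """Penalize uneven expert usage: squared coefficient of variation of
    the per-expert total gate mass over a batch. Zero when usage is even."""
    importance = gate_batch.sum(axis=0)
    return importance.var() / (importance.mean() ** 2 + 1e-9)
```

Only the k selected experts run a forward pass per token, which is why inference cost stays close to a dense transformer even with a large expert pool.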

Results & Findings

Metric     RANGER   Prior State-of-the-Art
BLEU-1     0.4598   0.4211
BLEU-2     0.3044   0.2678
BLEU-3     0.2036   0.1765
BLEU-4     0.1435   0.1192
METEOR     0.1883   0.1620
ROUGE-L    0.3038   0.2741
  • Consistent gains across all n‑gram levels indicate better lexical coverage and fluency.
  • Ablation studies show that removing the MoE (reverting to a vanilla transformer decoder) drops BLEU‑4 by roughly 6 points, while disabling adaptive re‑ranking reduces METEOR by roughly 4 points, confirming that each component contributes.
  • Load‑balancing loss keeps expert utilization around 80 % of the theoretical maximum, preventing “expert collapse.”

Practical Implications

  • Faster, more accurate report drafting – Pathology labs can integrate RANGER into their slide‑review pipelines to auto‑generate first‑draft reports, letting pathologists focus on verification rather than transcription.
  • Domain‑specific language models – The MoE design can be repurposed for other medical report generation tasks (radiology, dermatology) where diverse visual patterns demand specialized linguistic sub‑models.
  • Reduced reliance on noisy external data – Adaptive re‑ranking ensures that only the most relevant knowledge snippets influence the output, mitigating the risk of hallucinations—a common concern in clinical AI.
  • Scalable deployment – Because only a handful of experts are active per token, inference remains comparable to a standard transformer, making it feasible to run on on‑premise GPU clusters typical in hospital IT environments.
  • Potential for continual learning – New diagnostic categories can be added as fresh experts without retraining the entire model, supporting evolving clinical guidelines.

Limitations & Future Work

  • Memory footprint – Storing a large expert pool and a sizeable textual knowledge base still demands considerable GPU RAM; compression techniques were not explored.
  • Generalization beyond BRCA – Experiments were limited to the PathText‑BRCA dataset; performance on other cancer types or multi‑organ datasets remains to be validated.
  • Interpretability of expert routing – While the router selects experts dynamically, the paper does not provide a systematic way to map each expert to a clinically meaningful sub‑task. Future work could incorporate expert‑level attribution to aid trust.
  • Real‑time constraints – The current pipeline processes WSIs in batches; optimizing for single‑slide, low‑latency inference would be necessary for point‑of‑care applications.

RANGER demonstrates that blending sparsely‑gated Mixture‑of‑Experts with smart knowledge retrieval can push pathology report generation toward practical, clinic‑ready performance. For developers interested in building AI‑assisted diagnostic tools, the paper offers a concrete blueprint for combining visual‑language models with modular, specialist components.

Authors

  • Yixin Chen
  • Ziyu Su
  • Hikmat Khan
  • Muhammad Khalid Khan Niazi

Paper Information

  • arXiv ID: 2603.04348v1
  • Categories: cs.CV, cs.AI
  • Published: March 4, 2026
