[Paper] MedTri: A Platform for Structured Medical Report Normalization to Enhance Vision-Language Pretraining

Published: February 25, 2026 at 12:49 PM EST

Source: arXiv - 2602.22143v1

Overview

The paper introduces MedTri, a ready‑to‑use framework that converts free‑form radiology reports into a clean, structured format – a triplet of [Anatomical Entity : Radiologic Description + Diagnosis Category]. By stripping away stylistic quirks and irrelevant text, MedTri supplies vision‑language models with consistent, image‑grounded supervision, leading to noticeably better pre‑training performance across X‑ray and CT datasets.

Key Contributions

  • Unified triplet representation – normalizes diverse medical reports into a single, anatomy‑centric schema that preserves morphology and spatial cues.
  • Open‑source MedTri platform – end‑to‑end pipeline (parsing → entity linking → triplet generation) that can be dropped into existing vision‑language pre‑training workflows.
  • Empirical validation – systematic experiments showing structured triplets outperform raw reports and prior normalization baselines on multiple downstream tasks (e.g., disease classification, report generation).
  • Modular augmentation hooks – demonstrates how the triplet format enables plug‑in text‑level augmentations such as knowledge enrichment (adding ontology facts) and anatomy‑grounded counterfactuals, boosting robustness without changing the core normalizer.
  • Cross‑modality applicability – evaluated on both chest X‑ray and abdominal CT corpora, demonstrating that the approach generalizes across imaging modalities.
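The anatomy-centric schema above can be made concrete with a small data structure. The following is a minimal sketch, assuming the paper's bracketed triplet format; the class and field names are illustrative, not taken from the MedTri codebase.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReportTriplet:
    """Hypothetical container for one normalized report triplet."""
    anatomical_entity: str       # e.g. "Right Lower Lobe"
    radiologic_description: str  # e.g. "Consolidation"
    diagnosis_category: str      # e.g. "Pneumonia"

    def to_string(self) -> str:
        # Render in the paper's [Entity: Description + Diagnosis] format.
        return (f"[{self.anatomical_entity}: "
                f"{self.radiologic_description} + {self.diagnosis_category}]")

triplet = ReportTriplet("Right Lower Lobe", "Consolidation", "Pneumonia")
print(triplet.to_string())  # [Right Lower Lobe: Consolidation + Pneumonia]
```

A frozen dataclass keeps triplets hashable, which is convenient for de-duplicating findings across a report before they are serialized for the text encoder.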

Methodology

  1. Report Parsing – a lightweight NLP front‑end (sentence segmentation + part‑of‑speech tagging) isolates candidate anatomical mentions.

  2. Entity Linking – uses a pre‑trained medical ontology (e.g., RadLex, SNOMED CT) to map each mention to a canonical anatomical entity (e.g., “right lower lobe”).

  3. Description Extraction – a rule‑based + transformer‑based classifier extracts the radiologic description (e.g., “consolidation”, “ground‑glass opacity”) that directly pertains to the linked anatomy.

  4. Diagnosis Categorization – a fine‑tuned BERT model predicts a high‑level diagnosis label (e.g., “pneumonia”, “fracture”) from the remaining report context.

  5. Triplet Assembly – the three components are concatenated into the final normalized string:

    [Right Lower Lobe: Consolidation + Pneumonia]

  6. Integration with Vision‑Language Pre‑training – the triplets replace raw reports as textual inputs for contrastive or generative pre‑training objectives (e.g., CLIP‑style image‑text alignment).
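The six stages can be sketched as a chain of functions. Everything below is a hypothetical stand-in: the real stages use ontology linkers (RadLex/SNOMED CT) and fine-tuned transformers, not these toy rules.

```python
def parse_report(report: str) -> list[str]:
    # Stage 1: segment the free-form report into candidate sentences.
    return [s.strip() for s in report.split(".") if s.strip()]

def link_entity(sentence: str) -> str:
    # Stage 2: map a mention to a canonical anatomical entity
    # (a real system would query an ontology such as RadLex).
    return "Right Lower Lobe" if "lower lobe" in sentence.lower() else "Lung"

def extract_description(sentence: str) -> str:
    # Stage 3: extract the radiologic finding tied to that anatomy.
    for finding in ("consolidation", "ground-glass opacity"):
        if finding in sentence.lower():
            return finding.title()
    return "Clear"

def categorize_diagnosis(description: str) -> str:
    # Stage 4: map the finding to a high-level diagnosis label.
    return {"Consolidation": "Pneumonia"}.get(description, "No Acute Disease")

def assemble_triplet(entity: str, description: str, diagnosis: str) -> str:
    # Stage 5: concatenate into the normalized string.
    return f"[{entity}: {description} + {diagnosis}]"

def normalize(report: str) -> list[str]:
    triplets = []
    for sentence in parse_report(report):
        entity = link_entity(sentence)
        description = extract_description(sentence)
        diagnosis = categorize_diagnosis(description)
        triplets.append(assemble_triplet(entity, description, diagnosis))
    return triplets

print(normalize("There is consolidation in the right lower lobe."))
# ['[Right Lower Lobe: Consolidation + Pneumonia]']
```

Because each stage is a pure function over strings, any one of them can be swapped for a stronger model without touching the rest of the pipeline, which is exactly the modularity the paper emphasizes.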

The pipeline is deliberately modular: each stage can be swapped for a more sophisticated model, but the default configuration works out‑of‑the‑box for most research and production settings.
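To make step 6 concrete, here is a minimal sketch of a CLIP-style symmetric contrastive loss where the text side would be encoded triplet strings rather than raw reports. The random embeddings are stand-ins; in practice they come from the image and text encoders.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim = 4, 8
image_emb = rng.normal(size=(batch, dim))  # image encoder outputs
text_emb = rng.normal(size=(batch, dim))   # encoded triplet strings

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def clip_loss(img: np.ndarray, txt: np.ndarray, temperature: float = 0.07) -> float:
    # Symmetric InfoNCE over the in-batch image-text similarity matrix;
    # matched (diagonal) pairs are the positives.
    logits = l2_normalize(img) @ l2_normalize(txt).T / temperature
    idx = np.arange(len(logits))
    log_p_img = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_p_txt = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return float(-(log_p_img[idx, idx].mean() + log_p_txt[idx, idx].mean()) / 2)

print(clip_loss(image_emb, text_emb))
```

The key point is that nothing in the objective changes: only the text batch is swapped from raw reports to normalized triplet strings, which is why MedTri can be dropped into existing pre-training code.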

Results & Findings

| Dataset (Modality) | Baseline (raw reports) | Prior Normalization | MedTri Triplet | Δ over Baseline |
|---|---|---|---|---|
| ChestX‑Ray14 (disease cls, AUC) | 71.2 % | 73.0 % | 75.6 % | +4.4 % |
| MIMIC‑CT (lesion det., AUC) | 68.5 % | 70.1 % | 73.3 % | +4.8 % |
| Report Generation (BLEU) | 12.4 | 13.7 | 15.9 | +3.5 |

  • Consistent gains across both classification and report‑generation tasks, confirming that anatomy‑grounded normalization supplies higher‑quality supervision.
  • Ablation studies show that removing either the anatomical entity or the diagnosis category degrades performance, highlighting the importance of the full triplet.
  • Augmentation experiments (knowledge enrichment + counterfactual anatomy swaps) add an extra 1–2 % improvement on top of the MedTri baseline, demonstrating the extensibility of the format.

Practical Implications

  • Faster model convergence – cleaner, uniform text reduces the noise the vision‑language model must learn to ignore, cutting pre‑training epochs and compute costs.
  • Better downstream transfer – models pre‑trained with MedTri triplets adapt more readily to specialty tasks (e.g., rare disease detection) because the textual signal is tightly tied to anatomical regions.
  • Plug‑and‑play for developers – the open‑source MedTri library can be integrated into existing pipelines (PyTorch, TensorFlow) with a single function call; there is no need to hand‑craft regexes or custom ontologies.
  • Facilitates compliance & auditing – structured triplets are easier to map to regulatory vocabularies, aiding traceability and explainability in clinical AI products.
  • Enables advanced data augmentation – developers can programmatically generate counterfactual reports (e.g., “Left lung: Clear + No pneumonia”) to stress‑test models for robustness against label noise or bias.
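The counterfactual idea in the last bullet can be sketched in a few lines: swap the anatomical entity for a mirrored one (or negate the finding) to generate hard-negative texts. The swap table and function names below are illustrative assumptions, not MedTri API.

```python
# Hypothetical laterality-swap table for anatomy-grounded counterfactuals.
ENTITY_SWAPS = {
    "Right Lower Lobe": "Left Lower Lobe",
    "Left Lower Lobe": "Right Lower Lobe",
}

def counterfactual(entity: str, description: str, diagnosis: str) -> str:
    # Swap laterality when a mirrored entity exists; otherwise negate the
    # finding, yielding a "healthy" counterfactual for the same anatomy.
    if entity in ENTITY_SWAPS:
        return f"[{ENTITY_SWAPS[entity]}: {description} + {diagnosis}]"
    return f"[{entity}: Clear + No {diagnosis.lower()}]"

print(counterfactual("Right Lower Lobe", "Consolidation", "Pneumonia"))
# [Left Lower Lobe: Consolidation + Pneumonia]
print(counterfactual("Heart", "Enlarged", "Cardiomegaly"))
# [Heart: Clear + No cardiomegaly]
```

Because the triplet fields are explicit, these edits are guaranteed to stay grammatical and anatomy-consistent, which is much harder to achieve when perturbing free-form report text.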

Limitations & Future Work

  • Ontology dependence – the current entity linker relies on a fixed set of anatomical terms; extending to less‑common anatomies or emerging modalities may require additional curation.
  • Rule‑heavy description extraction – while effective, the rule‑based component can miss nuanced phrasing; future work could replace it with end‑to‑end neural parsers trained on larger annotated corpora.
  • Scalability to multi‑modal reports – the study focused on single‑image reports; handling multi‑image series (e.g., full CT scans) will need richer spatial linking.
  • Clinical validation – the paper reports benchmark improvements, but real‑world deployment studies (e.g., radiologist workflow integration) are still pending.

The authors plan to broaden MedTri’s ontology coverage, explore hierarchical triplet structures (organ → sub‑structure), and open a benchmark hub for community‑driven evaluation.

Authors

  • Yuetan Chu
  • Xinhua Ma
  • Xinran Jin
  • Gongning Luo
  • Xin Gao

Paper Information

  • arXiv ID: 2602.22143v1
  • Categories: cs.CV
  • Published: February 25, 2026
  • PDF: Download PDF
