[Paper] MedTri: A Platform for Structured Medical Report Normalization to Enhance Vision-Language Pretraining
Source: arXiv - 2602.22143v1
Overview
The paper introduces MedTri, a ready‑to‑use framework that converts free‑form radiology reports into a clean, structured format: a triplet of [Anatomical Entity: Radiologic Description + Diagnosis Category]. By stripping away stylistic quirks and irrelevant text, MedTri supplies vision‑language models with consistent, image‑grounded supervision, leading to noticeably better pre‑training performance across X‑ray and CT datasets.
Key Contributions
- Unified triplet representation – normalizes diverse medical reports into a single, anatomy‑centric schema that preserves morphology and spatial cues.
- Open‑source MedTri platform – end‑to‑end pipeline (parsing → entity linking → triplet generation) that can be dropped into existing vision‑language pre‑training workflows.
- Empirical validation – systematic experiments showing structured triplets outperform raw reports and prior normalization baselines on multiple downstream tasks (e.g., disease classification, report generation).
- Modular augmentation hooks – demonstrates how the triplet format enables plug‑in text‑level augmentations such as knowledge enrichment (adding ontology facts) and anatomy‑grounded counterfactuals, boosting robustness without changing the core normalizer.
- Cross‑modality applicability – evaluated on both chest X‑ray and abdominal CT corpora, proving the approach generalizes across imaging modalities.
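The triplet schema above can be pictured as a small container type. A minimal sketch follows; the field and method names are illustrative assumptions, since the paper specifies only the [Entity: Description + Diagnosis] string format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triplet:
    """Illustrative container for MedTri's anatomy-centric schema.

    Field names are assumptions for illustration; the paper specifies
    only the [Entity: Description + Diagnosis] string format.
    """
    entity: str       # canonical anatomical entity, e.g. "Right Lower Lobe"
    description: str  # radiologic finding, e.g. "Consolidation"
    diagnosis: str    # high-level category, e.g. "Pneumonia"

    def render(self) -> str:
        # Serialize to the normalized string format used as pre-training text.
        return f"[{self.entity}: {self.description} + {self.diagnosis}]"

t = Triplet("Right Lower Lobe", "Consolidation", "Pneumonia")
print(t.render())  # [Right Lower Lobe: Consolidation + Pneumonia]
```

Keeping the three fields separate until the final render step is what makes the text-level augmentations discussed below straightforward to implement.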
Methodology
- Report Parsing – a lightweight NLP front‑end (sentence segmentation plus part‑of‑speech tagging) isolates candidate anatomical mentions.
- Entity Linking – a pre‑trained medical ontology (e.g., RadLex, SNOMED CT) maps each mention to a canonical anatomical entity (e.g., "right lower lobe").
- Description Extraction – a hybrid rule‑based and transformer classifier extracts the radiologic description (e.g., "consolidation", "ground‑glass opacity") that directly pertains to the linked anatomy.
- Diagnosis Categorization – a fine‑tuned BERT model predicts a high‑level diagnosis label (e.g., "pneumonia", "fracture") from the remaining report context.
- Triplet Assembly – the three components are concatenated into the final normalized string, e.g., [Right Lower Lobe: Consolidation + Pneumonia].
- Integration with Vision‑Language Pre‑training – the triplets replace raw reports as textual inputs for contrastive or generative pre‑training objectives (e.g., CLIP‑style image‑text alignment).
The pipeline is deliberately modular: each stage can be swapped for a more sophisticated model, but the default configuration works out‑of‑the‑box for most research and production settings.
Results & Findings
| Dataset / Task | Baseline (raw reports) | Prior Normalization | MedTri Triplet | Δ over Baseline |
|---|---|---|---|---|
| ChestX‑Ray14 (disease cls., AUC) | 71.2 % | 73.0 % | 75.6 % | +4.4 pp |
| MIMIC‑CT (lesion det., AUC) | 68.5 % | 70.1 % | 73.3 % | +4.8 pp |
| Report generation (BLEU) | 12.4 | 13.7 | 15.9 | +3.5 |
- Consistent gains across both classification and report‑generation tasks, confirming that anatomy‑grounded normalization supplies higher‑quality supervision.
- Ablation studies show that removing either the anatomical entity or the diagnosis category degrades performance, highlighting the importance of the full triplet.
- Augmentation experiments (knowledge enrichment + counterfactual anatomy swaps) add an extra 1–2 % improvement on top of the MedTri baseline, demonstrating the extensibility of the format.
Practical Implications
- Faster model convergence – cleaner, uniform text reduces the noise the vision‑language model must learn to ignore, cutting pre‑training epochs and compute costs.
- Better downstream transfer – models pre‑trained with MedTri triplets adapt more readily to specialty tasks (e.g., rare disease detection) because the textual signal is tightly tied to anatomical regions.
- Plug‑and‑play for developers – the open‑source MedTri library can be integrated into existing pipelines (PyTorch, TensorFlow) with a single function call; there is no need to hand‑craft regexes or custom ontologies.
- Facilitates compliance & auditing – structured triplets are easier to map to regulatory vocabularies, aiding traceability and explainability in clinical AI products.
- Enables advanced data augmentation – developers can programmatically generate counterfactual reports (e.g., “Left lung: Clear + No pneumonia”) to stress‑test models for robustness against label noise or bias.
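The anatomy‑grounded counterfactuals mentioned above become simple string operations once reports are in triplet form. The sketch below swaps laterality in the entity slot to produce a hard negative; the swap table, the regular expression, and the function itself are assumptions for illustration, not part of MedTri's API.

```python
import re

# Illustrative anatomy-grounded counterfactual swap on triplet strings.
LATERALITY_SWAP = {"Left": "Right", "Right": "Left"}
TRIPLET_RE = re.compile(r"\[(?P<entity>[^:]+): (?P<desc>[^+]+) \+ (?P<dx>[^\]]+)\]")

def counterfactual(triplet: str) -> str:
    """Swap laterality in the anatomical entity to create a hard negative."""
    match = TRIPLET_RE.match(triplet)
    if not match:
        return triplet  # leave malformed inputs untouched
    words = [LATERALITY_SWAP.get(w, w) for w in match.group("entity").split()]
    return f"[{' '.join(words)}: {match.group('desc').strip()} + {match.group('dx').strip()}]"

print(counterfactual("[Right Lower Lobe: Consolidation + Pneumonia]"))
# [Left Lower Lobe: Consolidation + Pneumonia]
```

Such swaps are only safe because the entity slot is canonicalized; applying the same string edit to free‑form reports would risk corrupting unrelated text.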
Limitations & Future Work
- Ontology dependence – the current entity linker relies on a fixed set of anatomical terms; extending to less‑common anatomies or emerging modalities may require additional curation.
- Rule‑heavy description extraction – while effective, the rule‑based component can miss nuanced phrasing; future work could replace it with end‑to‑end neural parsers trained on larger annotated corpora.
- Scalability to multi‑modal reports – the study focused on single‑image reports; handling multi‑image series (e.g., full CT scans) will need richer spatial linking.
- Clinical validation – the paper reports benchmark improvements, but real‑world deployment studies (e.g., radiologist workflow integration) are still pending.
The authors plan to broaden MedTri’s ontology coverage, explore hierarchical triplet structures (organ → sub‑structure), and open a benchmark hub for community‑driven evaluation.
Authors
- Yuetan Chu
- Xinhua Ma
- Xinran Jin
- Gongning Luo
- Xin Gao
Paper Information
- arXiv ID: 2602.22143v1
- Categories: cs.CV
- Published: February 25, 2026