[Paper] MedTri: A Platform for Structured Medical Report Normalization to Enhance Vision-Language Pretraining
Source: arXiv - 2602.22143v1
Overview
The paper introduces MedTri, a ready‑to‑use framework that converts free‑form radiology reports into a clean, structured format: a triplet of [Anatomical Entity: Radiologic Description + Diagnosis Category]. By stripping away stylistic quirks and irrelevant text, MedTri supplies vision‑language models with consistent, image‑grounded supervision, leading to noticeably better pre‑training performance across X‑ray and CT datasets.
Key Contributions
- Unified triplet representation – normalizes diverse medical reports into a single, anatomy‑centric schema that preserves morphology and spatial cues.
- Open‑source MedTri platform – end‑to‑end pipeline (parsing → entity linking → triplet generation) that can be dropped into existing vision‑language pre‑training workflows.
- Empirical validation – systematic experiments showing structured triplets outperform raw reports and prior normalization baselines on multiple downstream tasks (e.g., disease classification, report generation).
- Modular augmentation hooks – demonstrates how the triplet format enables plug‑in text‑level augmentations such as knowledge enrichment (adding ontology facts) and anatomy‑grounded counterfactuals, boosting robustness without changing the core normalizer.
- Cross‑modality applicability – evaluated on both chest X‑ray and abdominal CT corpora, proving the approach generalizes across imaging modalities.
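The triplet schema above can be pictured as a small container type. A minimal sketch follows; the field and method names are illustrative assumptions, since the paper specifies only the [Entity: Description + Diagnosis] string format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triplet:
    """Illustrative container for MedTri's anatomy-centric schema.

    Field names are assumptions for illustration; the paper specifies
    only the [Entity: Description + Diagnosis] string format.
    """
    entity: str       # canonical anatomical entity, e.g. "Right Lower Lobe"
    description: str  # radiologic finding, e.g. "Consolidation"
    diagnosis: str    # high-level category, e.g. "Pneumonia"

    def render(self) -> str:
        # Serialize to the normalized string format used as pre-training text.
        return f"[{self.entity}: {self.description} + {self.diagnosis}]"

t = Triplet("Right Lower Lobe", "Consolidation", "Pneumonia")
print(t.render())  # [Right Lower Lobe: Consolidation + Pneumonia]
```

Keeping the three fields separate until the final render step is what makes the text-level augmentations discussed below straightforward to implement.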
Methodology
- Report Parsing – a lightweight NLP front‑end (sentence segmentation plus part‑of‑speech tagging) isolates candidate anatomical mentions.
- Entity Linking – a pre‑trained medical ontology (e.g., RadLex, SNOMED CT) maps each mention to a canonical anatomical entity (e.g., "right lower lobe").
- Description Extraction – a hybrid rule‑based and transformer classifier extracts the radiologic description (e.g., "consolidation", "ground‑glass opacity") that directly pertains to the linked anatomy.
- Diagnosis Categorization – a fine‑tuned BERT model predicts a high‑level diagnosis label (e.g., "pneumonia", "fracture") from the remaining report context.
- Triplet Assembly – the three components are concatenated into the final normalized string, e.g., [Right Lower Lobe: Consolidation + Pneumonia].
- Integration with Vision‑Language Pre‑training – the triplets replace raw reports as textual inputs for contrastive or generative pre‑training objectives (e.g., CLIP‑style image‑text alignment).
The pipeline is deliberately modular: each stage can be swapped for a more sophisticated model, but the default configuration works out‑of‑the‑box for most research and production settings.
Results & Findings
| Dataset / Task | Baseline (raw reports) | Prior Normalization | MedTri Triplet | Δ over Baseline |
|---|---|---|---|---|
| ChestX‑Ray14 (disease cls., AUC) | 71.2 % | 73.0 % | 75.6 % | +4.4 pp |
| MIMIC‑CT (lesion det., AUC) | 68.5 % | 70.1 % | 73.3 % | +4.8 pp |
| Report generation (BLEU) | 12.4 | 13.7 | 15.9 | +3.5 |
- Consistent gains across both classification and report‑generation tasks, confirming that anatomy‑grounded normalization supplies higher‑quality supervision.
- Ablation studies show that removing either the anatomical entity or the diagnosis category degrades performance, highlighting the importance of the full triplet.
- Augmentation experiments (knowledge enrichment + counterfactual anatomy swaps) add an extra 1–2 % improvement on top of the MedTri baseline, demonstrating the extensibility of the format.
Practical Implications
- Faster model convergence – cleaner, uniform text reduces the noise the vision‑language model must learn to ignore, cutting pre‑training epochs and compute costs.
- Better downstream transfer – models pre‑trained with MedTri triplets adapt more readily to specialty tasks (e.g., rare disease detection) because the textual signal is tightly tied to anatomical regions.
- Plug‑and‑play for developers – the open‑source MedTri library can be integrated into existing pipelines (PyTorch, TensorFlow) with a single function call; there is no need to hand‑craft regexes or custom ontologies.
- Facilitates compliance & auditing – structured triplets are easier to map to regulatory vocabularies, aiding traceability and explainability in clinical AI products.
- Enables advanced data augmentation – developers can programmatically generate counterfactual reports (e.g., “Left lung: Clear + No pneumonia”) to stress‑test models for robustness against label noise or bias.
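The anatomy‑grounded counterfactuals mentioned above become simple string operations once reports are in triplet form. The sketch below swaps laterality in the entity slot to produce a hard negative; the swap table, the regular expression, and the function itself are assumptions for illustration, not part of MedTri's API.

```python
import re

# Illustrative anatomy-grounded counterfactual swap on triplet strings.
LATERALITY_SWAP = {"Left": "Right", "Right": "Left"}
TRIPLET_RE = re.compile(r"\[(?P<entity>[^:]+): (?P<desc>[^+]+) \+ (?P<dx>[^\]]+)\]")

def counterfactual(triplet: str) -> str:
    """Swap laterality in the anatomical entity to create a hard negative."""
    match = TRIPLET_RE.match(triplet)
    if not match:
        return triplet  # leave malformed inputs untouched
    words = [LATERALITY_SWAP.get(w, w) for w in match.group("entity").split()]
    return f"[{' '.join(words)}: {match.group('desc').strip()} + {match.group('dx').strip()}]"

print(counterfactual("[Right Lower Lobe: Consolidation + Pneumonia]"))
# [Left Lower Lobe: Consolidation + Pneumonia]
```

Such swaps are only safe because the entity slot is canonicalized; applying the same string edit to free‑form reports would risk corrupting unrelated text.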
Limitations & Future Work
- Ontology dependence – the current entity linker relies on a fixed set of anatomical terms; extending to less‑common anatomies or emerging modalities may require additional curation.
- Rule‑heavy description extraction – while effective, the rule‑based component can miss nuanced phrasing; future work could replace it with end‑to‑end neural parsers trained on larger annotated corpora.
- Scalability to multi‑modal reports – the study focused on single‑image reports; handling multi‑image series (e.g., full CT scans) will need richer spatial linking.
- Clinical validation – the paper reports benchmark improvements, but real‑world deployment studies (e.g., radiologist workflow integration) are still pending.
The authors plan to broaden MedTri’s ontology coverage, explore hierarchical triplet structures (organ → sub‑structure), and open a benchmark hub for community‑driven evaluation.
Authors
- Yuetan Chu
- Xinhua Ma
- Xinran Jin
- Gongning Luo
- Xin Gao
Paper Information
- arXiv ID: 2602.22143v1
- Categories: cs.CV
- Published: February 25, 2026