[Paper] QIME: Constructing Interpretable Medical Text Embeddings via Ontology-Grounded Questions
Source: arXiv - 2603.01690v1
Overview
The paper introduces QIME, a new way to turn medical text (e.g., clinical notes, research abstracts) into embeddings that are both high‑performing and human‑readable. Instead of opaque dense vectors, QIME represents each document as a vector of yes/no answers to clinically grounded questions derived from biomedical ontologies. This makes the embeddings interpretable for clinicians and developers while still rivaling the accuracy of black‑box models.
Key Contributions
- Ontology‑grounded question generation: Leverages medical concept signatures (e.g., SNOMED‑CT, MeSH) to automatically craft fine‑grained, clinically meaningful binary questions.
- Training‑free embedding construction: Bypasses the need to train a separate classifier for every question; answers are obtained directly from the language model’s masked‑language‑model (MLM) probabilities.
- Interpretability without sacrificing performance: Achieves state‑of‑the‑art results on biomedical similarity, clustering, and retrieval tasks, closing the gap to dense black‑box encoders.
- Scalable and modular design: New medical domains or ontologies can be plugged in with minimal engineering effort.
Methodology
1. Concept Signature Extraction
- For each medical ontology cluster (e.g., “cardiovascular diseases”), the authors collect a set of representative terms (the concept’s signature) using TF‑IDF scores and ontology hierarchy information.
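The signature-extraction step can be sketched in plain Python. This is an illustrative stand-in, not the paper's implementation: it ranks each cluster's terms by a simple TF-IDF score and keeps the top-k, while the ontology-hierarchy component the authors also use is omitted.

```python
import math
from collections import Counter

def concept_signatures(cluster_terms, k=3):
    """Toy signature extraction: for each ontology cluster, score its
    terms by TF-IDF (term frequency within the cluster, inverse document
    frequency across clusters) and keep the top-k as the signature.
    Hypothetical sketch; the paper additionally uses hierarchy info."""
    n = len(cluster_terms)
    df = Counter()                       # in how many clusters a term appears
    for terms in cluster_terms.values():
        df.update(set(terms))
    signatures = {}
    for cluster, terms in cluster_terms.items():
        tf = Counter(terms)
        scores = {t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf}
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        signatures[cluster] = [t for t, _ in ranked[:k]]
    return signatures
```

Terms shared across all clusters (e.g., "patient") get a zero IDF and drop out, leaving cluster-specific vocabulary as the signature.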
2. Question Generation
- A template (“Does the text mention X?”) is instantiated with each signature term, producing a pool of candidate yes/no questions.
- A lightweight scoring model ranks questions by semantic atomicity (how specific and non‑redundant they are) and clinical relevance.
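The template instantiation plus de-duplication step above can be sketched as follows. The token-Jaccard filter here is a hypothetical stand-in for the paper's scoring model for semantic atomicity and clinical relevance:

```python
def generate_questions(signature_terms,
                       template="Does the text mention {term}?"):
    """Instantiate the yes/no template for each signature term, then
    greedily drop near-duplicate questions. The token-Jaccard overlap
    test is a crude proxy for the paper's lightweight scoring model."""
    kept, kept_tokens = [], []
    for term in signature_terms:
        toks = set(term.lower().split())
        # Skip a candidate whose term heavily overlaps an already-kept one.
        if any(len(toks & prev) / len(toks | prev) > 0.5
               for prev in kept_tokens):
            continue
        kept.append(template.format(term=term))
        kept_tokens.append(toks)
    return kept
```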
3. Answering via MLM
- Instead of training a binary classifier per question, QIME feeds the question and the target text into a pretrained biomedical language model (e.g., BioBERT).
- The model’s masked token probability for “yes” vs. “no” yields the binary answer, forming one dimension of the final embedding.
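The yes/no decision reduces to comparing two token probabilities at a masked answer slot. The sketch below separates that logic from the model: `token_logprob` is a hypothetical interface standing in for a real biomedical MLM such as BioBERT, and the toy scorer included here is only a keyword heuristic so the example runs without a model download.

```python
def mlm_answer(question, text, token_logprob):
    """One QIME dimension: 1 if the (stand-in) MLM rates 'yes' as more
    probable than 'no' at the masked answer position, else 0."""
    prompt = f"{question} [SEP] {text} [SEP] Answer: [MASK]."
    return 1 if token_logprob(prompt, "yes") > token_logprob(prompt, "no") else 0

def toy_logprob(prompt, token):
    """Crude stand-in scorer, NOT the paper's model: 'yes' is likelier
    when the questioned term appears in the text part of the prompt."""
    question, text, _ = prompt.split(" [SEP] ")
    term = question.removeprefix("Does the text mention ").rstrip("?")
    mentioned = term.lower() in text.lower()
    return 0.0 if (token == "yes") == mentioned else -1.0
```

In practice `token_logprob` would be backed by a fill-mask forward pass over a pretrained biomedical MLM, restricted to the "yes"/"no" vocabulary entries.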
4. Embedding Assembly
- The concatenated vector of binary answers (e.g., 256‑dim) constitutes the interpretable embedding.
- Because each dimension maps to a concrete question, developers can read out why two documents are similar (e.g., both answer “yes” to “Is hypertension mentioned?”).
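Assembly and the read-out of an explanation can be sketched together. The `toy_answer` function below is a hypothetical substring check standing in for the MLM-based answering step; only the assembly and explanation logic mirror the description above.

```python
def toy_answer(question, text):
    """Hypothetical answerer: substring check instead of an MLM."""
    term = question.removeprefix("Does the text mention ").rstrip("?")
    return int(term.lower() in text.lower())

def embed(text, questions, answer_fn=toy_answer):
    """Assemble the interpretable embedding: one 0/1 per question."""
    return [answer_fn(q, text) for q in questions]

def explain_similarity(questions, emb_a, emb_b):
    """Dimensions where both documents answer 'yes' explain the match."""
    return [q for q, a, b in zip(questions, emb_a, emb_b) if a == b == 1]
```

Because every dimension carries its question, the explanation is just the list of questions both documents answer "yes" to.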
Results & Findings
| Benchmark | QIME vs. Prior Interpretable Methods | QIME vs. Black‑Box Encoders |
|---|---|---|
| Biomedical Semantic Similarity (BIOSSES) | +12.4% Spearman improvement | -3.1% relative to BioBERT (gap narrowed) |
| Clustering (k‑means on PubMed abstracts) | Adjusted Rand Index +0.18 | Within 5% of dense embeddings |
| Retrieval (MeSH‑based query) | Recall@10 +9.7% | Within 2% of SOTA dense retrievers |
Key takeaways
- The training‑free variant even outperforms the classifier‑based version, showing that MLM‑based answering is a strong signal.
- Qualitative analysis reveals that the top‑ranked questions often correspond to clinically decisive concepts (e.g., “Is the patient on anticoagulants?”).
Practical Implications
- Explainable AI for Clinical Decision Support: Developers can embed patient notes and instantly surface the exact clinical facts driving similarity scores, aiding trust and auditability.
- Rapid Prototyping of Retrieval Systems: Since no per‑question training is required, teams can spin up a searchable knowledge base by simply defining the ontology of interest.
- Regulatory Compliance: Interpretable embeddings satisfy emerging “right‑to‑explain” requirements in healthcare AI deployments.
- Domain Adaptation: Adding a new specialty (e.g., oncology) only needs the corresponding ontology; QIME automatically generates relevant questions without retraining the whole model.
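For the rapid-prototyping point above, a retrieval sketch over binary QIME-style embeddings is short: rank documents by the number of "yes" answers they share with the query. This is an assumed setup, not the paper's retrieval pipeline (which is benchmarked against dense retrievers):

```python
def retrieve(query_emb, corpus_embs, top_k=2):
    """Rank corpus documents by shared 'yes' answers with the query
    (binary dot product). Minimal sketch of a QIME-backed retriever."""
    scores = [(i, sum(q & d for q, d in zip(query_emb, emb)))
              for i, emb in enumerate(corpus_embs)]
    return [i for i, _ in sorted(scores, key=lambda s: (-s[1], s[0]))[:top_k]]
```

Each returned hit's score decomposes into the specific shared questions, giving the auditability described above for free.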
Limitations & Future Work
- Ontology Dependence: The quality of embeddings hinges on the completeness and granularity of the underlying medical ontology; rare or emerging concepts may be missed.
- Binary Question Scope: Complex relations (e.g., temporal or causal) are reduced to yes/no, potentially oversimplifying nuanced clinical narratives.
- Scalability of Question Set: Very large signature pools can inflate embedding dimensionality; future work could explore adaptive pruning or hierarchical question encoding.
- Evaluation on Real‑World Clinical Workflows: The paper focuses on benchmark datasets; deploying QIME in live EHR systems and measuring impact on clinician workflow remains an open step.
Authors
- Yixuan Tang
- Zhenghong Lin
- Yandong Sun
- Anthony K. H. Tung
Paper Information
- arXiv ID: 2603.01690v1
- Categories: cs.CL, cs.AI
- Published: March 2, 2026